cursor_type | cursor () const noexcept | Returns a seqan3::fm_index_cursor on the index that can be used for searching. The cursor points to the root node of the implicit suffix tree.
bool | empty () const noexcept | Checks whether the index is empty.
bool | operator!= (fm_index const &rhs) const noexcept | Compares two indices for inequality.
bool | operator== (fm_index const &rhs) const noexcept | Compares two indices for equality.
template<cereal_archive archive_t> void | serialize (archive_t &archive) | Serialisation support function.
size_type | size () const noexcept | Returns the length of the indexed text, including sentinel characters.
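Because size() counts sentinel characters as well as the text itself, it can help to see where a sentinel comes from. The self-contained sketch below (not SeqAn3 code; SeqAn3 delegates construction to SDSL, whose sentinel handling may differ) appends an assumed sentinel '$' to a single text, builds a naive suffix array, and derives the Burrows-Wheeler transform. The transformed text is one character longer than the input. The function names `naive_suffix_array` and `naive_bwt` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: sort all suffix start positions lexicographically.
// O(n^2 log n) in the worst case; fine for illustration, not for real genomes.
std::vector<std::size_t> naive_suffix_array(std::string const & s)
{
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&s](std::size_t a, std::size_t b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Burrows-Wheeler transform of text + '$'. In ASCII, '$' sorts before all
// letters, so it acts as the unique smallest sentinel (an assumption of this sketch).
std::string naive_bwt(std::string const & text)
{
    assert(text.find('$') == std::string::npos); // the sentinel must not occur in the text
    std::string const s = text + '$';
    std::string bwt;
    bwt.reserve(s.size());
    for (std::size_t i : naive_suffix_array(s))
        bwt.push_back(i == 0 ? s.back() : s[i - 1]); // character preceding each sorted suffix
    return bwt;
}
```

For example, `naive_bwt("banana")` yields the 7-character string `annb$aa`: six text characters plus one sentinel.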

| fm_index ()=default | Defaulted.
| fm_index (fm_index const &rhs) | Copy constructor; also updates internal data structures.
| fm_index (fm_index &&rhs) | Move constructor; also updates internal data structures.
fm_index & | operator= (fm_index rhs) | Copy and move assignment; also updates internal data structures.
| ~fm_index ()=default | Defaulted.
template<std::ranges::bidirectional_range text_t> | fm_index (text_t &&text) | Constructor that immediately builds the index from the given range. The range must not be empty.
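Copying or moving an index must "update internal data structures" because succinct support structures commonly keep a non-owning pointer to the data they answer queries for (for example, a rank structure pointing at its bit vector). A member-wise copy would leave the new object pointing into the old one. The sketch below models this with a hypothetical `toy_index` and `rank_support`; its by-value `operator=` uses the copy-and-swap idiom, which matches the `operator= (fm_index rhs)` signature listed above. It illustrates the general pattern only and is not SeqAn3's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical rank helper holding a non-owning pointer to its bit vector.
struct rank_support
{
    std::vector<bool> const * bits = nullptr;

    // Number of set bits in (*bits)[0, i).
    std::size_t rank(std::size_t i) const
    {
        assert(bits != nullptr && i <= bits->size());
        std::size_t count = 0;
        for (std::size_t k = 0; k < i; ++k)
            count += (*bits)[k];
        return count;
    }
};

class toy_index
{
public:
    explicit toy_index(std::vector<bool> b) : bits_(std::move(b)) { rs_.bits = &bits_; }

    // Copy: duplicate the data, then re-point the helper at *our* copy.
    toy_index(toy_index const & rhs) : bits_(rhs.bits_) { rs_.bits = &bits_; }

    // Move: take over the data, then re-point the helper as well.
    toy_index(toy_index && rhs) noexcept : bits_(std::move(rhs.bits_)) { rs_.bits = &bits_; }

    // Copy-and-swap: rhs is already a copy (or a moved-from temporary).
    toy_index & operator=(toy_index rhs) noexcept
    {
        std::swap(bits_, rhs.bits_);
        std::swap(rs_, rhs.rs_); // member-wise swap also swaps the pointers...
        rs_.bits = &bits_;       // ...so re-point the helper at our own data
        return *this;
    }

    std::size_t rank(std::size_t i) const { return rs_.rank(i); }
    bool points_to_own_data() const { return rs_.bits == &bits_; }

private:
    std::vector<bool> bits_;
    rank_support rs_;
};
```

Taking `rhs` by value lets one function serve both copy and move assignment: the caller's argument selects the copy or move constructor, and the swap only has to be written once.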

using | sdsl_index_type = sdsl_index_type_ | The type of the underlying SDSL index.
using | sdsl_char_type = typename sdsl_index_type::alphabet_type::char_type | The character type of the reduced alphabet. (The reduced alphabet may be smaller than the original alphabet if not all possible characters occur in the indexed text.)
using | sdsl_sigma_type = typename sdsl_index_type::alphabet_type::sigma_type | The type of the alphabet size of the underlying SDSL index.
using | alphabet_type = alphabet_t | The type of the underlying character of the indexed text.
using | size_type = typename sdsl_index_type::size_type | Type for representing positions in the indexed text.
using | cursor_type = fm_index_cursor< fm_index > | The type of the (unidirectional) cursor.
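The reduced alphabet behind sdsl_char_type gives compact codes only to characters that actually occur in the text. A minimal sketch of such a reduction follows, assuming byte-sized input and reserving code 0 for a sentinel; this layout is modelled loosely on SDSL's byte alphabet and is an assumption, not a description of its exact internals. The struct `reduced_alphabet` and its members are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reduced alphabet: each byte value that occurs in the text gets a
// dense code 1..sigma-1; code 0 is kept for the sentinel.
struct reduced_alphabet
{
    std::array<std::uint8_t, 256> char2comp{}; // byte -> dense code (0 if absent)
    std::vector<std::uint8_t> comp2char;        // dense code -> byte
    std::size_t sigma = 0;                      // number of codes, sentinel included

    explicit reduced_alphabet(std::vector<std::uint8_t> const & text)
    {
        std::array<bool, 256> occurs{};
        for (std::uint8_t c : text)
            occurs[c] = true;
        assert(!occurs[0]); // byte 0 is reserved for the sentinel in this sketch

        comp2char.push_back(0); // code 0 -> sentinel
        for (std::size_t c = 1; c < 256; ++c)
        {
            if (occurs[c])
            {
                char2comp[c] = static_cast<std::uint8_t>(comp2char.size());
                comp2char.push_back(static_cast<std::uint8_t>(c));
            }
        }
        sigma = comp2char.size();
    }
};
```

For a text using only the byte values 2, 5 and 9, sigma is 4 (three characters plus the sentinel) rather than 256, which keeps rank structures over the transformed text small.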
template<semialphabet alphabet_t, text_layout text_layout_mode_, detail::sdsl_index sdsl_index_type_ = default_sdsl_index_type>
class seqan3::fm_index< alphabet_t, text_layout_mode_, sdsl_index_type_ >
The SeqAn FM Index.
- Template Parameters
alphabet_t | The alphabet type; must model seqan3::semialphabet. |
text_layout_mode_ | Indicates whether this index works on a text collection or a single text. See seqan3::text_layout. |
sdsl_index_type_ | The type of the underlying SDSL index; must model seqan3::detail::sdsl_index. |
The seqan3::fm_index is a fast and space-efficient string index for searching in a single text or in a collection of texts.
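Speed and a small footprint come from the FM index's backward search, which counts the occurrences of a pattern using only the Burrows-Wheeler transform (BWT), a table C of cumulative character counts, and a rank function occ(c, i). The textbook sketch below is self-contained and deliberately naive (occ scans the BWT linearly, and the BWT is built from a naive suffix array); it is not SeqAn3 code, which delegates these structures to SDSL. The class `toy_fm_index` and its members are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical textbook FM index over a std::string; not SeqAn3 or SDSL code.
class toy_fm_index
{
public:
    // Builds the BWT of text + '$' from a naive suffix array. '$' must not
    // occur in text; it sorts before letters in ASCII, so it acts as sentinel.
    explicit toy_fm_index(std::string const & text)
    {
        assert(text.find('$') == std::string::npos);
        std::string const s = text + '$';
        std::vector<std::size_t> sa(s.size());
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [&s](std::size_t a, std::size_t b) {
            return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
        });
        for (std::size_t i : sa)
            bwt_.push_back(i == 0 ? s.back() : s[i - 1]);

        // C_[c] = number of characters in s that are smaller than c.
        std::array<std::size_t, 256> freq{};
        for (unsigned char c : s)
            ++freq[c];
        std::size_t sum = 0;
        for (std::size_t c = 0; c < 256; ++c)
        {
            C_[c] = sum;
            sum += freq[c];
        }
    }

    // Backward search: read the pattern right to left, shrinking the
    // half-open suffix-array interval [lb, rb) of suffixes it prefixes.
    std::size_t count(std::string const & pattern) const
    {
        std::size_t lb = 0;
        std::size_t rb = bwt_.size();
        for (auto it = pattern.rbegin(); it != pattern.rend() && lb < rb; ++it)
        {
            auto const c = static_cast<unsigned char>(*it);
            lb = C_[c] + occ(c, lb);
            rb = C_[c] + occ(c, rb);
        }
        return rb - lb;
    }

private:
    // occ(c, i): occurrences of c in bwt_[0, i). This naive scan is O(i);
    // real FM indices answer it with compact rank data structures.
    std::size_t occ(unsigned char c, std::size_t i) const
    {
        auto const end = bwt_.begin() + static_cast<std::ptrdiff_t>(i);
        return static_cast<std::size_t>(std::count(bwt_.begin(), end, static_cast<char>(c)));
    }

    std::string bwt_;
    std::array<std::size_t, 256> C_{};
};
```

Each pattern character costs two occ queries, so with constant-time rank structures a count takes time proportional to the pattern length, independent of the text length.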
General information
Here is a short example of how to build an index and search for a pattern using a cursor. Note that the search module provides seqan3::search, a high-level interface that encapsulates the use of cursors.
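#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/fm_index/fm_index.hpp>
using namespace seqan3::literals;
int main()
{
    seqan3::dna4_vector genome{"ATCGATCGAAGGCTAGCTAGCTAAGGGA"_dna4}; // illustrative text
    seqan3::fm_index index{genome};  // build the index
    auto cur = index.cursor();       // the cursor starts at the root
    cur.extend_right("AAGG"_dna4);   // search for the pattern "AAGG"
    seqan3::debug_stream << "Number of hits: " << cur.count() << '\n';
    for (auto && pos : cur.locate()) // print the position of each occurrence
        seqan3::debug_stream << pos << ' ';
    seqan3::debug_stream << '\n';
    return 0;
}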
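Conceptually, a cursor narrows down an interval of the suffix array: count() corresponds to the width of that interval, and locate() reports the text positions stored in it. The sketch below shows the same idea with a plain, uncompressed suffix array and binary search via std::lower_bound and std::upper_bound. It is a hypothetical illustration (`naive_locate` is not a SeqAn3 function), not how seqan3::fm_index_cursor is implemented; a compressed index typically stores only a sample of the suffix array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: all start positions of pattern in text, found via a
// naive suffix array. Suffixes that start with pattern form one contiguous block.
std::vector<std::size_t> naive_locate(std::string const & text, std::string const & pattern)
{
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });

    // Compare only the first pattern.size() characters of each suffix.
    auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
                               [&text](std::size_t suffix, std::string const & p) {
                                   return text.compare(suffix, p.size(), p) < 0;
                               });
    auto hi = std::upper_bound(lo, sa.end(), pattern,
                               [&text](std::string const & p, std::size_t suffix) {
                                   return text.compare(suffix, p.size(), p) > 0;
                               });

    std::vector<std::size_t> hits(lo, hi); // the "count" is hits.size()
    std::sort(hits.begin(), hits.end());   // report in text order for readability
    return hits;
}
```

Applied to the illustrative genome from the example above, searching for "AAGG" finds positions 8 and 22.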
- Attention
- When building an index for a single text over any alphabet, the symbol with rank 255 is reserved and must not occur in the text.
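The two documented input rules for a single text (the range must not be empty, and rank 255 must not occur) can be checked before construction. The function below is a hypothetical helper, not part of SeqAn3, that works on plain rank values; in real code the ranks could be obtained by applying seqan3::to_rank to each symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical pre-check for a single text given as rank values; not a SeqAn3 API.
// Mirrors two documented rules: the text must be non-empty, and rank 255 is reserved.
bool can_index_single_text(std::vector<std::uint8_t> const & ranks)
{
    if (ranks.empty())
        return false; // the range constructor does not accept an empty range
    return std::none_of(ranks.begin(), ranks.end(),
                        [](std::uint8_t r) { return r == 255; }); // rank 255 is reserved
}
```

A similar check for a text collection would reject ranks 254 and 255, as stated for collections below.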
Here is an example using a collection of strings (e.g. a genome with multiple chromosomes or a protein database):
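#include <vector>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/fm_index/fm_index.hpp>
using namespace seqan3::literals;
int main()
{
    std::vector<seqan3::dna4_vector> genomes{
        "TAGCTGAAGCCATTGGCATCTGATCGGACT"_dna4,
        "ACTGAGCTCGTC"_dna4,
        "TGCATGCACCCATCGACTGACTG"_dna4,
        "GTACGTACGTTACG"_dna4};
    seqan3::fm_index index{genomes};  // build the index over all texts
    auto cur = index.cursor();
    cur.extend_right("CTGA"_dna4);    // search for the pattern "CTGA"
    seqan3::debug_stream << "Number of hits: " << cur.count() << '\n';
    for (auto && pos : cur.locate())  // each hit is a (text index, position) pair
        seqan3::debug_stream << pos << ' ';
    seqan3::debug_stream << '\n';
    return 0;
}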
- Attention
- When building an index for a text collection over any alphabet, the symbols with rank 254 and 255 are reserved and must not be used in the texts.
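A common way to index a collection is to concatenate all texts into one string, separate them with a delimiter that cannot occur in the texts, index the concatenation, and translate every global hit back into a (text index, position) pair. The sketch below shows only that bookkeeping, using an assumed delimiter '#' and a prefix table searched with std::upper_bound. It is a hypothetical illustration (`concatenated_collection` is not a SeqAn3 type), and the way SeqAn3 stores text boundaries internally may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for indexing a text collection as one string.
struct concatenated_collection
{
    std::string joined;              // text0 + '#' + text1 + '#' + ...
    std::vector<std::size_t> starts; // offset of each text within joined

    explicit concatenated_collection(std::vector<std::string> const & texts)
    {
        for (auto const & t : texts)
        {
            assert(t.find('#') == std::string::npos); // the delimiter must be unused
            starts.push_back(joined.size());
            joined += t;
            joined += '#';
        }
    }

    // Map a position in joined to (text index, position within that text).
    std::pair<std::size_t, std::size_t> to_local(std::size_t global) const
    {
        // upper_bound finds the first text starting after global; the hit
        // belongs to the text immediately before it.
        auto it = std::upper_bound(starts.begin(), starts.end(), global);
        std::size_t const text_id = static_cast<std::size_t>(it - starts.begin()) - 1;
        return {text_id, global - starts[text_id]};
    }
};
```

Because the pattern cannot contain the delimiter, no match can span two texts, so every global hit maps to exactly one text.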
Choosing an index implementation
The underlying implementation of the FM Index (rank data structure, sampling rates, etc.) can be specified by passing a different SDSL index type as the third template parameter (sdsl_index_type_):
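Choices such as the rank data structure and its sampling rate trade memory for query time. The sketch below illustrates that trade-off with a hypothetical occurrence-count (rank) structure whose sampling rate is a compile-time template parameter, similar in spirit to selecting an index type through a template argument: a larger `sample_rate` stores fewer checkpoints but may scan up to `sample_rate - 1` characters per query. Neither `sampled_occ` nor its parameter exists in SeqAn3 or SDSL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rank structure: stores the counts of every byte value at each
// multiple of sample_rate and scans the remainder of the block on each query.
template <std::size_t sample_rate>
class sampled_occ
{
    static_assert(sample_rate > 0, "sample_rate must be positive");

public:
    explicit sampled_occ(std::string text) : text_(std::move(text))
    {
        std::array<std::size_t, 256> running{};
        for (std::size_t i = 0; i <= text_.size(); ++i)
        {
            if (i % sample_rate == 0)
                samples_.push_back(running); // counts over text_[0, i)
            if (i < text_.size())
                ++running[static_cast<unsigned char>(text_[i])];
        }
    }

    // Occurrences of c in text_[0, i).
    std::size_t occ(char c, std::size_t i) const
    {
        assert(i <= text_.size());
        std::size_t const block = i / sample_rate;
        std::size_t count = samples_[block][static_cast<unsigned char>(c)];
        for (std::size_t k = block * sample_rate; k < i; ++k)
            count += (text_[k] == c);
        return count;
    }

    std::size_t checkpoints() const { return samples_.size(); }

private:
    std::string text_;
    std::vector<std::array<std::size_t, 256>> samples_;
};
```

Both `sampled_occ<1>` and `sampled_occ<4>` return identical counts; they differ only in how many checkpoints they store and how much scanning each query needs.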
- Todo:
- Link to the SDSL documentation, or write our own once SDSL3 documentation is available.