Provides files and formats for handling read mapping data.
Classes | |
struct | seqan3::detail::access_restrictor_fn |
A functor that always throws when calling operator() (needed for the alignment "dummy" sequence). | |
class | seqan3::format_bam |
The BAM format. | |
class | seqan3::format_sam |
The SAM format (tag). | |
class | seqan3::detail::format_sam_base |
The alignment base format. | |
struct | seqan3::ref_info_not_given |
Type tag which indicates that no reference information has been passed to the alignment file on construction. | |
class | seqan3::sam_file_header< ref_ids_type > |
Stores the header information of alignment files. | |
class | seqan3::sam_file_input< traits_type_, selected_field_ids_, valid_formats_ > |
A class for reading alignment files, e.g. SAM, BAM, BLAST, ... | |
struct | seqan3::sam_file_input_default_traits< ref_sequences_t, ref_ids_t > |
The default traits for seqan3::sam_file_input. | |
interface | sam_file_input_format |
The generic concept for alignment file input formats. | |
struct | seqan3::detail::sam_file_input_format_exposer< format_type > |
Internal class used to expose the actual format interface to read alignment records from the file. | |
struct | seqan3::sam_file_input_options< sequence_legal_alphabet > |
The options type defines various option members that influence the behaviour of all or some formats. | |
interface | sam_file_input_traits |
The requirements a traits_type for seqan3::sam_file_input must meet. | |
class | seqan3::sam_file_output< selected_field_ids_, valid_formats_, ref_ids_type > |
A class for writing alignment files, e.g. SAM, BAM, BLAST, ... | |
interface | sam_file_output_format |
The generic concept for alignment file output formats. | |
struct | seqan3::detail::sam_file_output_format_exposer< format_type > |
Internal class used to expose the actual format interface to write alignment records into the file. | |
struct | seqan3::sam_file_output_options |
The options type defines various option members that influence the behaviour of all or some formats. | |
class | seqan3::sam_tag_dictionary |
The SAM tag dictionary class that stores all optional SAM fields. | |
struct | seqan3::sam_tag_type< tag_value > |
The generic base class. | |
struct | seqan3::detail::view_equality_fn |
Comparator that is able to compare two views. | |
Typedefs | |
using | seqan3::detail::sam_tag_variant = std::variant< char, int32_t, float, std::string, std::vector< std::byte >, std::vector< int8_t >, std::vector< uint8_t >, std::vector< int16_t >, std::vector< uint16_t >, std::vector< int32_t >, std::vector< uint32_t >, std::vector< float > > |
std::variant of allowed types for optional tag fields of the SAM format. | |
Enumerations | |
enum class | seqan3::sam_flag : uint16_t { seqan3::sam_flag::none = 0 , seqan3::sam_flag::paired = 0x1 , seqan3::sam_flag::proper_pair = 0x2 , seqan3::sam_flag::unmapped = 0x4 , seqan3::sam_flag::mate_unmapped = 0x8 , seqan3::sam_flag::on_reverse_strand = 0x10 , seqan3::sam_flag::mate_on_reverse_strand = 0x20 , seqan3::sam_flag::first_in_pair = 0x40 , seqan3::sam_flag::second_in_pair = 0x80 , seqan3::sam_flag::secondary_alignment = 0x100 , seqan3::sam_flag::failed_filter = 0x200 , seqan3::sam_flag::duplicate = 0x400 , seqan3::sam_flag::supplementary_alignment = 0x800 } |
An enum flag that describes the properties of an aligned read (given as a SAM record). | |
Functions | |
template<seqan3::detail::writable_pairwise_alignment alignment_type> | |
void | seqan3::detail::alignment_from_cigar (alignment_type &alignment, std::vector< cigar > const &cigar_vector) |
Transforms a std::vector of operation-count pairs (representing the cigar string) into gaps of an alignment. | |
template<seqan3::detail::pairwise_alignment alignment_type> | |
std::string | seqan3::detail::get_cigar_string (alignment_type &&alignment, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) |
Creates a cigar string (SAM format) given a seqan3::detail::pairwise_alignment. | |
template<seqan3::aligned_sequence ref_seq_type, seqan3::aligned_sequence query_seq_type> | |
std::string | seqan3::detail::get_cigar_string (ref_seq_type &&ref_seq, query_seq_type &&query_seq, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) |
Transforms an alignment represented by two seqan3::aligned_sequence's into the corresponding cigar string. | |
std::string | seqan3::detail::get_cigar_string (std::vector< cigar > const &cigar_vector) |
Transforms a vector of cigar elements into a string representation. | |
template<seqan3::detail::pairwise_alignment alignment_type> | |
std::vector< cigar > | seqan3::detail::get_cigar_vector (alignment_type &&alignment, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) |
Creates a cigar vector (SAM format) given a seqan3::detail::pairwise_alignment represented by two seqan3::aligned_sequence's. | |
template<typename reference_char_type , typename query_char_type > | |
constexpr cigar::operation | seqan3::detail::map_aligned_values_to_cigar_op (reference_char_type const reference_char, query_char_type const query_char, bool const extended_cigar) |
Compares two seqan3::aligned_sequence values and returns their cigar operation. | |
template<typename cigar_input_type > | |
std::tuple< std::vector< cigar >, int32_t, int32_t > | seqan3::detail::parse_cigar (cigar_input_type &&cigar_input) |
Parses a cigar string into a vector of operation-count pairs (e.g. (M, 3)). | |
void | seqan3::detail::update_alignment_lengths (int32_t &ref_length, int32_t &seq_length, char const cigar_operation, uint32_t const cigar_count) |
Updates the sequence lengths by cigar_count depending on the cigar operation cigar_operation. | |
Variables | |
template<> | |
constexpr bool | seqan3::add_enum_bitwise_operators< sam_flag > = true |
Enables bitwise operations for seqan3::sam_flag. | |
template<typename t > | |
constexpr bool | seqan3::detail::is_type_list_of_sam_file_input_formats_v = false |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format [default is false]. | |
template<typename ... ts> | |
constexpr bool | seqan3::detail::is_type_list_of_sam_file_input_formats_v< type_list< ts... > > = (sam_file_input_format<ts> && ...) |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format [overload]. | |
template<typename t > | |
constexpr bool | seqan3::detail::is_type_list_of_sam_file_output_formats_v = false |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format [default is false]. | |
template<typename ... ts> | |
constexpr bool | seqan3::detail::is_type_list_of_sam_file_output_formats_v< type_list< ts... > > = (sam_file_output_format<ts> && ...) |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format [overload]. | |
constexpr char | seqan3::detail::sam_tag_type_char [12] = {'A', 'i', 'f', 'Z', 'H', 'B', 'B', 'B', 'B', 'B', 'B', 'B'} |
The SAM tag type char identifier of each type. The index corresponds to the position of the type in seqan3::detail::sam_tag_variant. | |
constexpr char | seqan3::detail::sam_tag_type_char_extra [12] = {'\0', '\0', '\0', '\0', '\0', 'c', 'C', 's', 'S', 'i', 'I', 'f'} |
The extra SAM tag type char identifier of each type (the array element subtype, or '\0' if there is none). The index corresponds to the position of the type in seqan3::detail::sam_tag_variant. | |
template<typename t > | |
SEQAN3_CONCEPT | seqan3::detail::type_list_of_sam_file_input_formats = is_type_list_of_sam_file_input_formats_v<t> |
Auxiliary concept that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format. | |
template<typename t > | |
SEQAN3_CONCEPT | seqan3::detail::type_list_of_sam_file_output_formats = is_type_list_of_sam_file_output_formats_v<t> |
Auxiliary concept that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format. | |
Other literals | |
template<typename char_t , char_t ... s> | |
constexpr uint16_t | seqan3::literals::operator""_tag () |
The SAM tag literal, such that tags can be used in constant expressions. | |
template<typename char_t , char_t ... s> | |
constexpr uint16_t | operator""_tag () |
The SAM tag literal, such that tags can be used in constant expressions. | |
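The two lookup tables sam_tag_type_char and sam_tag_type_char_extra are indexed by the position of a type within seqan3::detail::sam_tag_variant. The following is a minimal standalone sketch, not SeqAn3 code: it copies the variant layout and both tables locally (the names tag_variant and sam_type_code are hypothetical) and uses std::variant::index to derive the SAM type code of a stored value. For arrays, the extra char is the element subtype that a SAM line writes after the 'B', as in XY:B:s,1,2.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical local copy of the sam_tag_variant layout (same order of alternatives).
using tag_variant = std::variant<char, std::int32_t, float, std::string,
                                 std::vector<std::byte>,
                                 std::vector<std::int8_t>, std::vector<std::uint8_t>,
                                 std::vector<std::int16_t>, std::vector<std::uint16_t>,
                                 std::vector<std::int32_t>, std::vector<std::uint32_t>,
                                 std::vector<float>>;

// Copies of the two tables listed above, indexed by the variant's alternative index.
constexpr char type_char[12]       = {'A', 'i', 'f', 'Z', 'H', 'B', 'B', 'B', 'B', 'B', 'B', 'B'};
constexpr char type_char_extra[12] = {'\0', '\0', '\0', '\0', '\0', 'c', 'C', 's', 'S', 'i', 'I', 'f'};

// Returns the type code of the stored value, e.g. "i" for an integer or "Bs" for an int16_t array.
std::string sam_type_code(tag_variant const & value)
{
    std::size_t const idx = value.index();   // position of the active alternative
    std::string result(1, type_char[idx]);
    if (type_char_extra[idx] != '\0')        // only array types carry a subtype
        result += type_char_extra[idx];
    return result;
}
```

Keeping both tables in the same order as the variant alternatives means that a single index() call is enough to look up both characters, with no branching on the stored type.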
Provides files and formats for handling read mapping data.
SAM/BAM files are primarily used to store pairwise alignments of read mapping data.
The SAM file abstraction supports reading 12 different fields:
There exists one more field for SAM files, the seqan3::field::header_ptr, but this field is mostly used internally. Please see the seqan3::sam_file_output::header member function for details on how to access the seqan3::sam_file_header of the file.
All of these fields are retrieved by default (and in that order).
Please see the corresponding formats for more details.
enum class seqan3::sam_flag : uint16_t | strong |
An enum flag that describes the properties of an aligned read (given as a SAM record).
The SAM flags are bitwise flags, which means that each value corresponds to a specific bit and that flags can be combined and tested using bitwise operations. See this tutorial for an introduction to bitwise operations on enum flags.
Example:
The following additional information about some flag values is adapted from the SAM specification:
- For each read/contig in a SAM file, exactly one line associated with the read must have neither seqan3::sam_flag::secondary_alignment (0x100) nor seqan3::sam_flag::supplementary_alignment (0x800) set, i.e. (FLAG & 0x900) == 0. This line is called the primary alignment of the read.
- seqan3::sam_flag::secondary_alignment (bit 0x100) marks the alignment not to be used in certain analyses when the tools in use are aware of this bit. It is typically used to flag alternative mappings when multiple mappings are presented in a SAM file.
- seqan3::sam_flag::supplementary_alignment (bit 0x800) indicates that the corresponding alignment line is part of a chimeric alignment. If the SAM/BAM file corresponds to long reads (nanopore/pacbio), this happens when reads are split before being aligned and the best matching part is marked as primary, while all other aligned parts are marked supplementary.
- seqan3::sam_flag::unmapped (bit 0x4) is the only reliable place to tell whether the read is unmapped. If seqan3::sam_flag::unmapped is set, no assumptions can be made about RNAME, POS, CIGAR, MAPQ, or the flags seqan3::sam_flag::proper_pair, seqan3::sam_flag::secondary_alignment, and seqan3::sam_flag::supplementary_alignment (bits 0x2, 0x100, and 0x800).
- seqan3::sam_flag::on_reverse_strand (bit 0x10) indicates whether the read sequence has been reverse complemented and the quality string reversed. When seqan3::sam_flag::unmapped (0x4) is unset, this corresponds to the strand to which the segment has been mapped: seqan3::sam_flag::on_reverse_strand unset indicates the forward strand, while set indicates the reverse strand. When seqan3::sam_flag::unmapped (0x4) is set, this indicates whether the unmapped read is stored in its original orientation as it came off the sequencing machine.
- seqan3::sam_flag::first_in_pair and seqan3::sam_flag::second_in_pair (bits 0x40 and 0x80) reflect the read ordering within each template inherent in the sequencing technology used. If both are set, the read is part of a linear template, but it is neither the first nor the last read. If both are unset, the index of the read in the template is unknown. This may happen for a non-linear template or when this information is lost during data processing.
- If seqan3::sam_flag::paired (bit 0x1) is unset, no assumptions can be made about seqan3::sam_flag::proper_pair, seqan3::sam_flag::mate_unmapped, seqan3::sam_flag::mate_on_reverse_strand, seqan3::sam_flag::first_in_pair and seqan3::sam_flag::second_in_pair (bits 0x2, 0x8, 0x20, 0x40 and 0x80).

Enumerator | Description |
---|---|
none | None of the flags below are set. |
paired | The aligned read is paired (paired-end sequencing). |
proper_pair | The two aligned reads in a pair have a proper distance between each other. |
unmapped | The read is not mapped to a reference (unaligned). |
mate_unmapped | The mate of this read is not mapped to a reference (unaligned). |
on_reverse_strand | The read sequence has been reverse complemented before being mapped (aligned). |
mate_on_reverse_strand | The mate sequence has been reverse complemented before being mapped (aligned). |
first_in_pair | Indicates the ordering (see details in the seqan3::sam_flag description). |
second_in_pair | Indicates the ordering (see details in the seqan3::sam_flag description). |
secondary_alignment | This read alignment is an alternative (possibly suboptimal) to the primary. |
failed_filter | The read alignment failed a filter, e.g. quality controls. |
duplicate | The read is marked as a PCR duplicate or optical duplicate. |
supplementary_alignment | This sequence is part of a split alignment and is not the primary alignment. |
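Because each enumerator occupies its own bit, flags are combined with | and tested with &. The self-contained sketch below mirrors the enumerator values above in a hypothetical sam_flag_sketch enum. The hand-written operators stand in for what seqan3::add_enum_bitwise_operators enables for the real seqan3::sam_flag, and is_primary checks the primary-alignment rule (FLAG & 0x900) == 0 from the SAM specification notes above. None of these names are SeqAn3 API.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for seqan3::sam_flag: same underlying type and bit values.
enum class sam_flag_sketch : std::uint16_t
{
    none                    = 0,
    paired                  = 0x1,
    proper_pair             = 0x2,
    unmapped                = 0x4,
    mate_unmapped           = 0x8,
    on_reverse_strand       = 0x10,
    mate_on_reverse_strand  = 0x20,
    first_in_pair           = 0x40,
    second_in_pair          = 0x80,
    secondary_alignment     = 0x100,
    failed_filter           = 0x200,
    duplicate               = 0x400,
    supplementary_alignment = 0x800
};

// Bitwise OR/AND on the underlying integer, returning the enum type again.
constexpr sam_flag_sketch operator|(sam_flag_sketch lhs, sam_flag_sketch rhs)
{
    using u = std::underlying_type_t<sam_flag_sketch>;
    return static_cast<sam_flag_sketch>(static_cast<u>(lhs) | static_cast<u>(rhs));
}

constexpr sam_flag_sketch operator&(sam_flag_sketch lhs, sam_flag_sketch rhs)
{
    using u = std::underlying_type_t<sam_flag_sketch>;
    return static_cast<sam_flag_sketch>(static_cast<u>(lhs) & static_cast<u>(rhs));
}

// True if at least one bit of `bits` is set in `flags`.
constexpr bool has_any(sam_flag_sketch flags, sam_flag_sketch bits)
{
    return (flags & bits) != sam_flag_sketch::none;
}

// Primary alignment: neither 0x100 nor 0x800 is set, i.e. (FLAG & 0x900) == 0.
constexpr bool is_primary(sam_flag_sketch flags)
{
    return !has_any(flags, sam_flag_sketch::secondary_alignment | sam_flag_sketch::supplementary_alignment);
}
```

For example, a first-in-pair read mapped to the reverse strand carries paired | first_in_pair | on_reverse_strand, which is 0x51.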
void seqan3::detail::alignment_from_cigar (alignment_type &alignment, std::vector< cigar > const &cigar_vector) | inline |
Transforms a std::vector of operation-count pairs (representing the cigar string) into gaps of an alignment.
Template Parameters:
alignment_type | The type of alignment; must model seqan3::detail::writable_pairwise_alignment. |
Parameters:
[in,out] | alignment | The alignment to fill with gaps according to the cigar information. |
[in] | cigar_vector | The cigar information given as a std::vector over seqan3::cigar. |
Given the cigar string "4M2I5M2D1M", the cigar information extracted by seqan3::detail::parse_cigar would be "[(M,4), (I,2), (M,5), (D,2), (M,1)]". Given this cigar information and an alignment variable containing the two unaligned sequences "(ATGGCGTAGAGC, ATGCCCCGTTGC)", the alignment will be filled with the following gaps (reference on top, query at the bottom):
ATGG--CGTAGAGC
ATGCCCCGTTG--C
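The gap-filling step can be sketched with plain strings. This is a hypothetical standalone illustration, not the SeqAn3 implementation: the alignment is modelled as two std::string objects with '-' as the gap, a cigar element as an (operation, count) std::pair, and only the operations M, =, X, I and D are handled. Clipping operations are assumed to have been removed beforehand.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical cigar element in the (operation, count) notation used above.
using cigar_element = std::pair<char, std::uint32_t>;

// Inserts gaps into the ungapped reference and query according to the cigar.
std::pair<std::string, std::string> gapped_from_cigar(std::string const & ref,
                                                      std::string const & query,
                                                      std::vector<cigar_element> const & cigar)
{
    std::string gapped_ref, gapped_query;
    std::size_t r = 0, q = 0; // current positions in the ungapped sequences

    for (auto const & [op, count] : cigar)
    {
        for (std::uint32_t i = 0; i < count; ++i)
        {
            switch (op)
            {
                case 'M': case '=': case 'X': // both sequences advance
                    gapped_ref += ref.at(r++);
                    gapped_query += query.at(q++);
                    break;
                case 'I': // base only in the query: gap in the reference
                    gapped_ref += '-';
                    gapped_query += query.at(q++);
                    break;
                case 'D': // base only in the reference: gap in the query
                    gapped_ref += ref.at(r++);
                    gapped_query += '-';
                    break;
                default:
                    throw std::invalid_argument{std::string{"unsupported cigar operation: "} + op};
            }
        }
    }
    return {gapped_ref, gapped_query};
}
```

Using at() instead of operator[] makes a cigar that claims more bases than the sequences contain fail with std::out_of_range rather than reading past the end.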
std::string seqan3::detail::get_cigar_string (alignment_type &&alignment, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) | inline |
Creates a cigar string (SAM format) given a seqan3::detail::pairwise_alignment.
Template Parameters:
alignment_type | Must model seqan3::detail::pairwise_alignment. |
Parameters:
alignment | The alignment, represented by a seqan3::pair_like of seqan3::aligned_sequence's, to be transformed into a cigar string based on the second (query) sequence. |
query_start_pos | The start position of the alignment in the query sequence indicating soft-clipping. |
query_end_pos | The end position of the alignment in the query sequence indicating soft-clipping. |
extended_cigar | Whether to print the extended cigar alphabet or not. See cigar operation. |
The cigar string is built based on the query sequence, i.e. the second sequence in the alignment pair.
Consider the following alignment, with the reference sequence on top and the query sequence at the bottom:
ATGG--CGTAGAGC
ATGCCCCGTTG--C
In this case, seqan3::detail::get_cigar_string will return the following cigar string when printed: "4M2I5M2D1M". The extended cigar string would be "3=1X2I3=1X1=2D1=".
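A minimal sketch of how such a string can be derived, assuming the alignment is given as two equal-length std::string objects (reference first, query second) with '-' as the gap character: classify every column, then run-length encode the resulting operations. The function name cigar_from_gapped is hypothetical, and soft-clipping (query_start_pos / query_end_pos) is deliberately left out.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Builds a basic or extended cigar string from two gapped sequences of equal length.
std::string cigar_from_gapped(std::string const & ref, std::string const & query, bool const extended_cigar = false)
{
    if (ref.size() != query.size())
        throw std::invalid_argument{"gapped sequences must have the same length"};

    // Classifies one alignment column (reference char r, query char q).
    auto column_op = [extended_cigar](char const r, char const q) -> char
    {
        if (r == '-' && q == '-')
            throw std::invalid_argument{"gap-only columns are not supported in this sketch"};
        if (r == '-')
            return 'I';                // base only in the query
        if (q == '-')
            return 'D';                // base only in the reference
        if (!extended_cigar)
            return 'M';                // aligned bases, equal or not
        return r == q ? '=' : 'X';     // extended alphabet separates match and mismatch
    };

    std::string result;
    char current = '\0';   // operation of the current run
    std::size_t run = 0;   // length of the current run

    for (std::size_t i = 0; i < ref.size(); ++i)
    {
        char const op = column_op(ref[i], query[i]);
        if (run > 0 && op != current) // operation changed: emit the finished run
        {
            result += std::to_string(run) + current;
            run = 0;
        }
        current = op;
        ++run;
    }
    if (run > 0)
        result += std::to_string(run) + current; // emit the last run
    return result;
}
```

Applied to the alignment above, this yields "4M2I5M2D1M" for the basic and "3=1X2I3=1X1=2D1=" for the extended alphabet.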
std::string seqan3::detail::get_cigar_string (ref_seq_type &&ref_seq, query_seq_type &&query_seq, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) | inline |
Transforms an alignment represented by two seqan3::aligned_sequence's into the corresponding cigar string.
Template Parameters:
ref_seq_type | Must model seqan3::aligned_sequence. |
query_seq_type | Must model seqan3::aligned_sequence. |
Parameters:
ref_seq | The reference sequence to compare against the query sequence. |
query_seq | The query sequence to build the cigar string for. |
query_start_pos | The start position of the alignment in the query sequence indicating soft-clipping. |
query_end_pos | The end position of the alignment in the query sequence indicating soft-clipping. |
extended_cigar | Whether to print the extended cigar alphabet or not. See cigar operation. |
The cigar string is built based on the query sequence (query_seq).
Consider the following alignment, with the reference sequence on top and the query sequence at the bottom:
ATGG--CGTAGAGC
ATGCCCCGTTG--C
In this case, seqan3::detail::get_cigar_string will return the following cigar string when printed: "4M2I5M2D1M". The extended cigar string would be "3=1X2I3=1X1=2D1=".
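std::string seqan3::detail::get_cigar_string (std::vector< cigar > const &cigar_vector) | inline |
Transforms a vector of cigar elements into a string representation.
Parameters:
cigar_vector | The std::vector of seqan3::cigar elements to be transformed into a std::string. |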
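The string representation simply writes each element as its count followed by its operation letter. In this hypothetical sketch, a std::pair<char, std::uint32_t> stands in for seqan3::cigar, in the (operation, count) order used in the examples on this page; the function name cigar_to_string is illustrative.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Concatenates "<count><operation>" for every element, e.g. (M,4) becomes "4M".
std::string cigar_to_string(std::vector<std::pair<char, std::uint32_t>> const & cigar)
{
    std::string result;
    for (auto const & [op, count] : cigar)
        result += std::to_string(count) + op; // count first, then the operation letter
    return result;
}
```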
std::vector< cigar > seqan3::detail::get_cigar_vector (alignment_type &&alignment, uint32_t const query_start_pos=0, uint32_t const query_end_pos=0, bool const extended_cigar=false) | inline |
Creates a cigar vector (SAM format) given a seqan3::detail::pairwise_alignment represented by two seqan3::aligned_sequence's.
Template Parameters:
alignment_type | Must model seqan3::detail::pairwise_alignment. |
Parameters:
alignment | The alignment, represented by a pair of aligned sequences, to be transformed into a cigar vector based on the second (query) sequence. |
query_start_pos | The start position of the alignment in the query sequence indicating soft-clipping. |
query_end_pos | The end position of the alignment in the query sequence indicating soft-clipping. |
extended_cigar | Whether to print the extended cigar alphabet or not. See cigar operation. |
The cigar vector is built based on the query sequence, i.e. the second sequence in the alignment pair.
Consider the following alignment, with the reference sequence on top and the query sequence at the bottom:
ATGG--CGTAGAGC
ATGCCCCGTTG--C
In this case, seqan3::detail::get_cigar_vector will return the following cigar vector: "[('M',4),('I',2),('M',5),('D',2),('M',1)]". The extended cigar vector would be "[('=',3),('X',1),('I',2),('=',3),('X',1),('=',1),('D',2),('=',1)]".
constexpr cigar::operation seqan3::detail::map_aligned_values_to_cigar_op (reference_char_type const reference_char, query_char_type const query_char, bool const extended_cigar) | constexpr |
Compares two seqan3::aligned_sequence values and returns their cigar operation.
Template Parameters:
reference_char_type | Must be equality comparable to seqan3::gap. |
query_char_type | Must be equality comparable to seqan3::gap. |
Parameters:
reference_char | The aligned character of the reference to compare. |
query_char | The aligned character of the query to compare. |
extended_cigar | Whether to print the extended cigar alphabet or not. See cigar operation. |
The resulting cigar operation describes the column from the point of view of the query character (query_char).
The following alignment column shows the reference char ('C') on top and a gap ('-') for the query char at the bottom:
C
-
In this case, seqan3::detail::map_aligned_values_to_cigar_op will return 'D', since the query char is "deleted".
The next alignment column shows the reference char ('C') on top and a query char ('G') at the bottom:
C
G
In this case, seqan3::detail::map_aligned_values_to_cigar_op will return 'M' for the basic cigar, since the two bases are aligned. For the extended cigar alphabet (extended_cigar = true), it will return 'X', since the bases are aligned but not equal.
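The column classification described above fits in a small constexpr function. In this hypothetical sketch, characters are plain char values and '-' stands in for seqan3::gap. Mapping a gap-only column to 'P' (padding) is an assumption of the sketch, not documented behaviour of seqan3::detail::map_aligned_values_to_cigar_op.

```cpp
// Hypothetical gap symbol standing in for seqan3::gap.
constexpr char gap_symbol = '-';

// Returns the cigar operation for one column (reference char on top, query char below).
constexpr char cigar_op_for_column(char const reference_char, char const query_char, bool const extended_cigar)
{
    bool const ref_gap = reference_char == gap_symbol;
    bool const query_gap = query_char == gap_symbol;

    if (ref_gap && query_gap)
        return 'P'; // assumption of this sketch: gap-only column treated as padding
    if (ref_gap)
        return 'I'; // base present only in the query: insertion
    if (query_gap)
        return 'D'; // base present only in the reference: deletion
    if (!extended_cigar)
        return 'M'; // basic alphabet: aligned, regardless of equality
    return reference_char == query_char ? '=' : 'X';
}

// The two columns discussed above, checked at compile time.
static_assert(cigar_op_for_column('C', '-', false) == 'D');
static_assert(cigar_op_for_column('C', 'G', false) == 'M');
static_assert(cigar_op_for_column('C', 'G', true) == 'X');
```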
constexpr uint16_t seqan3::literals::operator""_tag () | constexpr |
The SAM tag literal, such that tags can be used in constant expressions.
Template Parameters:
char_t | The char type. Usually char. Parameter pack ...s must be of length 2 since SAM tags consist of two letters (char0 and char1). |
A SAM tag consists of two letters, written as a string literal with the ""_tag suffix, which maps the two letters to the tag's unique id.
The purpose of those tags is to fill or query the seqan3::sam_tag_dictionary for a specific key (tag_id) and retrieve the corresponding value.
constexpr uint16_t operator""_tag () | related |
The SAM tag literal, such that tags can be used in constant expressions.
Template Parameters:
char_t | The char type. Usually char. Parameter pack ...s must be of length 2 since SAM tags consist of two letters (char0 and char1). |
A SAM tag consists of two letters, written as a string literal with the ""_tag suffix, which maps the two letters to the tag's unique id.
The purpose of those tags is to fill or query the seqan3::sam_tag_dictionary for a specific key (tag_id) and retrieve the corresponding value.
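std::tuple< std::vector< cigar >, int32_t, int32_t > seqan3::detail::parse_cigar (cigar_input_type &&cigar_input) | inline |
Parses a cigar string into a vector of operation-count pairs (e.g. (M, 3)).
Template Parameters:
cigar_input_type | The type of a single pass input view over the cigar string; must model std::ranges::input_range. |
Parameters:
[in] | cigar_input | The single pass input view over the cigar string to parse. |
For example, the view over the cigar string "1H4M1D2M2S" will return {[(H,1), (M,4), (D,1), (M,2), (S,2)], 7, 6}. The two integers are the reference and query lengths covered by the alignment: 7 = 4 + 1 + 2 reference positions (M and D) and 6 = 4 + 2 query positions (M only; the clipped bases of H and S are not counted).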
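The parsing loop can be sketched as follows: accumulate decimal digits into a count, and on each operation letter store the element and update the two lengths. This is a hypothetical standalone version operating on a std::string_view rather than a single pass input view, with elements stored as (operation, count) pairs. Which operations advance which length is an assumption chosen to reproduce the documented result {..., 7, 6}: M, = and X advance both, D and N only the reference, I only the query, and S, H and P neither.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <tuple>
#include <utility>
#include <vector>

using cigar_element = std::pair<char, std::uint32_t>; // hypothetical (operation, count)

// Returns the parsed elements plus the reference and query lengths covered by the alignment.
std::tuple<std::vector<cigar_element>, std::int32_t, std::int32_t> parse_cigar_sketch(std::string_view const cigar)
{
    std::vector<cigar_element> elements;
    std::int32_t ref_length = 0, seq_length = 0;
    std::uint32_t count = 0;
    bool have_digits = false;

    for (char const c : cigar)
    {
        if (c >= '0' && c <= '9') // accumulate the decimal count
        {
            count = count * 10 + static_cast<std::uint32_t>(c - '0');
            have_digits = true;
            continue;
        }
        if (!have_digits)
            throw std::invalid_argument{"cigar operation without count"};

        auto const n = static_cast<std::int32_t>(count);
        switch (c)
        {
            case 'M': case '=': case 'X': ref_length += n; seq_length += n; break;
            case 'D': case 'N':           ref_length += n; break;
            case 'I':                     seq_length += n; break;
            case 'S': case 'H': case 'P': break; // clipping/padding: neither length changes
            default: throw std::invalid_argument{std::string{"illegal cigar operation: "} + c};
        }
        elements.emplace_back(c, count);
        count = 0;
        have_digits = false;
    }
    if (have_digits)
        throw std::invalid_argument{"trailing count without operation"};
    return {std::move(elements), ref_length, seq_length};
}
```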
void seqan3::detail::update_alignment_lengths (int32_t &ref_length, int32_t &seq_length, char const cigar_operation, uint32_t const cigar_count) | inline |
Updates the sequence lengths by cigar_count depending on the cigar operation cigar_operation.
Parameters:
[in,out] | ref_length | The reference sequence's length. |
[in,out] | seq_length | The query sequence's length. |
[in] | cigar_operation | The cigar operation. |
[in] | cigar_count | The cigar count value to add to the length depending on the cigar operation. |
constexpr bool seqan3::add_enum_bitwise_operators< sam_flag > = true | constexpr |
Enables bitwise operations for seqan3::sam_flag.
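constexpr bool seqan3::detail::is_type_list_of_sam_file_input_formats_v = false | constexpr |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format [default is false].
constexpr bool seqan3::detail::is_type_list_of_sam_file_input_formats_v< type_list< ts... > > = (sam_file_input_format<ts> && ...) | constexpr |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format [overload].
constexpr bool seqan3::detail::is_type_list_of_sam_file_output_formats_v = false | constexpr |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format [default is false].
constexpr bool seqan3::detail::is_type_list_of_sam_file_output_formats_v< type_list< ts... > > = (sam_file_output_format<ts> && ...) | constexpr |
Auxiliary value metafunction that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format [overload].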
SEQAN3_CONCEPT seqan3::detail::type_list_of_sam_file_input_formats = is_type_list_of_sam_file_input_formats_v<t> |
Auxiliary concept that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_input_format.
SEQAN3_CONCEPT seqan3::detail::type_list_of_sam_file_output_formats = is_type_list_of_sam_file_output_formats_v<t> |
Auxiliary concept that checks whether a type is a seqan3::type_list and all types meet seqan3::sam_file_output_format.