SeqAn3 3.1.0-rc.2
The Modern C++ library for sequence analysis.
SAM File

Provides files and formats for handling read mapping data. More...


Classes

class  seqan3::format_bam
 The BAM format. More...
 
class  seqan3::format_sam
 The SAM format (tag). More...
 
struct  seqan3::ref_info_not_given
 Type tag which indicates that no reference information has been passed to the alignment file on construction. More...
 
class  seqan3::sam_file_header< ref_ids_type >
 Stores the header information of alignment files. More...
 
class  seqan3::sam_file_input< traits_type_, selected_field_ids_, valid_formats_ >
 A class for reading alignment files, e.g. SAM, BAM, BLAST ... More...
 
struct  seqan3::sam_file_input_default_traits< ref_sequences_t, ref_ids_t >
 The default traits for seqan3::sam_file_input. More...
 
interface  sam_file_input_format
 The generic concept for alignment file input formats. More...
 
struct  seqan3::sam_file_input_options< sequence_legal_alphabet >
 The options type defines various option members that influence the behaviour of all or some formats. More...
 
interface  sam_file_input_traits
 The requirements a traits_type for seqan3::sam_file_input must meet. More...
 
class  seqan3::sam_file_output< selected_field_ids_, valid_formats_, ref_ids_type >
 A class for writing alignment files, e.g. SAM, BAM, BLAST, ... More...
 
interface  sam_file_output_format
 The generic concept for alignment file output formats. More...
 
struct  seqan3::sam_file_output_options
 The options type defines various option members that influence the behavior of all or some formats. More...
 
class  seqan3::sam_tag_dictionary
 The SAM tag dictionary class that stores all optional SAM fields. More...
 
struct  seqan3::sam_tag_type< tag_value >
 The generic base class. More...
 

Enumerations

enum class  seqan3::sam_flag : uint16_t {
  seqan3::sam_flag::none = 0 , seqan3::sam_flag::paired = 0x1 , seqan3::sam_flag::proper_pair = 0x2 , seqan3::sam_flag::unmapped = 0x4 ,
  seqan3::sam_flag::mate_unmapped = 0x8 , seqan3::sam_flag::on_reverse_strand = 0x10 , seqan3::sam_flag::mate_on_reverse_strand = 0x20 , seqan3::sam_flag::first_in_pair = 0x40 ,
  seqan3::sam_flag::second_in_pair = 0x80 , seqan3::sam_flag::secondary_alignment = 0x100 , seqan3::sam_flag::failed_filter = 0x200 , seqan3::sam_flag::duplicate = 0x400 ,
  seqan3::sam_flag::supplementary_alignment = 0x800
}
 An enum flag that describes the properties of an aligned read (given as a SAM record). More...
 

Other literals

template<typename char_t , char_t ... s>
constexpr uint16_t seqan3::literals::operator""_tag ()
 The SAM tag literal, such that tags can be used in constant expressions. More...
 
template<typename char_t , char_t ... s>
constexpr uint16_t operator""_tag ()
 The SAM tag literal, such that tags can be used in constant expressions. More...
 

Detailed Description

Provides files and formats for handling read mapping data.

Introduction

SAM/BAM files are primarily used to store pairwise alignments of read mapping data.

Note
For a step-by-step guide take a look at our tutorial: SAM Input and Output in SeqAn.

The SAM file abstraction supports reading 12 different fields:

  1. seqan3::field::seq
  2. seqan3::field::id
  3. seqan3::field::offset
  4. seqan3::field::ref_id
  5. seqan3::field::ref_offset
  6. seqan3::field::alignment
  7. seqan3::field::cigar
  8. seqan3::field::mapq
  9. seqan3::field::qual
  10. seqan3::field::flag
  11. seqan3::field::mate
  12. seqan3::field::tags

There exists one more field for SAM files, the seqan3::field::header_ptr, but this field is mostly used internally. Please see the seqan3::sam_file_output::header member function for details on how to access the seqan3::sam_file_header of the file.

All of these fields are retrieved by default (and in that order).

Please see the corresponding formats for more details.

Enumeration Type Documentation

◆ sam_flag

enum class seqan3::sam_flag : uint16_t

An enum flag that describes the properties of an aligned read (given as a SAM record).

See also
seqan3::enum_bitwise_operators enables combining enum values.

The SAM flags are bitwise flags: each value corresponds to a specific bit that is set, so the values can be combined and tested using bitwise operations. See this tutorial for an introduction on bitwise operations on enum flags.

Example:

#include <iostream>
#include <sstream>

#include <seqan3/io/sam_file/all.hpp>

// Note: the columns of a SAM record must be separated by tab characters.
auto sam_file_raw = R"(@HD	VN:1.6	SO:coordinate	GO:none
@SQ	SN:ref	LN:45
r001	99	ref	7	30	8M2I4M1D3M	=	37	39	TTAGATAAAGGATACTG	!!!!!!!!!!!!!!!!!
r003	0	ref	29	30	5S6M	*	0	0	GCCTAAGCTAA	!!!!!!!!!!!	SA:Z:ref,29,-,6H5M,17,0;
r003	4	*	29	17	*	*	0	0	TAGGC	@@@@@	SA:Z:ref,9,+,5S6M,30,1;
r001	147	ref	237	30	9M	=	7	-39	CAGCGGCAT	!!!!!!!!!	NM:i:1
)";

int main()
{
    // Read the in-memory SAM data through a string stream.
    seqan3::sam_file_input fin{std::istringstream{sam_file_raw}, seqan3::format_sam{}};

    for (auto & rec : fin)
    {
        // Check if a certain flag value (bit) is set:
        if (static_cast<bool>(rec.flag() & seqan3::sam_flag::unmapped))
            std::cout << "Read " << rec.id() << " is unmapped\n";

        if (rec.base_qualities()[0] < seqan3::assign_char_to('@', seqan3::phred42{})) // low quality
        {
            // Set a flag value (bit):
            rec.flag() |= seqan3::sam_flag::failed_filter;
            // Note that this does not affect other flag values (bits),
            // e.g. `rec.flag() & seqan3::sam_flag::unmapped` may still be true
        }

        // Unset a flag value (bit):
        rec.flag() &= ~seqan3::sam_flag::duplicate; // not marked as a duplicate anymore
    }
}

Additional information on some flag values is adapted from the SAM specification.

See also
https://broadinstitute.github.io/picard/explain-flags.html
Enumerator
none 

None of the flags below are set.

paired 

The aligned read is paired (paired-end sequencing).

proper_pair 

The two aligned reads in a pair have a proper distance between each other.

unmapped 

The read is not mapped to a reference (unaligned).

mate_unmapped 

The mate of this read is not mapped to a reference (unaligned).

on_reverse_strand 

The read sequence has been reverse complemented before being mapped (aligned).

mate_on_reverse_strand 

The mate sequence has been reverse complemented before being mapped (aligned).

first_in_pair 

Indicates the ordering (see details in the seqan3::sam_flag description).

second_in_pair 

Indicates the ordering (see details in the seqan3::sam_flag description).

secondary_alignment 

This read alignment is an alternative (possibly suboptimal) to the primary.

failed_filter 

The read alignment failed a filter, e.g. quality controls.

duplicate 

The read is marked as a PCR duplicate or optical duplicate.

supplementary_alignment 

This sequence is part of a split alignment and is not the primary alignment.
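
To illustrate how the enumerator values above combine into the numeric FLAG column of a SAM record, here is a minimal, library-free sketch. The type `flag_sketch`, its operators, and the helper `is_set` are hypothetical stand-ins introduced only for this illustration; they are not part of SeqAn3, which provides the operators for seqan3::sam_flag via seqan3::enum_bitwise_operators. The values 99 and 147 are the flags of the two `r001` records in the example above.

```cpp
#include <cstdint>

// Hypothetical stand-in for seqan3::sam_flag (NOT the library type).
// Only a subset of the enumerators is listed; the values are copied from the enumeration above.
enum class flag_sketch : std::uint16_t
{
    none = 0,
    paired = 0x1,
    proper_pair = 0x2,
    unmapped = 0x4,
    on_reverse_strand = 0x10,
    mate_on_reverse_strand = 0x20,
    first_in_pair = 0x40,
    second_in_pair = 0x80,
    duplicate = 0x400
};

// Bitwise operators on the underlying integer (assumed helpers for this sketch).
constexpr flag_sketch operator|(flag_sketch a, flag_sketch b)
{
    return static_cast<flag_sketch>(static_cast<std::uint16_t>(a) | static_cast<std::uint16_t>(b));
}

constexpr flag_sketch operator&(flag_sketch a, flag_sketch b)
{
    return static_cast<flag_sketch>(static_cast<std::uint16_t>(a) & static_cast<std::uint16_t>(b));
}

constexpr flag_sketch operator~(flag_sketch a)
{
    return static_cast<flag_sketch>(static_cast<std::uint16_t>(~static_cast<std::uint16_t>(a)));
}

// True if the given bit is set in value.
constexpr bool is_set(flag_sketch value, flag_sketch bit)
{
    return (value & bit) != flag_sketch::none;
}

// FLAG 99 = 0x1 + 0x2 + 0x20 + 0x40
constexpr auto r001_first = static_cast<flag_sketch>(99);
static_assert(r001_first == (flag_sketch::paired | flag_sketch::proper_pair |
                             flag_sketch::mate_on_reverse_strand | flag_sketch::first_in_pair));
static_assert(!is_set(r001_first, flag_sketch::unmapped));

// FLAG 147 = 0x1 + 0x2 + 0x10 + 0x80
constexpr auto r001_second = static_cast<flag_sketch>(147);
static_assert(is_set(r001_second, flag_sketch::on_reverse_strand));
static_assert(is_set(r001_second, flag_sketch::second_in_pair));

// Setting and then clearing one bit leaves all other bits untouched.
constexpr auto marked = r001_first | flag_sketch::duplicate;
static_assert(is_set(marked, flag_sketch::duplicate));
static_assert((marked & ~flag_sketch::duplicate) == r001_first);
```

All checks are evaluated at compile time, so the sketch compiles only if the decompositions in the comments hold.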

Function Documentation

◆ operator""_tag() [1/2]

template<typename char_t , char_t ... s>
constexpr uint16_t seqan3::literals::operator""_tag ( )

The SAM tag literal, such that tags can be used in constant expressions.

Template Parameters
char_t  The char type. Usually char. Parameter pack ...s must be of length 2 since SAM tags consist of two letters (char0 and char1).
Returns
The unique identifier of the SAM tag computed by char0 * 128 + char1.

A SAM tag consists of two letters and is initialized via the string literal ""_tag, which delegates to its unique id.

#include <seqan3/io/sam_file/sam_tag_dictionary.hpp>

using namespace seqan3::literals;
// ...
uint16_t tag_id = "NM"_tag; // tag_id = 10061

The purpose of those tags is to fill or query the seqan3::sam_tag_dictionary for a specific key (tag_id) and retrieve the corresponding value.
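
The arithmetic stated under Returns can be checked by hand with a small, library-free sketch. The function `tag_id_sketch` is a hypothetical helper written for this illustration, not the SeqAn3 implementation; it simply applies the documented formula char0 * 128 + char1 and assumes an ASCII execution character set.

```cpp
#include <cstdint>

// Hypothetical helper (NOT part of SeqAn3): applies the formula quoted in the
// documentation above, char0 * 128 + char1, to two tag characters.
constexpr std::uint16_t tag_id_sketch(char char0, char char1)
{
    return static_cast<std::uint16_t>(char0 * 128 + char1);
}

// 'N' is 78 and 'M' is 77 in ASCII: 78 * 128 + 77 = 10061,
// which matches the value given for "NM"_tag above.
static_assert(tag_id_sketch('N', 'M') == 10061);

// The order of the two characters matters, so "NM" and "MN" get different ids.
static_assert(tag_id_sketch('N', 'M') != tag_id_sketch('M', 'N'));
```

Because the function is constexpr, the resulting id can be used wherever a constant expression is required, which is the property the literal is documented to provide.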

◆ operator""_tag() [2/2]

template<typename char_t , char_t ... s>
constexpr uint16_t operator""_tag ( )
related

The SAM tag literal, such that tags can be used in constant expressions.

Template Parameters
char_t  The char type. Usually char. Parameter pack ...s must be of length 2 since SAM tags consist of two letters (char0 and char1).
Returns
The unique identifier of the SAM tag computed by char0 * 128 + char1.

A SAM tag consists of two letters and is initialized via the string literal ""_tag, which delegates to its unique id.

#include <seqan3/io/sam_file/sam_tag_dictionary.hpp>

using namespace seqan3::literals;
// ...
uint16_t tag_id = "NM"_tag; // tag_id = 10061

The purpose of those tags is to fill or query the seqan3::sam_tag_dictionary for a specific key (tag_id) and retrieve the corresponding value.