SeqAn3 3.1.0-rc.2
The Modern C++ library for sequence analysis.
Views

Search-related views.


Classes

struct seqan3::detail::kmer_hash_fn
 seqan3::views::kmer_hash's range adaptor object type (non-closure).

class seqan3::detail::kmer_hash_view< urng_t >
 The type returned by seqan3::views::kmer_hash.

struct seqan3::detail::minimiser_fn
 seqan3::views::minimiser's range adaptor object type (non-closure).

struct seqan3::detail::minimiser_hash_fn
 seqan3::views::minimiser_hash's range adaptor object type (non-closure).

class seqan3::detail::minimiser_view< urng1_t, urng2_t >
 The type returned by seqan3::views::minimiser.

struct seqan3::seed
 Strong type for the seed.

struct seqan3::window_size
 Strong type for the window size.
 

Variables

constexpr auto seqan3::views::kmer_hash
 Computes hash values for each position of a range via a given shape.

constexpr auto seqan3::views::minimiser
 Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window.
 

Detailed Description

Search-related views.

See also
Search

Variable Documentation

◆ kmer_hash

constexpr auto seqan3::views::kmer_hash
inline constexpr

Computes hash values for each position of a range via a given shape.

Template Parameters
urng_t: The type of the range being processed. See below for requirements. [template parameter is omitted in pipe notation]
Parameters
[in] urange: The range being processed. [parameter is omitted in pipe notation]
[in] shape: The seqan3::shape that determines how to compute the hash value.
Returns
A range of std::size_t where each value is the hash of the respective k-mer. See below for the properties of the returned range.
Attention
For the alphabet size \(\sigma\) of the alphabet of urange and the number of 1s \(s\) of shape, it must hold that \(s \le \frac{64}{\log_2\sigma}\), i.e. the hashes resulting from the shape/alphabet combination can be represented in a uint64_t. For seqan3::dna4 (\(\sigma = 4\)), for example, a shape may contain at most 32 ones.
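The precondition can be evaluated with a little arithmetic: \(s \le \frac{64}{\log_2\sigma}\) is equivalent to \(\sigma^s \le 2^{64}\), so the largest possible hash \(\sigma^s - 1\) still fits into a uint64_t. The helper below, shape_fits_in_uint64, is a hypothetical name used only for illustration and is not a SeqAn3 function; it is a minimal sketch that mirrors the inequality with floating-point arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of SeqAn3: checks s <= 64 / log2(sigma),
// which is equivalent to sigma^s <= 2^64, i.e. the largest possible hash
// sigma^s - 1 of a shape with s set positions fits into a std::uint64_t.
bool shape_fits_in_uint64(std::size_t sigma, std::size_t s)
{
    assert(sigma >= 2); // log2(1) == 0 would lead to a division by zero
    return static_cast<double>(s) <= 64.0 / std::log2(static_cast<double>(sigma));
}
```

For an alphabet of size 4 the sketch accepts shapes with up to 32 set positions; for an alphabet of size 5 the limit is 27, since \(5^{27} < 2^{64} < 5^{28}\).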

View properties

Concepts and traits                urng_t (underlying range type)   rrng_t (returned range type)
std::ranges::input_range           required                         preserved
std::ranges::forward_range         required                         preserved
std::ranges::bidirectional_range                                    preserved
std::ranges::random_access_range                                    preserved
std::ranges::contiguous_range                                       lost
std::ranges::viewable_range        required                         guaranteed
std::ranges::view                                                   guaranteed
std::ranges::sized_range                                            preserved
std::ranges::common_range                                           preserved
std::ranges::output_range                                           lost
seqan3::const_iterable_range                                        preserved
std::ranges::range_reference_t     seqan3::semialphabet             std::size_t

See the views submodule documentation for detailed descriptions of the view properties.

Example

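#include <vector>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/views/kmer_hash.hpp>

using namespace seqan3::literals;

int main()
{
    std::vector<seqan3::dna4> text{"ACGTAGC"_dna4};
    auto hashes = text | seqan3::views::kmer_hash(seqan3::shape{seqan3::ungapped{3}}); // ungapped 3-mers
    seqan3::debug_stream << hashes << '\n'; // [6,27,44,50,9]
    seqan3::debug_stream << (text | seqan3::views::kmer_hash(seqan3::ungapped{3})) << '\n'; // [6,27,44,50,9]
    // 0b101: only the first and third position of each 3-mer are hashed
    seqan3::debug_stream << (text | seqan3::views::kmer_hash(0b101_shape)) << '\n'; // [2,7,8,14,1]
}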
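The outputs in the example above are consistent with reading the ranks under the shape's 1-positions as a base-\(\sigma\) number; for instance, ACG gives \(0 \cdot 4^2 + 1 \cdot 4 + 2 = 6\). The sketch below reproduces them under stated assumptions: dna4 ranks A=0, C=1, G=2, T=3, and a shape written as a left-to-right string of '0'/'1' characters. The function names are hypothetical, and the naive per-position loop is only an illustration; it makes no claim about how seqan3::views::kmer_hash computes its values internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: dna4 rank of a nucleotide (A=0, C=1, G=2, T=3).
// Assumes the input only contains A, C, G and T.
std::uint64_t dna4_rank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default: return 3; // 'T'
    }
}

// Naive sketch: for every start position i, read the ranks under the '1'
// positions of the shape from left to right and interpret them as a
// base-sigma number. The shape is a string such as "111" or "101".
std::vector<std::uint64_t> kmer_hashes(std::string const & text, std::string const & shape, std::uint64_t sigma)
{
    std::vector<std::uint64_t> hashes;
    for (std::size_t i = 0; i + shape.size() <= text.size(); ++i)
    {
        std::uint64_t hash = 0;
        for (std::size_t j = 0; j < shape.size(); ++j)
        {
            if (shape[j] != '1')
                continue; // gap position: not part of the hash
            std::uint64_t const rank = dna4_rank(text[i + j]);
            assert(rank < sigma);
            hash = hash * sigma + rank;
        }
        hashes.push_back(hash);
    }
    return hashes;
}
```

With "111" and sigma 4, the sketch returns [6, 27, 44, 50, 9] for ACGTAGC; with "101" it returns [2, 7, 8, 14, 1].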

This entity is stable. Since version 3.1.

◆ minimiser

constexpr auto seqan3::views::minimiser
inline constexpr

Computes minimisers for a range of comparable values. A minimiser is the smallest value in a window.

Template Parameters
urng_t: The type of the first range being processed. See below for requirements. [template parameter is omitted in pipe notation]
Parameters
[in] urange1: The range being processed. [parameter is omitted in pipe notation]
[in] window_size: The number of values in one window.
Returns
A range of values modelling std::totally_ordered, where each value is the minimal value of a window. See below for the properties of the returned range.

A minimiser is the smallest value in a window. For example, for the hash values [28, 100, 9, 23, 4, 1, 72, 37, 8] and a window_size of 4, the minimiser values are [9, 4, 1]: the list has six windows, but a value is only reported when the minimiser changes as the window shifts.
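The definition can be sketched directly, assuming the input is a std::vector of hash values: take the minimum of every window of window_size consecutive values and report it only if it differs from the previous window's minimum. naive_minimisers is a hypothetical name; this O(n·w) loop illustrates the definition and is not the view's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Definitional sketch: minimum of every window, reported only when it
// differs from the minimum of the previous window.
std::vector<std::uint64_t> naive_minimisers(std::vector<std::uint64_t> const & values, std::size_t window_size)
{
    assert(window_size > 0);
    std::vector<std::uint64_t> result;
    for (std::size_t i = 0; i + window_size <= values.size(); ++i)
    {
        auto const first = values.begin() + static_cast<std::ptrdiff_t>(i);
        auto const last = first + static_cast<std::ptrdiff_t>(window_size);
        std::uint64_t const window_min = *std::min_element(first, last);
        if (result.empty() || result.back() != window_min)
            result.push_back(window_min);
    }
    return result;
}
```

Because the sketch compares values, consecutive windows whose minima are equal but located at different positions are reported once; the view's tie handling is described under Robust Winnowing.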

The minimiser can also be computed for two ranges at once; the minimiser is then the smallest value across both windows. For example, for the hash values [28, 100, 9, 23, 4, 1, 72, 37, 8] and [30, 2, 11, 101, 199, 73, 34, 900] and a window_size of 4, the minimiser values are [2, 1].
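For two ranges, the smallest value across two aligned windows equals the smallest position-wise minimum \(\min(a_i, b_i)\) inside one window, so the two ranges can be combined first. A minimal sketch with the hypothetical name naive_minimisers_two; it assumes that only positions present in both ranges take part, which is an assumption of the sketch and not documented behaviour of the view.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: combine both ranges by their position-wise minimum, then take the
// minimum of every window. Assumption: only positions covered by both ranges are used.
std::vector<std::uint64_t> naive_minimisers_two(std::vector<std::uint64_t> const & a,
                                                std::vector<std::uint64_t> const & b,
                                                std::size_t window_size)
{
    assert(window_size > 0);
    std::size_t const n = std::min(a.size(), b.size());

    std::vector<std::uint64_t> combined(n); // position-wise minima of both ranges
    for (std::size_t i = 0; i < n; ++i)
        combined[i] = std::min(a[i], b[i]);

    std::vector<std::uint64_t> result;
    for (std::size_t i = 0; i + window_size <= n; ++i)
    {
        auto const first = combined.begin() + static_cast<std::ptrdiff_t>(i);
        std::uint64_t const window_min = *std::min_element(first, first + static_cast<std::ptrdiff_t>(window_size));
        if (result.empty() || result.back() != window_min) // report only when the minimum changes
            result.push_back(window_min);
    }
    return result;
}
```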

Note that when a second underlying range is given, the const-iterable property is only preserved if both underlying ranges are const-iterable.

Robust Winnowing

If there are multiple minimal values within one window, the minimum, and therefore the minimiser, is ambiguous. We choose the rightmost of these values as the minimiser of the window. When the window is shifted, a minimiser that is still inside the window is only replaced if a value appears that is strictly smaller than the current minimum. This approach is termed robust winnowing by Chirag et al. and has been shown to work especially well in repeat regions.
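The robust winnowing rule can be sketched incrementally by tracking the position of the current minimiser inside a window of window_size values. This is a sketch of the rule as described above, not SeqAn3's source code; robust_minimisers and rightmost_min are hypothetical names, and reporting a new minimiser whenever the old one leaves the window (even if the value is equal) is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Position of the rightmost minimal value in the window ('<=' moves ties right).
std::size_t rightmost_min(std::deque<std::uint64_t> const & window)
{
    std::size_t best = 0;
    for (std::size_t j = 1; j < window.size(); ++j)
        if (window[j] <= window[best])
            best = j;
    return best;
}

// Sketch of robust winnowing; not SeqAn3's source code.
std::vector<std::uint64_t> robust_minimisers(std::vector<std::uint64_t> const & values, std::size_t window_size)
{
    assert(window_size > 0);
    std::vector<std::uint64_t> result;
    if (values.size() < window_size)
        return result; // not a single complete window

    std::deque<std::uint64_t> window(values.begin(), values.begin() + static_cast<std::ptrdiff_t>(window_size));
    std::size_t offset = rightmost_min(window); // position of the current minimiser in the window
    result.push_back(window[offset]);

    for (std::size_t i = window_size; i < values.size(); ++i)
    {
        std::uint64_t const incoming = values[i];
        window.pop_front(); // shift the window by one position
        window.push_back(incoming);

        if (offset == 0)
        {
            // The minimiser has left the window: take the rightmost minimum of the new window.
            // Assumption: it is reported even if its value equals the previous minimiser.
            offset = rightmost_min(window);
            result.push_back(window[offset]);
        }
        else if (incoming < window[offset - 1])
        {
            // Only a strictly smaller value replaces a minimiser that is still inside the window.
            offset = window_size - 1;
            result.push_back(incoming);
        }
        else
        {
            --offset; // same minimiser, now one position further left
        }
    }
    return result;
}
```

In this sketch, leftmost tie-breaking would report 1 twice for the input [3, 1, 1, 5, 6] with a window size of 3, because the left 1 leaves the window while the right one is still inside; taking the rightmost copy reports it once.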

Example

#include <ranges>
#include <vector>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/alphabet/views/complement.hpp>
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/search/views/kmer_hash.hpp>
#include <seqan3/search/views/minimiser.hpp>

using namespace seqan3::literals;

int main()
{
    std::vector<seqan3::dna4> text{"ACGTAGC"_dna4};

    auto hashes = text | seqan3::views::kmer_hash(seqan3::shape{seqan3::ungapped{3}});
    seqan3::debug_stream << hashes << '\n'; // [6,27,44,50,9]

    auto minimiser = hashes | seqan3::views::minimiser(4);
    seqan3::debug_stream << minimiser << '\n'; // [6,9]

    // kmer_hash with gaps, hashes: [2,7,8,14,1], minimiser: [2,1]
    seqan3::debug_stream << (text | seqan3::views::kmer_hash(0b101_shape) | seqan3::views::minimiser(4)) << '\n';

    /* Minimiser view with two ranges
     * The second range holds the hash values of the reverse complement; the second reverse is necessary to put
     * the hash values in the correct order. For the example here:
     * ACGTAGC | seqan3::views::complement => TGCATCG
     *         | std::views::reverse => GCTACGT
     *         | seqan3::views::kmer_hash(seqan3::ungapped{3}) => [39 (for GCT), 28 (for CTA), 49 (for TAC),
     *                                                             6 (for ACG), 27 (for CGT)]
     * "GCT" is not the reverse complement of the first k-mer in "ACGTAGC", which is "ACG", but "CGT" is.
     * Therefore, a second reverse is necessary to find the smallest value between each k-mer of the original
     * sequence and its reverse complement.
     */
    auto reverse_complement_hashes = text | seqan3::views::complement | std::views::reverse
                                   | seqan3::views::kmer_hash(seqan3::ungapped{3}) | std::views::reverse;
    seqan3::debug_stream << reverse_complement_hashes << '\n'; // [27,6,49,28,39]

    auto minimiser2 = seqan3::detail::minimiser_view{hashes, reverse_complement_hashes, 4};
    seqan3::debug_stream << minimiser2 << '\n'; // [6,6]
}
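The effect of complementing, reversing, hashing and reversing again can also be computed position by position: entry i holds the hash of the reverse complement of the k-mer starting at i, so it lines up with the forward hash at the same position. A minimal sketch, assuming dna4 ranks A=0, C=1, G=2, T=3 (so complementing a base maps rank r to 3 - r); aligned_reverse_complement_hashes and dna4_rank are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: dna4 rank (A=0, C=1, G=2, T=3); assumes only A, C, G, T occur.
std::uint64_t dna4_rank(char c)
{
    switch (c)
    {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default: return 3; // 'T'
    }
}

// For every start position i, hash the reverse complement of the k-mer text[i, i + k).
// Entry i therefore belongs to the same k-mer as the forward hash at position i.
std::vector<std::uint64_t> aligned_reverse_complement_hashes(std::string const & text, std::size_t k)
{
    assert(k > 0 && k <= 32); // 4^k must fit into 64 bits
    std::vector<std::uint64_t> hashes;
    for (std::size_t i = 0; i + k <= text.size(); ++i)
    {
        std::uint64_t hash = 0;
        for (std::size_t j = k; j-- > 0;) // walk the k-mer from right to left ...
            hash = hash * 4 + (3 - dna4_rank(text[i + j])); // ... and complement each base
        hashes.push_back(hash);
    }
    return hashes;
}
```

For "ACGTAGC" and k = 3 the sketch yields [27, 6, 49, 28, 39], the same values as reverse_complement_hashes in the example.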

View properties

Concepts and traits                urng_t (underlying range type)   rrng_t (returned range type)
std::ranges::input_range           required                         preserved
std::ranges::forward_range         required                         preserved
std::ranges::bidirectional_range                                    lost
std::ranges::random_access_range                                    lost
std::ranges::contiguous_range                                       lost
std::ranges::viewable_range        required                         guaranteed
std::ranges::view                                                   guaranteed
std::ranges::sized_range                                            lost
std::ranges::common_range                                           lost
std::ranges::output_range                                           lost
seqan3::const_iterable_range                                        preserved
std::ranges::range_reference_t     std::totally_ordered             std::totally_ordered

See the views submodule documentation for detailed descriptions of the view properties.

This entity is stable. Since version 3.1.