|
| 1 | +# Record FAQ {#record_faq} |
| 2 | + |
| 3 | +Records in B.I.O. are implemented as specialisations of the bio::record template.¹ |
| 4 | +A record behaves very similarly to a std::tuple, with the difference that a bio::field identifier is associated with every |
| 5 | +element and a corresponding member function is provided, so you can easily access the elements without knowing the order. |
| 6 | + |
| 7 | +<small>¹ With the exception of bio::plain_io which uses bio::plain_io::record.</small> |
| 8 | + |
| 9 | +[TOC] |
| 10 | + |
| 11 | +\note This page contains details on how records are defined. It is meant to provide a better understanding of the design and performance implications. We recommend starting with the snippets shown in the API (e.g. bio::seq_io::reader, bio::var_io::reader, …) and returning to this page only if you have questions or want to fine-tune things. |
| 12 | + |
| 13 | +## What is the full type of my record? {#record_type} |
| 14 | + |
| 15 | +Most records you interact with are produced by readers. |
| 16 | + |
| 17 | +\snippet test/snippet/seq_io/seq_io_reader.cpp simple_usage_file |
| 18 | + |
| 19 | +In this example, `rec` is the record and with each iteration of the loop, a new record is generated from the file. The exact type of the record depends on the reader. In the above example, it is: |
| 20 | + |
| 21 | +\snippet test/snippet/seq_io/seq_io_reader.cpp simple_usage_file_type |
| 22 | + |
| 23 | +That is quite long and difficult to remember (even though definitions of X* and Y* are omitted here), |
| 24 | +so we write `auto &` instead. |
| 25 | +But it is important to know which fields are contained in the record (in this case ID, SEQ and QUAL). |
| 26 | +The documentation for the reader will tell you this, e.g. bio::seq_io::reader. |
| 27 | + |
| 28 | +## How can I access the fields? |
| 29 | + |
| 30 | +The easiest way to access a field is by calling the respective member function: |
| 31 | + |
| 32 | +\snippet test/snippet/seq_io/seq_io_reader.cpp simple_usage_file |
| 33 | + |
| 34 | +Here, `.id()` (bio::record#id()) and `.seq()` (bio::record#seq()) are used to access the fields. Note that the |
| 35 | +documentation has entries for all field-accessor member functions, but it depends on the specific specialisation |
| 36 | +(used by the reader) whether that function is available. |
| 37 | +So, on the record defined by bio::seq_io::reader above, the members `.id()`, `.seq()`, `.qual()` are available, but |
| 38 | +the member `.pos()` would not be. |
| 39 | + |
| 40 | +When the number of fields in the record is low and you know the order, you can also use |
| 41 | +[structured bindings](https://en.cppreference.com/w/cpp/language/structured_binding) |
| 42 | +to decompose the record into its fields: |
| 43 | + |
| 44 | +\snippet test/snippet/seq_io/seq_io_reader.cpp decomposed |
| 45 | + |
| 46 | +Note that the order of the fields is fixed (in this case it is defined by bio::seq_io::default_field_ids). |
| 47 | +It is independent of the names you give to the bindings, so this syntax is error-prone when used with large records |
| 48 | +(e.g. those defined by bio::var_io::reader). |
| 49 | + |
| 50 | +In generic contexts, you can also access fields via `get<0>(rec)` (returns the 0-th field in the record) or |
| 51 | +`get<bio::field::id>(rec)` (the same as calling `rec.id()`); but most users will never need this. |
| 52 | + |
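The positional behaviour described above can be illustrated with a plain std::tuple. This is only an analogy built on the standard library: `toy_record`, `make_toy_record`, `first_field` and `swapped_names_demo` are hypothetical names and not part of B.I.O.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical stand-in for a three-field record with the order ID, SEQ, QUAL.
using toy_record = std::tuple<std::string, std::string, std::string>;

inline toy_record make_toy_record()
{
    return {"read1", "ACGT", "IIII"};
}

// Generic code can address a field by its position in the tuple.
inline std::string first_field(toy_record const & rec)
{
    return std::get<0>(rec);
}

// Structured bindings follow the field order, not the names you choose.
inline std::string swapped_names_demo(toy_record const & rec)
{
    // The names are swapped on purpose: 'seq' is bound to the first field (the ID).
    auto const & [seq, id, qual] = rec;
    assert(seq == std::get<0>(rec));
    return seq + "|" + id + "|" + qual;
}
```

Calling `swapped_names_demo(make_toy_record())` yields the fields in their stored order regardless of the binding names, which is why member accessors are the safer choice for records with many fields.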
| 53 | + |
| 54 | +## Does my record own the data? (Shallow vs deep records) {#shallow_vs_deep} |
| 55 | + |
| 56 | +As shown above, every field has an identifier (e.g. bio::field::id) and a type (e.g. std::string_view). |
| 57 | +You may have wondered why std::string_view is used as a type and what these `transform_view`s are. |
| 58 | +These imply that the record is a *shallow* data structure, i.e. the fields *appear* like strings or vectors, but they |
| 59 | +are implemented more like references or pointers. |
| 60 | +See the SeqAn3 documentation for an in-depth [Tutorial on Ranges and Views](http://docs.seqan.de/seqan/3-master-user/tutorial_ranges.html). |
| 61 | + |
| 62 | +Shallow records imply fewer memory allocations and/or copy operations during reading. This results in a **better |
| 63 | +performance** but also in some important limitations: |
| 64 | + |
| 65 | +* Shallow records cannot be modified (as easily²). |
| 66 | +* Shallow records cannot be "stored"; they depend on internal caches and buffers of the reader and become invalid |
| 67 | +as soon as the next record is read from the file. |
| 68 | + |
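The two limitations listed above can be reproduced with the standard library alone. The following is a minimal sketch under stated assumptions: `toy_reader` is a hypothetical stand-in that reuses one fixed buffer for every "record", which is a simplification and not how B.I.O. readers are actually implemented.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical reader: every record is written into the same internal buffer.
struct toy_reader
{
    std::array<char, 16> buffer{};

    // Returns a shallow view into the buffer; its contents change on the next call.
    std::string_view next(std::string_view line)
    {
        assert(line.size() <= buffer.size());
        std::size_t const n = std::min(line.size(), buffer.size());
        std::copy_n(line.begin(), n, buffer.begin());
        return std::string_view{buffer.data(), n};
    }
};

// A shallow field "changes" when the next record is read.
inline bool shallow_is_overwritten()
{
    toy_reader reader;
    std::string_view first = reader.next("AAAA");
    reader.next("CCCC");      // overwrites the bytes that 'first' points to
    return first == "CCCC";
}

// A deep field owns its data and is unaffected by later reads.
inline std::string deep_survives()
{
    toy_reader reader;
    std::string first{reader.next("AAAA")}; // explicit copy out of the buffer
    reader.next("CCCC");
    return first;
}
```

In this sketch the view merely shows the new contents; with a real reader the underlying buffer may also be reallocated, in which case a stored shallow field would dangle.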
| 69 | + |
| 70 | +If you need to change a record in-place and/or "store" the record for longer than one iteration of the reader, you need to use *deep records* instead. |
| 71 | +You can tell the reader that you want deep records by providing the respective options: |
| 72 | + |
| 73 | +\snippet test/snippet/seq_io/seq_io_reader.cpp options2 |
| 74 | + |
| 75 | +This snippet behaves similarly to the previous one, except that the type of `rec` is now the following: |
| 76 | + |
| 77 | +\snippet test/snippet/seq_io/seq_io_reader.cpp options2_type |
| 78 | + |
| 79 | +This allows you to call std::vector's `.push_back()` member function (which is not possible in the default case). |
| 80 | +Creating this kind of record is likely a bit slower than the shallow record. |
| 81 | + |
| 82 | +**Summary** |
| 83 | + |
| 84 | +* The records generated by readers are *shallow* by default. |
| 85 | +* This setting has the best performance; but it is less flexible than a *deep* record. |
| 86 | +* Readers can be configured to produce *deep* records via the options. |
| 87 | + |
| 88 | +<small>² Some modifying operations are possible on views, too, but this depends on the exact types.</small> |
| 89 | + |
| 90 | +## How can I change the field types? |
| 91 | + |
| 92 | +In the previous section, we showed how to change the field types from being shallow to deep. |
| 93 | +For some readers, more options are available, e.g. bio::seq_io::reader assumes nucleotide data for the SEQ field by default, but you might want to read protein data instead. |
| 94 | + |
| 95 | +\snippet test/snippet/seq_io/seq_io_reader.cpp options |
| 96 | + |
| 97 | +The snippet above illustrates how the alphabet can be changed (and how to provide another option at the same time). |
| 98 | + |
| 99 | +Instead of using these pre-defined `field_types`, you can also define them completely manually. You can decide to even read only a subset of the fields by changing the `.field_ids` member: |
| 100 | + |
| 101 | +\snippet test/snippet/seq_io/seq_io_reader_options.cpp example_advanced2 |
| 102 | + |
| 103 | +This code makes FASTA the only legal format and creates records with only the sequence field as a std::string. |
| 104 | + |
| 105 | +But you can also use this mechanism to make some fields shallow and other fields deep. It also allows you |
| 106 | +to choose different container types. |
| 107 | +See the API documentation of the respective `reader_options` for more advanced use-cases and the |
| 108 | +exact restrictions on allowed types. |
| 109 | + |
| 110 | +## How can I create record variables? |
| 111 | + |
| 112 | +There are various easy ways to create a bio::record that do not involve manually providing the template arguments: |
| 113 | + |
| 114 | +1. Deduce from the reader. |
| 115 | +2. Use an alias. |
| 116 | +3. Use bio::make_record or bio::tie_record. |
| 117 | + |
| 118 | +### Deduce from the reader {#record_type_from_reader} |
| 119 | + |
| 120 | +When iterating over a reader, it is easy to use `auto &` to deduce the record type, but sometimes you need |
| 121 | +the record type outside of the for-loop or in a separate context. |
| 122 | + |
| 123 | +This snippet demonstrates how to read an interleaved FASTQ file and process the read pairs together (at every second iteration of the loop): |
| 124 | + |
| 125 | +\snippet test/snippet/detail/reader_base.cpp read_pair_processing |
| 126 | + |
| 127 | +To do this, you need to use deep records, because shallow records become invalid after the loop iteration. |
| 128 | +Note how it is possible to "ask" the reader for the type of its record to create the local variable. |
| 129 | + |
| 130 | +### Record type aliases {#record_aliases} |
| 131 | + |
| 132 | +When writing a file without reading a file previously, you can use one of the predefined aliases: |
| 133 | + |
| 134 | +* bio::var_io::default_record |
| 135 | + |
| 136 | +This longer example illustrates using an alias: |
| 137 | + |
| 138 | +\snippet test/snippet/var_io/var_io_writer.cpp creation |
| 139 | +\snippet test/snippet/var_io/var_io_writer.cpp simple_usage_file |
| 140 | + |
| 141 | +Here bio::var_io::default_record is the type that a bio::var_io::reader would generate if it is defined without any options, **except that the alias is deep by default.** |
| 142 | +This is based on the assumption that aliases are typically used to define local variables whose values you want to change. |
| 143 | + |
| 144 | +### Making and tying records {#record_make_tie} |
| 145 | + |
| 146 | +There are convenience functions for making and tying records, similar to std::make_tuple and std::tie: |
| 147 | +\snippet test/snippet/record.cpp make_and_tie_record |
| 148 | + |
| 149 | +The type of rec1 is: |
| 150 | +\snippet test/snippet/record.cpp make_and_tie_record_type_rec1 |
| 151 | + |
| 152 | +The type of rec2 is: |
| 153 | +\snippet test/snippet/record.cpp make_and_tie_record_type_rec2 |
| 154 | + |
| 155 | +When creating a record from existing variables, you can use bio::tie_record to avoid needless copies. |
| 156 | +Instead of manually entering the identifiers as a bio::vtag, you can use bio::seq_io::default_field_ids (or the respective defaults of another reader/writer). |
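
Because the text compares these functions to std::make_tuple and std::tie, the copy-versus-reference difference can be sketched with the standard counterparts. This does not use bio::make_record or bio::tie_record; the function name `tie_refers_to_original` is hypothetical, and the sketch only assumes that the B.I.O. functions behave analogously, as the comparison above suggests.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Returns true if the "made" tuple kept its own copy while the "tied" tuple
// follows changes to the original variables.
inline bool tie_refers_to_original()
{
    std::string id  = "read1";
    std::string seq = "ACGT";

    auto copied = std::make_tuple(id, seq); // std::tuple<std::string, std::string>: owns copies
    auto tied   = std::tie(id, seq);        // std::tuple<std::string &, std::string &>: no copies

    // The tied element is the very same object as the original variable.
    assert(&std::get<0>(tied) == &id);

    id = "renamed";
    return std::get<0>(copied) == "read1" && std::get<0>(tied) == "renamed";
}
```

The tied variant avoids copying but, like a shallow record, is only valid while the original variables are alive.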