
SequenceIndex: Make indexing_differences_to_reference_sequence specific to vertical_tile_index #1053

@Taepper

Description


Currently, for each position we index the differences to a "reference" symbol. This "reference" does not need to equal the actual reference sequence; it is chosen dynamically to adapt to local sequence variation. This improves compression for prevalent mutations, because the many sequences carrying the prevalent symbol no longer have to be stored as differences.
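A minimal sketch of the current per-position scheme, assuming the dynamic "reference" is simply the most frequent symbol at that position. The function names `index_position` and `lookup` are hypothetical, and plain Python sets stand in for whatever bitmap structure the real index uses:

```python
from collections import Counter


def index_position(column):
    """Index one alignment position.

    column[i] is the symbol of sequence i at this position.
    Returns (reference_symbol, differences), where differences maps each
    non-reference symbol to the set of sequence ids carrying it.
    """
    # Assumption: the local "reference" is the most frequent symbol,
    # which need not match the real reference genome.
    reference_symbol, _ = Counter(column).most_common(1)[0]
    differences = {}
    for seq_id, symbol in enumerate(column):
        if symbol != reference_symbol:
            differences.setdefault(symbol, set()).add(seq_id)
    return reference_symbol, differences


def lookup(reference_symbol, differences, seq_id):
    # A sequence has the reference symbol unless it is listed as a difference.
    for symbol, ids in differences.items():
        if seq_id in ids:
            return symbol
    return reference_symbol
```

For example, if the real reference has `C` at a position but 9 of 10 sequences carry the mutation `T`, `index_position(["T"] * 9 + ["C"])` picks `T` as the local reference and stores a single id (`{"C": {9}}`) instead of the nine ids that indexing against `C` would require.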

Ideally, this adaptation should be more fine-grained, so that compression becomes comparable to tree-based compression while retaining the speed of columnar indexes.

An appropriate granularity might be the vertical_tile_index introduced in #993. For a given position, the reference symbol could then change every 2^16 (65,536) sequences. When closely related sequences (those with correlated evolutionary heritage) are ordered next to each other, this should improve compression even further.
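A rough sketch of the proposal, assuming a sequence's vertical tile is its id shifted right by 16 bits (one tile per 2^16 sequences, as described above) and that each tile independently picks its most frequent symbol as the local reference. The helper `vertical_tile_index` here is hypothetical and not the definition from #993; `tile_bits` is exposed only so that small examples are possible:

```python
from collections import Counter

TILE_BITS = 16  # assumption: one vertical tile spans 2**16 = 65,536 sequences


def vertical_tile_index(seq_id, tile_bits=TILE_BITS):
    # Hypothetical helper: the vertical tile a sequence id falls into.
    return seq_id >> tile_bits


def index_position_per_tile(column, tile_bits=TILE_BITS):
    """Choose a separate local reference symbol for each vertical tile.

    Returns {tile: (reference_symbol, {symbol: set_of_seq_ids})}.
    """
    tile_size = 1 << tile_bits
    tiles = {}
    for start in range(0, len(column), tile_size):
        chunk = column[start:start + tile_size]
        # Most frequent symbol within this tile only.
        reference_symbol, _ = Counter(chunk).most_common(1)[0]
        differences = {}
        for offset, symbol in enumerate(chunk):
            if symbol != reference_symbol:
                differences.setdefault(symbol, set()).add(start + offset)
        tiles[vertical_tile_index(start, tile_bits)] = (reference_symbol, differences)
    return tiles


def stored_ids(tiles):
    # Total number of sequence ids that must be stored as differences.
    return sum(len(ids) for _, diffs in tiles.values() for ids in diffs.values())
```

With tiny tiles of 4 sequences (`tile_bits=2`) and a column where two related groups are sorted together, `["A"] * 4 + ["G"] * 4`, each tile gets its own reference (`A`, then `G`) and no differences are stored, whereas a single reference for the whole column must store 4 ids. This illustrates why ordering related sequences together matters: the benefit appears only when a tile is dominated by one symbol.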
