Currently, for each position we index differences to a "reference" symbol. This "reference" does not need to equal the real reference sequence; it is chosen dynamically to adapt to local sequence variation, which improves compression for prevalent mutations.
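
To make the idea concrete, here is a minimal Python sketch of indexing one position against a dynamically chosen reference symbol. It is an illustration only, not the actual implementation: the function name `index_position` is hypothetical, and picking the most frequent symbol as the "reference" is an assumption about the selection criterion.

```python
from collections import Counter, defaultdict


def index_position(column):
    """Index one alignment column against a dynamically chosen reference symbol.

    `column` holds the symbol of every sequence at a single position.
    Hypothetical helper for illustration, not the project's API.
    """
    # Assumption: the most frequent symbol serves as the local "reference".
    reference_symbol = Counter(column).most_common(1)[0][0]

    # Only rows that differ from the reference symbol are stored.
    differences = defaultdict(list)
    for row_id, symbol in enumerate(column):
        if symbol != reference_symbol:
            differences[symbol].append(row_id)
    return reference_symbol, dict(differences)


# A mutation to G is prevalent here, so G becomes the "reference"
# and only the two deviating rows need to be indexed.
reference_symbol, differences = index_position(list("GGGAGGTG"))
```

With a fixed reference symbol of `A`, the same column would require seven stored entries instead of two.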
Ideally, this adaptation should be more fine-grained, so that compression becomes comparable to tree-based compression while retaining the speed of columnar indexes.
An appropriate granularity might be the vertical_tile_index as introduced in #993. This would imply that, for a given position, the reference symbol might change every 2^16 sequences. When closely related sequences (correlated evolutionary heritage) are ordered together, this should improve compression even further.
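
The effect of a per-tile reference symbol can be sketched roughly as follows. This is a toy model under stated assumptions, not the proposed implementation: `tiled_references` and `stored_differences` are hypothetical helpers, the most frequent symbol within a tile is assumed to be chosen as its reference, and the example uses a tile size of 4 instead of 2^16 to stay readable.

```python
from collections import Counter

TILE_SIZE = 2 ** 16  # sequences per vertical tile, as proposed above


def tiled_references(column, tile_size=TILE_SIZE):
    """Return one reference symbol per tile of `tile_size` consecutive sequences.

    Assumption: the most frequent symbol within a tile is its reference.
    """
    return [
        Counter(column[start:start + tile_size]).most_common(1)[0][0]
        for start in range(0, len(column), tile_size)
    ]


def stored_differences(column, references, tile_size=TILE_SIZE):
    """Count rows whose symbol differs from the reference of their tile."""
    return sum(
        1
        for row_id, symbol in enumerate(column)
        if symbol != references[row_id // tile_size]
    )


# Twelve sequences at one position, with related sequences ordered together.
column = list("AAAAAAAAGGGG")

# One reference symbol for the whole column (a single tile spanning all rows).
global_refs = tiled_references(column, tile_size=len(column))
global_cost = stored_differences(column, global_refs, tile_size=len(column))

# One reference symbol per tile of 4 sequences.
tile_refs = tiled_references(column, tile_size=4)
tiled_cost = stored_differences(column, tile_refs, tile_size=4)
```

In this toy column the single reference `A` leaves four differences to store, while the per-tile references `A`, `A`, `G` leave none. The gain depends on related sequences being grouped into the same tile; with a shuffled ordering the per-tile references would mostly coincide with the global one.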