feat: add fast field support for Bytes type #2830

Merged: fulmicoton merged 1 commit into main from paradedb/byte-fast-field on Feb 11, 2026
@mdashti commented Feb 9, 2026

## What

Enable range queries and TopN sorting on `Bytes` fast fields, bringing them to parity with `Str` fields.

## Why

`BytesColumn` uses the same dictionary encoding as `StrColumn` internally, but range queries and TopN sorting were explicitly disabled for `Bytes`. This prevented use cases like storing lexicographically sortable binary data (e.g., arbitrary-precision decimals) that need efficient range filtering.
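
As an illustration of "lexicographically sortable binary data", here is a minimal sketch of an order-preserving encoding. It uses `i64` as a simple stand-in for the arbitrary-precision decimals mentioned above; the function names are hypothetical and are not part of this PR or of the library.

```rust
/// Encode an i64 so that bytewise (lexicographic) comparison of the output
/// agrees with numeric comparison of the input.
/// Flipping the sign bit maps negative numbers below non-negative ones,
/// and big-endian layout puts the most significant byte first.
fn encode_sortable_i64(value: i64) -> [u8; 8] {
    ((value as u64) ^ (1u64 << 63)).to_be_bytes()
}

/// Inverse of `encode_sortable_i64`.
fn decode_sortable_i64(bytes: [u8; 8]) -> i64 {
    (u64::from_be_bytes(bytes) ^ (1u64 << 63)) as i64
}
```

Values encoded this way could be stored in a `Bytes` field and filtered with a byte range whose bounds are encoded the same way.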

## How

1. **Enable range queries for Bytes** - Changed `is_type_valid_for_fastfield_range_query()` to return `true` for `Type::Bytes`
2. **Add BytesColumn handling in scorer** - Added a branch in `FastFieldRangeWeight::scorer()` to handle bytes fields using dictionary ordinal lookup (mirrors the existing `StrColumn` logic)
3. **Add SortByBytes** - New sort key computer for TopN queries on bytes columns
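
The dictionary ordinal lookup in step 2 can be sketched roughly as follows. This is a toy model using only the standard library, not the code in this PR: `ToyBytesColumn` and its methods are hypothetical names, and the real `BytesColumn` and `FastFieldRangeWeight::scorer()` are structured differently. The idea it shows is that byte bounds are translated once into an ordinal range over the sorted dictionary, after which documents are matched by comparing integers.

```rust
use std::ops::Bound;

/// Toy stand-in for a dictionary-encoded column (hypothetical type).
/// `dictionary` holds the distinct terms in sorted order;
/// `ordinals[doc]` is the dictionary index of the term stored for `doc`.
struct ToyBytesColumn {
    dictionary: Vec<Vec<u8>>,
    ordinals: Vec<usize>,
}

impl ToyBytesColumn {
    fn from_values(values: &[&[u8]]) -> Self {
        let mut dictionary: Vec<Vec<u8>> = values.iter().map(|v| v.to_vec()).collect();
        dictionary.sort();
        dictionary.dedup();
        let ordinals = values
            .iter()
            .map(|&v| {
                dictionary
                    .binary_search_by(|term| term.as_slice().cmp(v))
                    .unwrap()
            })
            .collect();
        ToyBytesColumn { dictionary, ordinals }
    }

    /// Translate byte bounds into a half-open ordinal range [start, end).
    fn ordinal_range(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> (usize, usize) {
        let start = match lower {
            Bound::Unbounded => 0,
            Bound::Included(b) => self.dictionary.partition_point(|t| t.as_slice() < b),
            Bound::Excluded(b) => self.dictionary.partition_point(|t| t.as_slice() <= b),
        };
        let end = match upper {
            Bound::Unbounded => self.dictionary.len(),
            Bound::Included(b) => self.dictionary.partition_point(|t| t.as_slice() <= b),
            Bound::Excluded(b) => self.dictionary.partition_point(|t| t.as_slice() < b),
        };
        (start, end)
    }

    /// Return the documents whose term ordinal falls inside the range.
    /// No term bytes are compared per document, only ordinals.
    fn docs_in_range(&self, lower: Bound<&[u8]>, upper: Bound<&[u8]>) -> Vec<usize> {
        let (start, end) = self.ordinal_range(lower, upper);
        self.ordinals
            .iter()
            .enumerate()
            .filter(|(_, ord)| (start..end).contains(*ord))
            .map(|(doc, _)| doc)
            .collect()
    }
}
```

Because the dictionary is sorted bytewise, the same translation works for strings and for raw bytes, which is why the bytes branch can mirror the string one.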

## Tests

- `test_bytes_field_ff_range_query` - Tests inclusive/exclusive bounds and unbounded ranges
- `test_sort_by_bytes_asc` / `test_sort_by_bytes_desc` - Tests lexicographic ordering in both directions
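
For reference, the lexicographic ordering these sort tests exercise can be illustrated with a small standalone sketch. `top_n_by_bytes` is a hypothetical helper written for this example, not the `SortByBytes` implementation; it sorts all documents rather than maintaining a bounded heap, and breaks ties by document id as an arbitrary choice.

```rust
/// Return the ids of the first `n` documents when ordered by their byte
/// value, ascending or descending. Byte slices compare lexicographically,
/// byte by byte as unsigned integers, with a shorter prefix sorting first.
fn top_n_by_bytes(values: &[&[u8]], n: usize, ascending: bool) -> Vec<usize> {
    let mut docs: Vec<usize> = (0..values.len()).collect();
    docs.sort_by(|&a, &b| {
        let ord = values[a].cmp(values[b]);
        let ord = if ascending { ord } else { ord.reverse() };
        // Tie-break on document id so the result is deterministic.
        ord.then(a.cmp(&b))
    });
    docs.truncate(n);
    docs
}
```

Note that `[0x01]` sorts before `[0x01, 0xff]`, and `[0xff]` sorts last because bytes are compared as unsigned values.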

@stuhood stuhood left a comment

Thanks!

pub struct SortByBytes {
    column_name: String,
}

So, because UTF-8 byte order matches code point order, the SortByString SortKeyComputer could be implemented 98% in terms of the SortByBytes implementation, with a final UTF-8 decode wrapped around the convert_segment_sort_key call.

Almost all string decoding is already avoided though, so it would just allow for some code reuse.
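
The property behind this suggestion can be checked with a short standalone sketch. The two functions below are hypothetical helpers for illustration, not the `SortByString` or `SortByBytes` code: one sorts raw byte keys and decodes UTF-8 only at the end, the other sorts strings by their Unicode scalar values.

```rust
use std::string::FromUtf8Error;

/// Sort raw byte keys, then decode each one as UTF-8 at the very end.
/// This mirrors the idea of wrapping a decode around a bytes-based sort.
fn sort_bytes_then_decode(mut keys: Vec<Vec<u8>>) -> Result<Vec<String>, FromUtf8Error> {
    keys.sort();
    keys.into_iter().map(String::from_utf8).collect()
}

/// Sort strings by comparing their Unicode scalar values one by one.
fn sort_by_code_points(mut strings: Vec<String>) -> Vec<String> {
    strings.sort_by(|a, b| a.chars().cmp(b.chars()));
    strings
}
```

For valid UTF-8 input both functions produce the same order. That order is code point order, which is not the same as locale-aware collation.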

@fulmicoton merged commit 8018016 into main on Feb 11, 2026
8 checks passed
@fulmicoton deleted the paradedb/byte-fast-field branch on February 11, 2026 10:26