feat(bcf): Reduce allocations by using CStr8 #478

Draft: wants to merge 4 commits into master


@killercup killercup commented Jul 11, 2025

As of today, every time a field is written, a CString is allocated because the methods only take &[u8] slices, which are not guaranteed to be \0-terminated.

Solution

This PR puts the work on the caller -- who probably either has a static list of tags or can pre-allocate and keep references to them around. It changes many of the method signatures to take a &CStr8, which is a type that ensures the data is both UTF-8 and null-terminated.
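
To make the difference between the two signature shapes concrete, here is a minimal sketch. It is not code from this PR: it uses `std::ffi::CStr` as a stand-in for `CStr8` so that it runs without extra dependencies, and `ffi_tag_len`, `push_tag_from_bytes`, `push_tag_from_cstr`, and `depth_tag` are hypothetical names made up for illustration.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Hypothetical stand-in for an htslib call that expects a NUL-terminated tag.
/// It only measures the tag so the sketch runs without htslib.
fn ffi_tag_len(tag: *const c_char) -> usize {
    // SAFETY: the callers below always pass a pointer to a NUL-terminated buffer.
    unsafe { CStr::from_ptr(tag) }.to_bytes().len()
}

/// Old shape: a byte slice carries no terminator, so every call allocates.
fn push_tag_from_bytes(tag: &[u8]) -> Result<usize, NulError> {
    let owned = CString::new(tag)?; // heap allocation plus copy on each call
    Ok(ffi_tag_len(owned.as_ptr()))
}

/// New shape: the caller supplies an already terminated string.
fn push_tag_from_cstr(tag: &CStr) -> usize {
    ffi_tag_len(tag.as_ptr()) // no allocation, no copy
}

/// A tag the caller can build once and reuse for every record.
fn depth_tag() -> &'static CStr {
    CStr::from_bytes_with_nul(b"DP\0").expect("literal ends with exactly one NUL")
}
```

With the second shape, a caller that writes the same tag for millions of records pays for the terminator once instead of once per write.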

I don't have isolated measurements, but alongside some other allocation fixes, this improves the runtime performance of the tool I'm working with by 1.41x when writing large-ish (~500MB) BCF files.

Alternatives considered

  1. Take a CStr

    This is a type from the std lib, so no external dependency is introduced. Since the tags are also used in error messages, this would introduce UTF-8 checks, albeit in the cold path.

  2. Introduce a custom type.

    The cstr8 crate seems to do what we need here, but it is not very popular and is still at version 0.1.x. Making a new custom type would put us in charge of it, with the ability to optimize further.

  3. Do nothing. This costs a bunch of performance.
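
For alternative 1, the UTF-8 check would only run when an error message is built. A rough sketch of that idea, assuming a hypothetical `TagError` type and a hypothetical `check_tag` lookup that knows only the tag `DP` (neither is part of rust-htslib):

```rust
use std::ffi::CStr;

/// Hypothetical error type that carries the tag name for display.
#[derive(Debug, PartialEq)]
struct TagError {
    tag: String,
}

/// Hypothetical lookup: only "DP" is treated as a known tag.
fn check_tag(tag: &CStr) -> Result<(), TagError> {
    if tag.to_bytes() == b"DP" {
        return Ok(()); // hot path: compares raw bytes, no UTF-8 validation
    }
    // Cold path: the UTF-8 check happens only while building the message.
    Err(TagError {
        tag: tag.to_string_lossy().into_owned(),
    })
}
```

A type that guarantees UTF-8 up front removes even this cold-path check, which is the trade-off weighed above.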

Downsides of this approach and future actions

  • This introduces a new dependency. We might want to make this opaque and re-export most of it.
  • Documentation needs to be updated to teach about this.
  • We might still want to add a conversion trait and Cow-like type for convenience.
  • This PR does not update all places where this could be done, only those that showed up in my profiling.
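
One possible shape for the conversion and Cow-like convenience mentioned in the list above, sketched with std types only. The function name `as_tag` is hypothetical, and `CStr` again stands in for `CStr8`; a real version would also need to validate UTF-8.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString, NulError};

/// Borrow the input if it is already NUL-terminated, otherwise allocate once.
fn as_tag(bytes: &[u8]) -> Result<Cow<'_, CStr>, NulError> {
    match CStr::from_bytes_with_nul(bytes) {
        // Already terminated and free of interior NULs: no allocation needed.
        Ok(borrowed) => Ok(Cow::Borrowed(borrowed)),
        // Not terminated: copy and append the NUL. Interior NULs are rejected here.
        Err(_) => Ok(Cow::Owned(CString::new(bytes)?)),
    }
}
```

This would let existing callers keep passing plain byte slices while callers that care about allocations pass pre-terminated tags.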

Please let me know what you think of this. I'm using this branch in the project I'm working on and found no issues, but I also don't use all of rust-htslib. I'll keep this branch up to date.

Previously, every time a field was written, a `CString` was allocated
because the methods only took `&[u8]` slices. Now, this work is put on
the caller -- who probably either has a static list of tags or can
pre-allocate and keep references to them around.
When writing strings, use a simple guess to alloc a vec buffer of the
right size.
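
The exact heuristic used in that commit is not shown here. As an illustration of the general idea only, a size guess for a comma-joined, NUL-terminated buffer could look like this (`join_values` is a hypothetical helper, not a rust-htslib function):

```rust
/// Join values with ',' into one NUL-terminated buffer, reserving space up front.
fn join_values(values: &[&str]) -> Vec<u8> {
    // Guess: payload bytes plus one extra byte per value (separator or terminator).
    let guess: usize = values.iter().map(|v| v.len() + 1).sum();
    let mut buf = Vec::with_capacity(guess);
    for (i, value) in values.iter().enumerate() {
        if i > 0 {
            buf.push(b','); // separator between values
        }
        buf.extend_from_slice(value.as_bytes());
    }
    buf.push(0); // NUL terminator for the C side
    buf
}
```

For a non-empty input the guess matches the final length, so the buffer is allocated once and never grows.
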
@killercup killercup marked this pull request as draft July 11, 2025 15:23
@killercup killercup changed the title BCF: Reduce allocations by using CStr8 feat(bcf): Reduce allocations by using CStr8 Jul 11, 2025