@farhan-syah
Collaborator

Summary

  • Add decode_batch() and decode_batch_lossy() methods for parallel decoding using Rayon
  • Add Python bindings for batch decode methods
  • Add comprehensive benchmark scripts for cl100k, llama3, and o200k tokenizers
  • Bump version to 0.5.0

Changes

New Features

  • Batch Decoding: New parallel batch decoding methods in Rust core and Python bindings
  • Benchmarks: Comprehensive benchmark scripts for all tokenizer vocabularies

Fixes

  • Fixed the .version file not being updated in the previous release
  • Fixed __version__ in the Python package

Add batch decoding methods (decode_batch and decode_batch_lossy) to both the Rust core tokenizer and the Python bindings, so that multiple token lists can be decoded in parallel.
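
To illustrate the shape of batch decoding, here is a minimal, self-contained sketch. It does not use splintr: the toy vocabulary, the helper functions, and the signatures are all hypothetical, and a Python thread pool stands in for Rayon only to show the order-preserving "map over token lists" pattern (it will not give the same speedup as the Rust implementation). The strict/lossy split is assumed to mirror the usual UTF-8 convention: strict raises on invalid bytes, lossy substitutes U+FFFD.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy vocabulary (token id -> raw bytes); not splintr's real data.
TOY_VOCAB = {
    0: b"Hello",
    1: b", ",
    2: b"world",
    3: b"\xf0\x9f",  # first two bytes of a 4-byte UTF-8 sequence
    4: b"\x98\x80",  # last two bytes of the same sequence
}


def decode(tokens):
    """Strict decode: raises UnicodeDecodeError if the bytes are not valid UTF-8."""
    return b"".join(TOY_VOCAB[t] for t in tokens).decode("utf-8")


def decode_lossy(tokens):
    """Lossy decode: invalid byte sequences are replaced with U+FFFD."""
    return b"".join(TOY_VOCAB[t] for t in tokens).decode("utf-8", errors="replace")


def decode_batch(batch):
    """Decode every token list in the batch; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode, batch))


def decode_batch_lossy(batch):
    """Lossy variant of decode_batch; never raises on invalid UTF-8."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(decode_lossy, batch))


if __name__ == "__main__":
    print(decode_batch([[0, 1, 2], [3, 4]]))
    print(decode_batch_lossy([[0], [3]]))
```

The lossy variant is the one to reach for when token lists may be cut in the middle of a multi-byte character, for example when decoding a stream chunk by chunk.
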
Add benchmark suites for cl100k_base, Llama 3, and o200k_base tokenizers.
Each script compares splintr performance against reference implementations
(tiktoken, HuggingFace) across single/batch encoding and decoding operations.

- benchmark_cl100k.py: GPT-4/GPT-3.5-turbo tokenizer benchmarks
- benchmark_llama3.py: Llama 3 family tokenizer benchmarks
- benchmark_o200k.py: GPT-4o tokenizer benchmarks

Benchmarks measure throughput (MB/s, tokens/s) and latency across various
text types (short, medium, long, code, multilingual) with visualization
support via matplotlib charts.
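
As a rough sketch of how throughput and latency figures like these can be derived, the snippet below times a callable and reports median latency, MB/s, and tokens/s. It is not taken from the benchmark scripts: `benchmark`, `whitespace_encode`, the `runs` parameter, and the choice of 1 MB = 10^6 bytes are all assumptions made for illustration.

```python
import statistics
import time


def benchmark(fn, text, runs=5):
    """Time fn(text) over several runs and report latency and throughput.

    Assumes fn returns a sized sequence of tokens.
    """
    n_bytes = len(text.encode("utf-8"))
    latencies = []
    n_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = fn(text)
        latencies.append(time.perf_counter() - start)
        n_tokens = len(tokens)
    # Guard against a zero reading on very coarse timers.
    median = max(statistics.median(latencies), 1e-9)
    return {
        "latency_ms": median * 1e3,
        "mb_per_s": n_bytes / median / 1e6,  # assumes 1 MB = 10^6 bytes
        "tokens_per_s": n_tokens / median,
    }


def whitespace_encode(text):
    """Hypothetical stand-in for a real tokenizer's encode function."""
    return text.split()


result = benchmark(whitespace_encode, "the quick brown fox " * 10_000)

if __name__ == "__main__":
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

Using the median rather than the mean keeps a single slow run (for example a cold cache) from skewing the reported numbers.
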
@farhan-syah farhan-syah merged commit 3ae79f9 into main Nov 26, 2025
5 checks passed
@farhan-syah farhan-syah deleted the feat/v0.5.0 branch November 26, 2025 13:29