This project is a Byte Pair Encoding (BPE) tokenizer written in Zig that tokenizes text and calculates the cost across various LLM providers.
It reads input from src/prompt.txt, performs BPE tokenization, and displays a comprehensive pricing table for popular language models.
- Pure Zig 0.15 implementation (no dependencies outside `std`)
- BPE tokenization:
  - Iteratively finds and merges the most frequent adjacent byte pair
  - Stops when no pair occurs more than once
  - ANSI-colored output for token visualization
- LLM pricing calculator:
  - Calculates prompt costs
  - Displays cost per prompt and price per million tokens
- Reads input from the `src/prompt.txt` file
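The merge loop and the cost formula described above can be sketched as follows. This is an illustrative Python sketch of the algorithm as the README describes it, not the project's Zig source; the names `bpe_tokenize` and `prompt_cost` are assumptions:

```python
from collections import Counter

def bpe_tokenize(text: bytes) -> list[bytes]:
    """Greedy BPE: repeatedly merge the most frequent adjacent
    pair until no pair occurs more than once."""
    tokens = [bytes([b]) for b in text]
    while True:
        # Count every adjacent pair in the current token stream.
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count <= 1:  # stop when no pair occurs more than once
            break
        # Merge every occurrence of the winning pair, left to right.
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

def prompt_cost(num_tokens: int, price_per_million: float) -> float:
    """Cost of a prompt given a price per million tokens."""
    return num_tokens * price_per_million / 1_000_000
```

For example, `bpe_tokenize(b"aaaa")` merges the pair `(a, a)` once (it occurs three times) and then stops, yielding two `aa` tokens.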
Create a file `src/prompt.txt` with your text:

```
Hello world! This is a test prompt.
```

Run:

```sh
zig build run
```

Output: the colored token visualization and the pricing table for each model.
- Add your text to `src/prompt.txt`
- Build and run: `zig build run`

To add new LLM models, edit the `models` array in `src/main.zig`:
```zig
const models = [_]Model{
    .{ .name = "Your Model Name", .price_per_million = 0.50 },
    // ... other models
};
```

Planned improvements:

- Allow reading text from a file instead of hardcoding
- Command-line arguments for file input