
fix: align tokenization defaults and explicit tiktoken fallback #7

Merged
nikazzio merged 1 commit into main from patch/tokenization-default-fallback-docs
Feb 12, 2026


@nikazzio
Owner

Summary

  • align runtime tokenization default with config default (tiktoken)
  • make tiktoken fallback behavior explicit and configurable
  • document fallback semantics in README and docs
  • add regression tests for config validation and fallback behavior

Changes

  • src/maxwell_demon/analyzer.py
    • default tokenization now uses method = tiktoken
    • new flag: fallback_to_legacy_if_tiktoken_missing
    • explicit behavior when tiktoken is missing:
      • fallback + warning when flag is true
      • raise ModuleNotFoundError when flag is false
    • warning emitted once per process to reduce noise
  • src/maxwell_demon/config.py
    • add tokenization.fallback_to_legacy_if_tiktoken_missing default
    • validate it as boolean
  • config.example.toml
    • add fallback_to_legacy_if_tiktoken_missing = true
  • docs
    • README.md, DOC/docs.md, DOC/guide.md updated with explicit fallback semantics
  • tests
    • tests/test_core_coverage.py updated with:
      • validation test for new config key type
      • tests for fallback warning/behavior
      • test for explicit no-fallback error mode
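The fallback and validation behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/maxwell_demon/analyzer.py` or `config.py`: the names `tokenize`, `legacy_tokenize`, `validate_tokenization_config`, the injectable `import_module` parameter, the `ValueError` and `RuntimeWarning` types, and the `cl100k_base` encoding are all assumptions. Only the `method` default, the `fallback_to_legacy_if_tiktoken_missing` key, the `ModuleNotFoundError`, and the once-per-process warning come from the PR description.

```python
import importlib
import warnings

# Hypothetical module-level latch so the warning fires once per process.
_fallback_warned = False


def legacy_tokenize(text):
    """Placeholder for the legacy tokenizer; the real one is not shown in this PR."""
    return text.split()


def validate_tokenization_config(tokenization):
    """Return the fallback flag, rejecting non-boolean values.

    ValueError is an assumed exception type; the PR only says the key is
    validated as a boolean.
    """
    flag = tokenization.get("fallback_to_legacy_if_tiktoken_missing", True)
    if not isinstance(flag, bool):
        raise ValueError(
            "tokenization.fallback_to_legacy_if_tiktoken_missing must be a boolean"
        )
    return flag


def tokenize(text, tokenization, import_module=importlib.import_module):
    """Tokenize with tiktoken, optionally falling back to the legacy path."""
    global _fallback_warned
    allow_fallback = validate_tokenization_config(tokenization)
    if tokenization.get("method", "tiktoken") != "tiktoken":
        return legacy_tokenize(text)
    try:
        tiktoken = import_module("tiktoken")
    except ModuleNotFoundError:
        if not allow_fallback:
            raise  # explicit no-fallback mode: surface the missing dependency
        if not _fallback_warned:
            warnings.warn(
                "tiktoken is not installed; falling back to legacy tokenization",
                RuntimeWarning,
                stacklevel=2,
            )
            _fallback_warned = True
        return legacy_tokenize(text)
    # The encoding name is an assumption; the PR does not say which one is used.
    return tiktoken.get_encoding("cl100k_base").encode(text)
```

Passing the importer as a parameter is purely a testing convenience in this sketch: it lets a test simulate a missing `tiktoken` without uninstalling the package.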

Validation

  • .venv/bin/ruff check .
  • PYTHONPATH=src python3 -m pytest -q
    • 48 passed

@nikazzio nikazzio merged commit 90aba75 into main Feb 12, 2026
1 check passed
@codecov

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 44.85%. Comparing base (3f4ea45) to head (18983cd).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/maxwell_demon/analyzer.py | 87.50% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main       #7      +/-   ##
==========================================
+ Coverage   43.95%   44.85%   +0.89%     
==========================================
  Files          13       13              
  Lines         728      738      +10     
  Branches      127      130       +3     
==========================================
+ Hits          320      331      +11     
+ Misses        346      344       -2     
- Partials       62       63       +1     
| Flag | Coverage Δ |
|---|---|
| unittests | 44.85% <90.90%> (+0.89%) ⬆️ |
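The percentages in the report follow from the Hits and Lines rows of the diff. The short check below recomputes them. Two points are inferences from the numbers rather than anything stated in the report: the displayed values look truncated (not rounded) to two decimals, and the 90.90909% patch figure is consistent with 10 of 11 changed lines being fully covered. The helper names `pct` and `truncate2` are illustrative.

```python
import math


def pct(hits, lines):
    """Coverage as a percentage of hit lines over total lines."""
    return 100.0 * hits / lines


def truncate2(value):
    """Truncate (round down) to two decimals; assumed display behavior."""
    return math.floor(value * 100) / 100


base = pct(320, 728)   # main before the PR
head = pct(331, 738)   # head of PR #7

print(truncate2(base))         # base coverage
print(truncate2(head))         # head coverage
print(truncate2(head - base))  # coverage delta
print(pct(10, 11))             # inferred patch coverage (10 of 11 lines)
```

Truncation would explain why the delta is shown as +0.89% even though the two displayed percentages differ by 0.90.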


@nikazzio nikazzio deleted the patch/tokenization-default-fallback-docs branch February 12, 2026 01:31