feat(ci): token-limits gate (tiktoken, no key) + byte/token partition #39
Merged
…tion

New reusable _token-limits.yml budgets AI-read Markdown docs via the public, offline tiktoken tokenizer (no ANTHROPIC_API_KEY); per-repo config lives in .token-limits.yaml. _ci-gate.yml gains a token_limits input and a Token Limits job wired into the Merge Gate. _file-size.yml drops .md from its scan when a .token-limits.yaml is present, so each file is governed by exactly one gate (token budgets for docs, byte limits for everything else).

- scripts/check-token-limits.py: fnmatch budgets (most-restrictive match wins), exclude globs, skips non-token-gated and binary files; exits 1 on violation
- counter verified locally with tiktoken (o200k_base); actionlint clean

Assisted-by: Claude:claude-opus-4-8[1m]
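The budget rule in this commit ("most-restrictive match wins", plus exclude globs and a binary-file skip) can be sketched with the standard-library fnmatch module. This is a hedged illustration, not the script from the PR: budget_for and looks_binary are hypothetical names, and treating a NUL byte or invalid UTF-8 as "binary" is an assumed heuristic. The final commit on this PR switches to first-match semantics, so this reflects the rule as described here only.

```python
import fnmatch
from typing import Mapping, Optional, Sequence


def budget_for(path: str, limits: Mapping[str, int], exclude: Sequence[str]) -> Optional[int]:
    """Token budget for path, or None when the file is not token-gated (hypothetical helper)."""
    if any(fnmatch.fnmatch(path, glob) for glob in exclude):
        return None  # excluded files are never token-gated
    matching = [budget for pattern, budget in limits.items() if fnmatch.fnmatch(path, pattern)]
    # Most-restrictive match wins: the smallest budget among all matching patterns.
    return min(matching) if matching else None


def looks_binary(data: bytes) -> bool:
    """Assumed heuristic: a NUL byte or bytes that are not valid UTF-8 mean 'binary'."""
    if b"\0" in data:
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


limits = {"*.md": 8000, "docs/*.md": 4000}
print(budget_for("docs/guide.md", limits, []))  # both patterns match; the tighter 4000 applies
```

Note that fnmatch's `*` also matches `/`, so `*.md` covers nested docs too; that is why `docs/guide.md` matches both patterns above.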
Code Review
This pull request introduces a new offline token-limit checker script (scripts/check-token-limits.py) that uses tiktoken to enforce token budgets defined in .token-limits.yaml. Feedback was provided to make the YAML parsing logic more robust by explicitly validating the types of the parsed configuration elements, preventing potential CI crashes if the configuration file is malformed.
…formed)

Per review on #39 (gemini-code-assist): a malformed config (e.g. limits as a list, or non-int budgets) previously crashed the run with AttributeError/TypeError. Now type-check parsed limits/exclude/defaults and degrade to 'nothing token-gated' instead of failing CI.

Assisted-by: Claude:claude-opus-4-8[1m]
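The malformed-config guard can be sketched as a single normalization step that either returns well-typed limits and exclude values or falls back to "nothing token-gated". This is a sketch under assumptions: normalize_config is a hypothetical name, any malformed field is assumed to disable gating for the whole config, and the defaults key mentioned above is left out because its shape is not shown in this PR.

```python
from __future__ import annotations

from typing import Any


def normalize_config(raw: Any) -> tuple[dict[str, int], list[str]]:
    """Type-check a parsed .token-limits.yaml; any malformed shape means 'nothing token-gated'."""
    if not isinstance(raw, dict):  # also covers an empty file, which YAML loads as None
        return {}, []
    limits = raw.get("limits")
    exclude = raw.get("exclude")
    limits = {} if limits is None else limits
    exclude = [] if exclude is None else exclude
    # limits must map glob strings to integer budgets (bool is an int subclass, so reject it)
    limits_ok = isinstance(limits, dict) and all(
        isinstance(glob, str) and isinstance(budget, int) and not isinstance(budget, bool)
        for glob, budget in limits.items()
    )
    # exclude must be a list of glob strings
    exclude_ok = isinstance(exclude, list) and all(isinstance(glob, str) for glob in exclude)
    if not (limits_ok and exclude_ok):
        return {}, []  # degrade instead of raising AttributeError/TypeError later
    return dict(limits), list(exclude)
```

Returning fresh empty containers on failure keeps the rest of the check unchanged: with no limits, no file matches a pattern, so the run passes rather than crashing.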
Per review direction: prefer trusted community tooling and keep custom code to the bare minimum. Cuts the 105-line script down to ~28 lines of logic that call the public, offline tiktoken tokenizer directly (no API key, no unvetted third-party tool). Still supported: first-match glob budgets, exclude globs, and the malformed-config guard.

Assisted-by: Claude:claude-opus-4-8[1m]
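With first-match semantics, the order of the limits patterns matters: a narrow pattern must be listed before a broad one to take effect. The following is a minimal sketch of that rule plus the exit-code contract, not the PR's 28-line script. first_match_budget, o200k_tokens and check are hypothetical names; files are passed in as a path-to-text mapping so the sketch needs no filesystem; and "over budget" is assumed to mean strictly more tokens than the budget. tiktoken's get_encoding("o200k_base") and encode_ordinary are real calls, imported lazily so the rest runs without the package.

```python
import fnmatch
import sys
from typing import Callable, Mapping, Optional, Sequence


def first_match_budget(path: str, limits: Mapping[str, int], exclude: Sequence[str]) -> Optional[int]:
    """Budget of the first limits pattern matching path; None if excluded or unmatched."""
    if any(fnmatch.fnmatch(path, glob) for glob in exclude):
        return None
    # dicts preserve insertion order, so "first" means first in the config as loaded
    for pattern, budget in limits.items():
        if fnmatch.fnmatch(path, pattern):
            return budget
    return None


def o200k_tokens(text: str) -> int:
    """Count tokens with tiktoken's o200k_base encoding; special-token text counts as plain text."""
    import tiktoken  # third-party: pip install tiktoken

    return len(tiktoken.get_encoding("o200k_base").encode_ordinary(text))


def check(files: Mapping[str, str], limits: Mapping[str, int], exclude: Sequence[str],
          count: Callable[[str], int] = o200k_tokens) -> int:
    """Return an exit code: 1 if any token-gated file exceeds its budget, else 0."""
    status = 0
    for path, text in files.items():
        budget = first_match_budget(path, limits, exclude)
        if budget is None:
            continue  # not token-gated (left to the byte gate, or excluded)
        tokens = count(text)
        if tokens > budget:
            print(f"{path}: {tokens} tokens > {budget} budget", file=sys.stderr)
            status = 1
    return status
```

A caller would typically end with `sys.exit(check(...))` so the CI job fails on any violation; passing a custom `count` keeps tests independent of the tokenizer.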
Summary
Finishes the long-running transition from a raw byte file-size gate to a token-based limit for AI-read docs, using a no-auth, offline tokenizer (tiktoken) instead of atc (which needs ANTHROPIC_API_KEY). Both gates now coexist with mutually exclusive coverage: token budgets for Markdown docs, byte limits for everything else.

Changes
- _token-limits.yml (new, reusable): counts tokens with the public, offline tiktoken (pip install tiktoken, no secret); per-repo config lives in .token-limits.yaml; sparse-checks out the shared counter (same pattern as the watchdog).
- scripts/check-token-limits.py (new): fnmatch budgets (most-restrictive match wins), exclude globs, skips non-token-gated and binary files; exits 1 on violation. A file is token-gated iff it matches a limits pattern.
- _file-size.yml: when a .token-limits.yaml is present, drops .md from its scan so Markdown docs are governed only by the token gate and no file is double-gated. Repos without .token-limits.yaml are unaffected.
- _ci-gate.yml: new token_limits input and Token Limits job, wired into the Merge Gate's needs and allowed-skips.

Config shape
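A hypothetical config and a sketch of the resulting byte/token partition. The limits key mapping globs to integer budgets follows this PR; the single `*.md` pattern and the 6000 budget are invented for illustration, and gates_for is a hypothetical helper that models the two workflows' rules in Python rather than their actual shell.

```python
from __future__ import annotations

import fnmatch
from pathlib import PurePosixPath

# Hypothetical config, as a YAML loader would return it (pattern and budget are invented)
CONFIG = {"limits": {"*.md": 6000}}


def gates_for(path: str, limits: dict[str, int], token_config_present: bool) -> set[str]:
    """Which gates would cover path under the byte/token partition."""
    gates: set[str] = set()
    # _file-size.yml side: scans every file, but drops .md once a .token-limits.yaml exists
    if not (token_config_present and PurePosixPath(path).suffix == ".md"):
        gates.add("bytes")
    # token side: a file is token-gated iff it matches a limits pattern
    if token_config_present and any(fnmatch.fnmatch(path, pattern) for pattern in limits):
        gates.add("tokens")
    return gates


for p in ["README.md", "docs/design.md", "main.tf", "logo.png"]:
    print(p, sorted(gates_for(p, CONFIG["limits"], token_config_present=True)))
```

With this config the two Markdown files land only in the token gate and `main.tf` and `logo.png` only in the byte gate; with `token_config_present=False`, every file, Markdown included, stays in the byte gate, matching "repos without .token-limits.yaml are unaffected".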
Test plan
- actionlint clean on all three workflows (shellcheck included; no SC2053/SC2254 suppressions, since the partition uses an extension drop rather than dynamic globs).
- tiktoken (o200k_base): correct per-pattern budgets, exclude honored, non-.md files skipped, exit 1 on over-budget.

First consumer: dryvist/tofu-unifi (follow-up PR) flips file_size-only → file_size + token_limits.