Add wiki-llm skill (Community) #100
Open
bitphill wants to merge 3 commits into
A skill that maintains a persistent, incrementally-updated wiki for any project folder. Implements the LLM-Wiki pattern: rather than RAG over raw files, the agent builds and maintains a structured markdown knowledge base that compounds over time.

Highlights:
- Multi-project registry (`wiki init <path>` per project, all tracked centrally) with single-project mode supported.
- Adaptive init questions per project type (js-ts, python, rust, go, data-folder, notes-folder, generic-website, generic).
- Optional Claude Code Stop hook that auto-detects git diffs in any registered project, archives deletions to a recoverable trash, and queues pending ingests. Update content regeneration is deferred to an explicit `wiki update` so token cost stays predictable.
- Wiki-native trash: deleted source files are archived under `<wiki>/.wiki-llm/trash/<timestamp>/` with a manifest; `wiki list-deleted` and `wiki recover` work independently of git.
- qmd integration: hybrid BM25+vector search over the wiki via `qmd query` / `qmd index`; `wiki install-qmd` provisions it.
- Single Bun CLI: init, status, ingest, update, query, lint, list-deleted, recover, hook, install-hook, install-qmd.
- New `wiki doctor [--fix]` command: checks bun, git, qmd, ripgrep, rustup, cargo. With `--fix`, best-effort installs missing pieces (qmd via bun/npm → cargo → bootstrap rustup then cargo; git and ripgrep via apt-get when running as root).
- Auto-preflight: on the first ever invocation in a fresh Zo Computer instance (no `~/.wiki-llm/registry.json` yet), the CLI silently runs `doctor --fix` to install anything missing before the user's command runs. Skipped for `doctor` itself and `hook` (to keep Stop-hook latency minimal).
- `install-qmd` upgraded to multi-strategy install (bun → npm → cargo → bootstrap rustup then cargo).
- `which()` probes well-known install locations (`~/.cargo/bin`, `~/.bun/bin`, `/usr/local/bin`) so cargo/qmd are detected even when not on PATH in a non-login shell.
- SKILL.md documents both the new command and the first-run preflight behavior.
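The `which()` behavior described above (probe well-known install locations and memoize per process) could look roughly like the sketch below. The function signature, the injectable `dirs` and `exists` parameters, and the `defaultSearchDirs` helper are assumptions for illustration, not the CLI's actual code. The sketch only checks that a file exists, not that it is executable.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import * as path from "node:path";

// Per-process memo: binary name -> resolved path, or null when not found.
const whichCache = new Map<string, string | null>();

// PATH entries first, then the well-known install locations named in the PR.
function defaultSearchDirs(): string[] {
  const fromPath = (process.env.PATH ?? "").split(path.delimiter).filter(Boolean);
  const home = homedir();
  return [
    ...fromPath,
    path.join(home, ".cargo", "bin"),
    path.join(home, ".bun", "bin"),
    "/usr/local/bin",
  ];
}

// `dirs` and `exists` are injectable (a hypothetical signature) so the lookup
// can be exercised without touching the real filesystem.
function which(
  bin: string,
  dirs: string[] = defaultSearchDirs(),
  exists: (p: string) => boolean = existsSync,
): string | null {
  const cached = whichCache.get(bin);
  if (cached !== undefined) return cached; // includes cached "not found" (null)

  let found: string | null = null;
  for (const dir of dirs) {
    const candidate = path.join(dir, bin);
    if (exists(candidate)) {
      found = candidate;
      break;
    }
  }
  whichCache.set(bin, found);
  return found;
}
```

Because the cache is keyed by binary name only, a second lookup for the same name returns the first answer even if different directories are passed, which matches the per-process memoization the PR describes.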
Update: added
Smoke-tested on a fresh shell:
Security:
- Replace shelled git interpolation with spawnSync arg form; validate
refs against ^[A-Za-z0-9._/-]{1,255}$ before any use
- Reject path-traversal (.., absolute, null bytes) on ingest/recover/
trash dest; enforce isUnder(root, dest) on all writes
- Cap `git show`/`git diff` output at 50 MB (maxBuffer)
- Cap ingest source size at 5 MB
Robustness:
- safeReadJSON() everywhere — corrupt registry/state/manifest no
longer crashes the CLI
- Atomic writes (.tmp + rename) for registry, state, config, settings
- Single-flight lockfile around `wiki hook` (stale-lock reclaim @60s)
- Queue dedup on append (same file+action within last 100 entries)
Performance / memory:
- Lint: O(N) reference scan over a Set, replacing O(N^2) substring
pair-check across all bodies
- which() result cached per-process
- Trash retention: prune trash dirs older than `trash_retention_days`
(default 30; configurable per-project)
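Queue dedup on append, one of the robustness items in this commit, can be sketched as follows. The `QueueEntry` shape and the `appendDeduped` name are hypothetical; only the rule itself (skip when the same file and action appear within the last 100 entries) comes from the commit message.

```typescript
// Hypothetical shape of one line in queue.jsonl.
interface QueueEntry {
  file: string;
  action: "add" | "modify" | "delete";
}

// Window size taken from the commit message.
const DEDUP_WINDOW = 100;

// Appends `entry` unless an identical file+action pair already exists
// in the most recent DEDUP_WINDOW entries. Returns true when appended.
function appendDeduped(queue: QueueEntry[], entry: QueueEntry): boolean {
  const recent = queue.slice(-DEDUP_WINDOW);
  const duplicate = recent.some(
    (e) => e.file === entry.file && e.action === entry.action,
  );
  if (duplicate) return false;
  queue.push(entry);
  return true;
}
```

Only the tail of the queue is scanned, so the cost of each append stays bounded no matter how long the queue grows.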
Summary
Adds wiki-llm — a Community skill that maintains a persistent, incrementally-updated wiki for any project folder. Implements the LLM-Wiki pattern by Andrej Karpathy: the agent builds and maintains a structured, interlinked markdown knowledge base that compounds over time, instead of re-deriving knowledge via RAG on every query.
Why it matters — token / cost reduction
Without a wiki, every prompt that touches a project pays the same compounding tax:
- `ls` / `grep` / `read_file` over the project to rediscover what's there.

For a non-trivial repo (a few thousand source files), each session can easily burn 50k–200k input tokens before the agent does any real work. That cost is paid on every prompt, multiplied across every model tier the request fans out to. The wiki replaces that with a tiny, structured, pre-synthesized layer:
- … `pages/sources/<slug>.md`, and cross-linked to entity / topic pages, so the agent reads ~10 KB of pre-condensed wiki instead of 2 MB of raw code.
- The `index.md` catalog acts as an O(1) routing table: the agent jumps straight to the relevant page rather than grepping blindly.
- `log.md` gives versioned history of ops + decisions, so prior fixes (e.g. an env-var bug from last month) don't need re-derivation.

Net: bigger projects benefit more (sub-linear context growth vs. linear re-scan), and downstream model fan-out (T1 → T5 cascade in interview-portal, batched eval pipelines, etc.) multiplies the savings further because every tier reads the smaller wiki instead of the raw repo.
Usage
All commands operate on a registered project: pass `--project <path>` or run from inside one. Every project lives in the central registry at `~/.wiki-llm/registry.json`; per-wiki internals live at `<wiki>/.wiki-llm/`.

| Command | Description | Example |
| --- | --- | --- |
| `wiki init [path]` | … `js-ts`, `python`, `rust`, `go`, `data-folder`, `notes-folder`, `generic`), emit questions JSON. Re-run with `--json <answers>` to scaffold. | `wiki init /home/workspace/my-app` then `wiki init --json /tmp/answers.json` |
| `wiki status` | | `wiki status` |
| `wiki register [path]` | | `wiki register /home/workspace/my-app` |
| `wiki unregister [path]` | | `wiki unregister /home/workspace/my-app` |
| `wiki ingest <file>` | | `wiki ingest src/server.ts --project /home/workspace/my-app` |
| `wiki update [--commit]` | … `--commit` advances state. | `wiki update --commit` |
| `wiki query "..."` | … `qmd` if installed, else ripgrep fallback. | `wiki query "ZO_API_KEY auto-grader fallback"` |
| `wiki lint` | | `wiki lint` |
| `wiki list-deleted` | | `wiki list-deleted` |
| `wiki recover <relpath>` | | `wiki recover src/utils.ts` |
| `wiki hook` | | `wiki hook` (invoked by Claude Code) |
| `wiki install-hook` | … `wiki hook` in `~/.claude/settings.json` Stop hook array. Idempotent. | `wiki install-hook` |
| `wiki uninstall-hook` | | `wiki uninstall-hook` |
| `wiki install-qmd` | | `wiki install-qmd` |
| `wiki doctor [--fix]` | … `bun`, `git`. Optional: `qmd`, `ripgrep`, `rustup`, `cargo`. | `wiki doctor --fix` |

What's in it (technical)
Architecture
- … `<wiki>/AGENTS.md`: co-evolved conventions).
- … `~/.wiki-llm/registry.json`; per-wiki state at `<wiki>/.wiki-llm/{config.json, state.json, queue.jsonl, trash/}`.
- … `index.md` (content catalog), `log.md` (chronological event log), `pages/sources/`, `pages/entities/`, `pages/topics/`, populated by the agent based on `AGENTS.md` conventions.

Adaptive init
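
A minimal sketch of the marker-file detection for this step, written against the rules listed in this section. It assumes the first matching marker wins and that only file names at the project root are inspected; `detectProjectType` and its input shape are hypothetical and not taken from the CLI source.

```typescript
type ProjectType =
  | "js-ts"
  | "python"
  | "rust"
  | "go"
  | "data-folder"
  | "notes-folder"
  | "generic";

// Marker files checked in order; first match wins (the ordering is an assumption).
const MARKERS: Array<[string[], ProjectType]> = [
  [["package.json"], "js-ts"],
  [["pyproject.toml", "requirements.txt", "setup.py"], "python"],
  [["Cargo.toml"], "rust"],
  [["go.mod"], "go"],
];

const DATA_EXTS = [".csv", ".parquet", ".duckdb"];
const NOTES_EXTS = [".md", ".txt", ".pdf"];

// Counts file names ending in any of the given extensions (case-insensitive).
function countByExt(names: string[], exts: string[]): number {
  return names.filter((n) => exts.some((e) => n.toLowerCase().endsWith(e))).length;
}

// `names` is the list of file names at the project root (hypothetical input).
function detectProjectType(names: string[]): ProjectType {
  const present = new Set(names);
  for (const [files, type] of MARKERS) {
    if (files.some((f) => present.has(f))) return type;
  }
  if (countByExt(names, DATA_EXTS) >= 2) return "data-folder";
  if (countByExt(names, NOTES_EXTS) >= 2) return "notes-folder";
  return "generic";
}
```

Taking a plain list of names keeps the rule table separate from directory walking, so the thresholds can be checked without a real project on disk.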
- `package.json` → js-ts; `pyproject.toml` / `requirements.txt` / `setup.py` → python; `Cargo.toml` → rust; `go.mod` → go; ≥2 `.csv` / `.parquet` / `.duckdb` files → data-folder; ≥2 `.md` / `.txt` / `.pdf` → notes-folder; else generic.
- … `code+ops` for code projects, `all` for notes/data) so the same skill scaffolds a code wiki and a research wiki differently.

Stop-hook auto-update
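
This section mentions a small glob translator covering `**`, `*`, and `{a,b}`. The sketch below shows one way such a translator could work. The function name `globToRegExp` and the matching semantics (`*` stops at `/`, `**` crosses directories, `**/` may match zero directories) are assumptions, not the PR's implementation.

```typescript
// Translate a glob using `**`, `*`, and `{a,b}` into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let out = "";
  let braceDepth = 0;
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        i++; // consume the second '*'
        if (glob.charAt(i + 1) === "/") {
          i++; // `**/` matches zero or more directories
          out += "(?:.*/)?";
        } else {
          out += ".*"; // bare `**` matches anything, including '/'
        }
      } else {
        out += "[^/]*"; // single `*` stays within one path segment
      }
    } else if (ch === "{") {
      braceDepth++;
      out += "(?:";
    } else if (ch === "}" && braceDepth > 0) {
      braceDepth--;
      out += ")";
    } else if (ch === "," && braceDepth > 0) {
      out += "|"; // alternatives inside braces
    } else if (/[.+?^$()|[\]\\{}]/.test(ch)) {
      out += "\\" + ch; // escape regex metacharacters
    } else {
      out += ch;
    }
  }
  out += ")".repeat(braceDepth); // close any unbalanced '{' so the regex compiles
  return new RegExp("^" + out + "$");
}
```

For example, `globToRegExp("src/**/*.ts")` would accept both `src/c.ts` and `src/a/b/c.ts` under these assumed semantics.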
- `git rev-parse HEAD` + `git status --porcelain` + diff vs `<wiki>/.wiki-llm/state.json:last_head`.
- … `**`, `*`, `{a,b}` glob translator).
- … `<wiki>/.wiki-llm/trash/<ISO-stamp>/<relpath>` with a manifest line; queue add / modify / delete entries to `queue.jsonl`.
- … `wiki update` so token cost stays predictable.

Wiki-native trash + recovery
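
Trash retention, as described in this section, comes down to comparing each archive's timestamp against a cutoff. A minimal sketch, assuming each trash directory's archive time is already known; `TrashEntry` and `selectExpired` are hypothetical names, and how the CLI actually derives the timestamp is not shown in the PR.

```typescript
// Hypothetical record for one archived deletion batch.
interface TrashEntry {
  dir: string;      // e.g. "<wiki>/.wiki-llm/trash/<ISO-stamp>"
  archivedAt: Date; // when the deletion was archived
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the entries strictly older than `retentionDays` relative to `now`.
// The caller would then remove those directories.
function selectExpired(
  entries: TrashEntry[],
  retentionDays: number,
  now: Date,
): TrashEntry[] {
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return entries.filter((e) => e.archivedAt.getTime() < cutoff);
}
```

Passing `now` explicitly keeps the selection deterministic, which makes the default of 30 days easy to verify in isolation.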
- … `git reset --hard`'s away history, the trash still holds the deleted source.
- … `HEAD` → `state.last_head` → last commit touching the path (`git log -n1 --pretty=%H -- <path>`).
- `wiki list-deleted` / `wiki recover <relpath>` work from the manifest, not git.
- … `trash_retention_days` (default 30) per project; pruning runs during `wiki update`.

qmd integration
- … `qmd search`).
- … `ripgrep -l` with a heading-rank heuristic if qmd is missing.
- `wiki install-qmd` tries `bun add -g qmd` → `npm i -g qmd` → `cargo install qmd` → bootstrap rustup then cargo.

First-run dependency preflight
- … `~/.wiki-llm/registry.json` doesn't exist (fresh Zo instance), the CLI auto-runs `doctor --fix` before any subcommand (except `doctor` and `hook` themselves, to keep latency / recursion in check).

Hardening (security / perf / memory)
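
Two of the checks covered in this section, ref validation and the `isUnder(root, dest)` guard, can be sketched as below. The regex is the one quoted in this PR; the helper bodies, the `isSafeRelPath` name, and the choice to treat `dest === root` as not under are assumptions for illustration.

```typescript
import * as path from "node:path";

// Pattern quoted in the PR description for validating git refs.
const REF_RE = new RegExp("^[A-Za-z0-9._/-]{1,255}$");

function isValidRef(ref: string): boolean {
  return REF_RE.test(ref);
}

// Rejects empty paths, null bytes, absolute paths, and any `..` segment.
function isSafeRelPath(rel: string): boolean {
  if (rel.length === 0 || rel.includes("\0")) return false;
  if (path.isAbsolute(rel)) return false;
  return !rel.split(/[\\/]/).includes("..");
}

// Defence in depth: resolve both paths and require dest to sit strictly
// inside root (assumption: dest equal to root is rejected).
function isUnder(root: string, dest: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(dest));
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

Checking the relative path segment by segment, rather than with a plain string prefix test, avoids accepting a sibling such as `/tmp/wiki-evil` when the root is `/tmp/wiki`.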
This PR was reviewed end-to-end for security, vulnerability exposure, memory leaks, and performance — relevant fixes folded in:
- … `git` calls go through `spawnSync` with explicit arg arrays, with no shell interpolation of user-controlled data. Refs (`HEAD`, `state.last_head`, current head) are validated against `^[A-Za-z0-9._/-]{1,255}$` before any use, so a tampered `state.json` or a malicious branch name cannot break out of the argv boundary.
- … `..` segments, or has null bytes. A defence-in-depth `isUnder(root, dest)` check then re-resolves the final write path and refuses anything that escapes the trash root, project root, or wiki root, so a corrupt manifest cannot get `wiki recover` to clobber `/etc/passwd`.
- … `git show` / `git diff` are capped at 50 MB via `maxBuffer`; ingest refuses sources over 5 MB with a clear error rather than OOM-ing the process.
- `safeReadJSON()` wraps every parse of registry / state / config / manifest / settings; a corrupt file falls back to defaults instead of throwing.
- … `~/.claude/settings.json` are written via `.tmp` + `rename`, so a crash mid-write can't leave them half-written.
- … `wiki hook` acquires a `~/.wiki-llm/hook.lock` (O_EXCL with 60-s stale-lock reclaim), so two parallel Stop fires can't race-corrupt the queue.
- … `file + action` within the last 100 entries is skipped, which prevents queue growth on chatty turns.
- … `wiki update`.
- … `Set<string>`: lint over hundreds of pages is now linear, not quadratic.
- `which()` cache: per-process memoization; binary detection no longer repeats `which` shell-outs across a single command's lifetime.

Files
- `Community/wiki-llm/SKILL.md`: instructions for agents
- `Community/wiki-llm/DISPLAY.json`: catalog metadata (icon: `book-open`)
- `Community/wiki-llm/scripts/wiki`: the Bun CLI
- `Community/wiki-llm/assets/wiki-template/`: `AGENTS.md` + `index.md` + `log.md` skeleton seeded into every new wiki
- `Community/wiki-llm/references/CONCEPT.md`: the original LLM-Wiki concept doc

Validation
`bun validate` passes clean for `Community/wiki-llm`. Smoke-tested end-to-end on an `interview-portal` project: init → seed → register qmd collection → ingest → update → lint → list-deleted → recover → Stop hook trigger → path-traversal-rejection → corrupt-JSON survival → trash-retention prune.