brain turns a folder of markdown and text files into a queryable knowledge base. Ask it a question and it retrieves the most relevant chunks from your notes, then streams a grounded answer with citations back to your terminal. When your notes don't cover the question, it tells you — no hallucinated filler. It also runs as a Model Context Protocol server so Claude Desktop and Claude Code can query your notes directly: see MCP server.
It's a TUI-first app, not a thin CLI wrapper around an API call. You get an interactive multi-select collection picker, a readline REPL with tab-completion and unique-prefix slash commands, a streaming markdown renderer that colors headings/code/lists live as tokens arrive, mid-response Ctrl+C cancellation, and model/mode pickers you can invoke mid-session. Built on Cobra, charmbracelet/huh (pickers), charmbracelet/lipgloss (styling), and chzyer/readline (REPL). See Chat mode for the full slash command surface.
One-shot Q&A — retrieval spinner, streaming markdown answer, cited sources, closing logo:
Interactive REPL — slash commands, /mode switching, grounded answer, clean exit:
Thinking modes — same topic asked four different ways. The structure of the response changes with the mode: recall → direct answer, analysis → findings/connections/gaps, decision → frameworks/recommendation, synthesis → building blocks/action plan.
/challenge — re-score an answer against a different set of sources. brain rebuilds the system prompt around the new chunks and streams a re-grounded response that contrasts the two. Great for stress-testing a conclusion before you commit to it.
TUI pickers — the huh multi-select collection picker that fires on brain chat startup, plus the /model picker for switching Claude models mid-session:
brain search — raw retrieval, no LLM. Lands in a few hundred milliseconds with scored chunks and inline previews. This is what powers the "no context → no LLM → no hallucination" principle:
brain history — every answer is archived on disk with model/collections/elapsed metadata. The default list is scriptable; search and view drill in:
brain history browse — interactive TUI picker: / fuzzy-filters your past questions, f kicks a full-text search across answer bodies, enter opens the viewer, esc pops back to the list, d deletes, c copies the selected answer to your clipboard, x exports it to ~/Downloads/ (override with $BRAIN_EXPORT_DIR), p toggles a ★ pin (also reachable from chat as /pin). Built on bubbles/list + viewport so it pages, scrolls, and stays out of your scrollback:
- Demo
- Core principle
- Features
- Requirements
- Install
- Quick start
- Commands
- Chat mode
- Thinking modes
- MCP server
- Eval harness
- Telemetry
- Configuration
- Architecture
- Development
- Contributing
- License
No context → no LLM call → no hallucination.
Every answer brain gives is grounded in chunks retrieved from your own notes. If the retrieval step returns nothing relevant, the LLM call is skipped entirely and you get a clear "nothing found" message instead of a confident-sounding fabrication.
brain ask "<question>"— one-shot Q&A, cited sources, streaming answerbrain chat— interactive multi-turn REPL with slash commands, tab completion, and mid-response cancellationbrain search "<query>"— raw retrieval, no LLM, for verifying your index;-ncaps the result count,--expand/--rerank/--rerank-n/--hyde/--explainmirror theaskenhancements so you can tune flags against a concrete output before spending LLM tokens--effort fast|balanced|thorough(v0.6.0) — opt-in retrieval preset onask/search/chat. Default: none —brain ask "x"without--effortbehaves as v0.5.x did (plain hybrid retrieval, no enhancement). Pass--effort balancedfor the recommended retrieval shape (=--expand --rerank);fastis an explicit name for the v0.5.x default (hybrid only);thorough=--expand --rerank --hyde --deepwith wider RerankTopN. When a preset is set, explicit flags on the same command win per-field. In chat,/effort fast|balanced|thoroughswitches preset mid-session (clears prior subprocess toggles)brain history— every Q&A is saved as a timestamped markdown file with model/collections/elapsed metadata;brain history browseopens an interactive TUI picker with/filter on questions,ffor full-text search across answers,enterto view,dto delete,cto copy the answer to clipboard (pbcopy/wl-copy/xclip),xto export to~/Downloads/,pto pin (★) for later/challenge— re-score an answer against a different set of sources to check it- Adaptive prompt system — questions are classified into
recall,analysis,decision, orsynthesismodes, each with a different response structure - Full document enrichment — top results are re-fetched as complete documents so the LLM sees the full source, not just the highest-scoring chunk; long transcripts with detail buried past the intro are handled correctly
brain add --context "description"— tells the search engine what a collection is about, dramatically improving retrieval quality for domain-specific content--index <name>(global) — keep multiple isolated knowledge bases under one binary (e.g.brain --index workvsbrain --index personal). Backed by recall's named-index convention at~/.recall/indexes/<name>.db--expand— query expansion via recall's local expansion LLM: generates lex/vec query variants (the retriever runs each and merges) and hypothetical passages that can be combined with--hyde--hyde— Hypothetical Document Embedding: the LLM produces a short "ideal answer" passage, recall embeds it as if it were a real document, and that vector joins the candidate search as an extra probe--rerank— cross-encoder rerank the top fused candidates via recall's bundled bge-reranker-v2-m3 (continuous 0.0-1.0 relevance score), then blend with RRF rank using the position-aware 75/25 → 60/40 → 40/60 bands. Window size defaults to 30; tune with--rerank-n--rerank-n <int>— how many candidates the cross-encoder sees when--rerankis on. Default 30; widen (e.g.60) when generic queries return too few sources, narrow (e.g.10) when you want a fast top-of-funnel signal--first-query-bonus <f>— only consulted with--expand/--hyde. Tunes the first-list weight in recall's RRF cross-variant merge (MergeRankLists).0(default) keeps the qmd-style 1.0 (user's literal query counts twice);0.5dampens to 1.5×; negative disables. Use for ranking tie-breaks between expansion variants — not for tail-width recovery: an internal A/B (see CHANGELOG v0.4.13) showed the bonus affects same-side ordering but not the final fused score the floor cuts against. Reach for--top-rank-bonusor--floorwhen the--expandtail trim is biting--top-rank-bonus <f>— overrides recall's rank-1 FuseRRF bonus (default 0.05).<0(default sentinel) = use recall's default;0disables the bonus entirely; positive values pass through. This is the v0.4.11 floor-inflation counter-knob the A/B identified:--top-rank-bonus 0shrinks the inflated top score so the adaptive floor cuts less, fully recovering v0.4.10 tail widths on conceptual queries. Equivalent end-state to--floor 0via a different mechanism (shrink the peak vs. disable the cut). Pair with--floorfor two independent dials--explain— surface a per-chunk score trace (bm25@1 (4.21) vec@3 (0.81) bonus=+0.05 rrf=0.082 hyde@2 rerank=0.83on the enhanced path, the same shape minushyde@K/rerank=…on the plain hybrid path) so you can see why a document landed where it did. Works onask,search, andchat(via/explaintoggle)--deep//deep— post-retrieval LLM chunk filter (20 → 8-10). Independent of--expand/--rerank/--hyde; sits after retrieval and reduces the working set handed to the answer-generation prompt. Combine freely with the recall enhancements.- Cross-source tension detection — the system prompt forces the model to identify disagreements between sources before synthesizing, pushing beyond shallow summary
- Adaptive scoring — instead of a hard relevance cutoff that silently drops chunks, brain uses 40% of the top chunk's score as a dynamic floor; on difficult queries where all scores are low, weak-but-best results survive instead of returning "nothing found"
- Citation verification — after every answer,
[filename.md]citations are checked against the retrieved sources; fabricated filenames get a⚠warning - Prompt caching — on the Anthropic backend, system directives and conversation history are structured for prompt caching, reducing latency and cost on multi-turn chat sessions
brain doctor— checks pipeline health (recall index reachable, embedder loadable, LLM backend configured), prints index sanity (docs / embeddings / chunks-per-doc), surfaces the most recent collection-update timestamp, and reports disk usage for the recall DB / models / llama-server / brain history directories. Actionable fix commands on every failed check- Token telemetry — every LLM call's input / output token counts are surfaced after the answer (one line on
brain ask, in the chat footer per turn + cumulative session total in/help) and persisted to history.brain historylist showsN→M toknext to each entry. Anthropic SDK + OpenAI-compat backends populate this; Claude CLI fallback reports zero - Collection picker — multi-select UI to scope a question to specific note folders
- Model switching — swap between
sonnet(default),opus, andhaikumid-session - Ctrl+C everywhere — cancel retrieval or streaming at any time without leaving your terminal in a broken state
- Pluggable backend — native Anthropic API, any OpenAI-compatible endpoint (OpenAI, Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together…), or the local
claudeCLI as a fallback brain mcp— Model Context Protocol server mode so Claude Desktop and Claude Code can query your notes directly via tools (brain_ask/brain_search/brain_collections/brain_files/brain_document/brain_document_search/brain_status) and resources (brain://collection/<name>,brain://document/{collection}/{+path}). See MCP serverbrain eval— retrieval quality harness: snapshot a query set, diff snapshots over time, catch retrieval regressions before they ship. See Eval harness
- macOS (arm64) or Linux (amd64). Windows isn't a release target — brain depends on recall's CGo SQLite stack and on llama.cpp's prebuilt llama-server, neither of which we currently build Windows artefacts for.
- Linux runtime dep: minimal container bases (ubuntu without
build-essential, Alpine, …) needlibgomp1/libgompinstalled — recall's llama-server subprocess dlopens OpenMP-linked CPU backend plugins. A normal workstation already has it viagcc/g++. - At least one LLM backend. brain picks the first one it finds, in this order:
ANTHROPIC_API_KEY— native Claude API, the fastest and cheapest path (recommended).OPENAI_API_KEY— any OpenAI-compatible endpoint. Works out of the box with OpenAI, and viaOPENAI_BASE_URLalso with Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together, Fireworks, etc. See Configuration for examples.- The Claude Code CLI on your PATH — useful if you have a Claude subscription but no API key. Override the binary name with
BRAIN_CLAUDE_BINto point at a fork (e.g.opencode).
- Go 1.26+ (per
go.mod) — only needed if you're building from source.
brew install ugurcan-aytar/brain/brainNo sudo needed — Homebrew manages its own prefix. Works on macOS (arm64) and on Linux (amd64) via Linuxbrew. Every release auto-publishes a formula to the homebrew-brain tap.
curl -sSfL https://raw.githubusercontent.com/ugurcan-aytar/brain/main/install.sh | shThe script downloads the right prebuilt binary for your OS/arch, verifies its SHA-256 against checksums.txt, drops it into /usr/local/bin (or ~/.local/bin as a fallback), and runs brain doctor at the end to confirm your LLM backend is wired up. Retrieval is handled by the embedded recall library — no Node.js, no npm, no second binary to chase. Local embedding and generation use llama.cpp's official llama-server prebuilt as a subprocess (auto-downloaded by recall on first use, lives in ~/.recall/bin/llamacpp/).
Environment overrides: BRAIN_VERSION=v1.2.3 to pin a release, BRAIN_PREFIX=$HOME/.local to change the install prefix.
git clone https://github.com/ugurcan-aytar/brain.git
cd brain
go build -o brain ./cmd/brain
sudo mv brain /usr/local/bin/go install github.com/ugurcan-aytar/brain/cmd/brain@latestAfter any install path, run brain doctor to check that the index is reachable and a Claude (or other) backend is wired up.
Kick the tires against the included sample notes:
# 1. Register the example notes folder (auto-runs `brain index` afterward)
brain add ./examples --name examples
# 2. Ask a question
brain ask "What did the team decide about authentication?"
# 3. Or start an interactive conversation
brain chatPoint brain add at your own folder when you're ready — ~/Documents/my-notes,
~/Obsidian/vault, whatever. The examples/ directory ships
three small markdown files (meeting notes, a technical doc, a journal
entry) chosen to show off retrieval, cross-source connections, and
thinking-mode responses on a realistic tiny corpus.
| Command | Description |
|---|---|
brain ask "<question>" |
One-shot Q&A with cited sources |
brain chat |
Interactive multi-turn conversation |
brain search "<query>" |
Raw retrieval results (no LLM) |
Global flag (works on every subcommand):
--index <name>— use a named isolated recall index at~/.recall/indexes/<name>.dbinstead of the default DB. Lets you keep e.g.brain --index work …separated frombrain --index personal …under one binary. Ignored (with a warning) when$RECALL_DB_PATHis set.
Flags on ask:
-c, --collection <name>— scope to one or more collections (skips the picker). Repeatable (-c Lenny -c Reforge) or comma-separated (-c Lenny,Reforge)-m, --model <model>—sonnet(default),opus,haiku, or a full Anthropic model ID-M, --mode <mode>— override the auto-detected thinking mode:auto,recall,analysis,decision,synthesis-n, --top <int>— candidate pool size (default: configTopK)--expand— query expansion (lex/vec variants + HyDE passages); costs a one-shot expansion-LLM call--rerank— cross-encoder rerank the top candidates via recall's bge-reranker-v2-m3; continuous 0-1 scores blended with RRF rank--rerank-n <int>— how many candidates the cross-encoder sees when--rerankis on (default 30)--floor <pct>— override the adaptive 40%-of-top fusion floor on a per-query basis.0shows everything (no cut);0.25loosens (more sources);0.6tightens. Negative or unset keeps recall's default--first-query-bonus <f>— first-list weight in recall's--expand/--hydeRRF merge (ranking tie-breaker between expansion variants).0(default) = qmd 1.0;0.5dampens; negative disables. Not a tail-width knob — see--top-rank-bonus/--floorfor that--top-rank-bonus <f>— overrides recall's rank-1 FuseRRF bonus (default 0.05).<0= use default;0disables → no top-score inflation → adaptive floor cuts less → v0.4.10 tail widths recovered. Pair with--floorfor two independent floor-inflation dials--hyde— Hypothetical Document Embedding: LLM-generated answer passages embedded as extra vector probes--explain— print a per-chunk score trace (BM25 + vector ranks, RRF score, HyDE rank, reranker score). Renders under each source in the answer block--deep— post-retrieval LLM chunk filter (20 → 8-10). Independent of the three flags above; combinable.--copy— after the answer streams, pipe it to the OS clipboard (pbcopy / wl-copy / xclip / xsel auto-detected)--export— write the answer to~/Downloads/<auto-name>(override the directory with$BRAIN_EXPORT_DIR). Boolean; no value.--export-to <PATH>— write the answer to<PATH>exactly. If<PATH>is an existing directory, drops the auto-named file inside; otherwise creates the file (and parent dirs on demand). Wins over--exportwhen both are set. Both forms run regardless of stdout being a TTY, so they work in pipelines
Flags on search: same as ask minus the LLM-specific ones (-m, -M, --deep). --index works here too (it's global).
Flags on chat:
-c, --collection <name>— scope the whole session to one or more collections (skips the startup picker). Repeatable or comma-separated, same asask-m, --model <model>— same model aliases asask; can also be swapped mid-session with/model--rerank,--expand,--hyde,--explain,--rerank-n <int>,--floor <pct>,--first-query-bonus <f>,--top-rank-bonus <f>— start the session with the matching retrieval enhancement on. Every one of them also has an in-session toggle:/rerank,/expand,/hyde,/explain,/rerank-n <int>,/floor <pct>,/first-query-bonus <f>,/top-rank-bonus <f>
| Command | Description |
|---|---|
brain add <path> |
Register a folder as a collection (runs brain index after) |
brain remove <name> |
Remove a collection and clean up its embeddings |
brain collections |
List registered collections |
brain files [-c name] |
List indexed files, optionally filtered by collection |
Flags on add:
--name <name>— override the default collection name (folder basename)--mask <glob>— override the default file glob (**/*.{txt,md})--context <description>— describe what this collection contains (e.g."Podcast transcripts about product management"); improves search quality significantly
Every answer is archived as a timestamped markdown file (default ~/.brain/history, overridable via BRAIN_HISTORY_DIR). Each file captures the question, answer, sources, and the mode, model, collections, and elapsed time. Browse from the terminal:
| Command | Description |
|---|---|
brain history |
List the 10 most recent entries, newest first |
brain history browse |
Interactive TUI picker — / to filter by question, f to full-text search bodies, enter to view, d to delete, c to copy the answer to clipboard, x to export to ~/Downloads/, p to pin (★), esc back |
brain history search <query> |
Non-interactive full-text search |
brain history view <id> |
Render an entry (id from the list, 1-based) |
brain history rm <id> |
Delete an entry |
brain history path |
Print the history directory path |
Flags on history:
-n, --limit <N>— number of entries to show in the default list (default10)
| Command | Description |
|---|---|
brain index |
Re-scan files and regenerate embeddings |
brain status |
Show index health and brain config |
brain doctor |
Verify recall index + embedder + LLM backend are wired up |
brain cleanup |
Drop orphan chunks + stale embeddings, run SQLite VACUUM. Idempotent — run when brain_status reports embed_coverage > 1.0 (typically a pre-v0.4.0 index leaking vec0 rows on re-index) |
brain chat is a full REPL with slash commands, rolling conversation history, and Tab-to-complete.
| Slash command | Description |
|---|---|
/help |
Show command list, current model/mode/collections, and the on/off state of every retrieval enhancement |
/mode [name] |
View or change thinking mode (auto, recall, analysis, decision, synthesis) |
/model [name] |
View or switch Claude model; bare /model opens a picker |
/rerank |
Toggle cross-encoder reranking (start state set by --rerank startup flag) |
/rerank-n [N] |
View or set the reranker window size. 0 resets to default (30) |
/floor [pct] |
View or set the fusion floor. 0 shows every candidate; 0.25 loosens; 0.6 tightens. /floor reset drops the override |
/first-query-bonus [f] |
View or set the first-list weight on --expand/--hyde RRF merge (ranking tie-breaker). 0 (default) = qmd 1.0; 0.5 dampens; negative disables. /first-query-bonus reset drops the override |
/top-rank-bonus [f] |
View or set the rank-1 FuseRRF bonus (default 0.05). 0 disables; positive value passes through. /top-rank-bonus reset drops the override. Real counter-knob to v0.4.11 floor inflation — pairs with /floor |
/copy |
Copy the last answer to the clipboard (pbcopy/wl-copy/xclip/xsel auto-detected) |
/export |
Write the last answer to ~/Downloads/<filename> (override with $BRAIN_EXPORT_DIR). Filename mirrors the saved-history entry so brain history browse and the export agree on names |
/pin |
Pin / unpin the last answer. Pinned entries get a ★ in brain history browse; press p there to toggle from the list, too |
/diff |
Compare the last two answers in the current session: side-by-side metadata table, source set diff (− prev-only / + curr-only / shared count), and both bodies. Useful after a /challenge or after toggling --rerank / --floor between turns |
/expand |
Toggle query expansion (lex/vec variants) |
/hyde |
Toggle HyDE — hypothetical-doc embedding as extra vector probes |
/explain |
Toggle per-chunk score traces under the source block |
/deep |
Toggle two-pass deep retrieval (LLM filters chunks 20 → 8-10) |
/collections |
Re-run the collection picker |
/sources |
Show the sources from the last answer |
/challenge |
Re-score the last Q&A against a different set of collections |
/clear |
Reset conversation history |
/quit |
Exit chat (also: Ctrl+C twice) |
Slash commands support unique-prefix matching: typing /col resolves to /collections. Disambiguation note: /rerank vs /rerank-n, /expand vs /explain, and /floor vs /first-query-bonus all need their full word (or /expa, /expl, /rerank-, /fl, /fi) — /r, /exp, and /f are ambiguous and the resolver prints the candidates. /t resolves uniquely to /top-rank-bonus. Tab auto-completes partial commands.
Press Ctrl+C once during streaming to cancel the in-flight request. Press it twice within two seconds on an empty prompt to exit.
The chat input is a multi-line textarea that auto-grows as content wraps. To insert a newline without submitting:
Shift+Enter— works on iTerm2, alacritty, kitty, foot, wezterm, Ghostty, and any other terminal that supports xterm's modifyOtherKeys mode 2 or the Kitty progressive keyboard protocol. brain enables both on chat startup (and restores defaults on exit). The modes also re-encode Ctrl+C, Ctrl+D, Alt+letter and friends; brain ships its own translator so every standard shortcut keeps working.\+Enter— cross-terminal fallback, works everywhere (the trailing backslash is consumed).Ctrl+J— universal alternative. Ctrl+J sends LF (\n), distinct from Enter's CR (\r) on every terminal.Alt+Enter— works when your terminal sends Option asEsc+(iTerm2 default; for macOS Terminal.app, enable Preferences → Profiles → Keyboard → "Use Option as Meta key").
macOS Terminal.app doesn't implement modifyOtherKeys, so Shift+Enter there falls back to plain Enter. Use \+Enter or Ctrl+J on that one terminal, or add a per-profile keybinding (Preferences → Profiles → Keyboard → + → Key Shift-Return, Action Send Text, Value \033\r).
When you paste 5+ lines or 280+ characters, the input collapses the blob into a [Pasted text #N +M lines] placeholder. The placeholder is what shows in the box, scrollback, and /sources history; the original text is preserved and handed to the LLM unchanged. This keeps the terminal clean without hiding context from the model.
Every question gets a system prompt with one of four response structures. Auto-classification picks one based on regex heuristics (English + Turkish), or you can force one with -M / /mode.
| Mode | Trigger examples | Response structure |
|---|---|---|
| recall | "what did my notes say about…", "list…", "what is…" | Direct answer → Related context |
| analysis | "why…", "compare…", "how does X relate to Y", "explain…" | Key findings → Connections → Gaps → Synthesis |
| decision | "should I…", "pros and cons", "recommend…", "worth it" | Relevant frameworks → Arguments → Blind spots → Recommendation |
| synthesis | "plan…", "how can I build/scale/launch…", "roadmap" | Building blocks → Integration → Action plan → Assumptions & gaps |
Analysis is the default when nothing matches — it's the most generally useful.
brain mcp runs brain as a Model Context Protocol server on stdio so Claude Desktop and Claude Code can hit your notes the same way they hit any other MCP source — no separate CLI invocation, no copy-paste. The MCP server is a thin retrieval surface: it never calls brain's own answer LLM. The calling model (Claude) reads the retrieved chunks and writes the answer itself, which means no API key has to live inside the MCP server config.
Add an mcpServers entry to ~/Library/Application Support/Claude/claude_desktop_config.json (Desktop, macOS) or your project's .mcp.json (Claude Code):
{
"mcpServers": {
"brain": {
"command": "brain",
"args": ["mcp"]
}
}
}If you keep multiple isolated indexes, point at one with --index:
{
"mcpServers": {
"brain-work": { "command": "brain", "args": ["--index", "work", "mcp"] }
}
}Restart the host app — Claude will list brain_* tools and the brain://collection/<name> resources next time you open a thread.
Claude Code CLI can also wire it up without editing JSON:
claude mcp add brain brain mcp
claude mcp list # confirm brain is registered
claude # start a session, ask "what brain tools are available?"
claude mcp remove brain # uninstallclaude mcp add accepts the same flag forms, e.g. claude mcp add brain-work brain -- --index work mcp for a named index.
Debugging with MCP Inspector — official tool that surfaces every tool / resource / schema in a browser UI, so you can exercise brain interactively without a host LLM in the loop:
npx @modelcontextprotocol/inspector brain mcp| Tool | Args | Returns |
|---|---|---|
brain_ask |
question, collections?, top?, rerank?, expand?, hyde?, floor_pct? |
ranked source chunks (title, display_path, snippet, citation) — enriched with full document text on the top 3 hits |
brain_search |
query, collections?, top?, expand?, rerank?, floor_pct? |
raw ranked sources from BM25 + vector + RRF (no LLM, no enrichment) |
brain_collections |
— | every registered collection with name, path, doc count, and the human-written context blurb |
brain_files |
collection, glob?, top? (v0.5.10), offset? (v0.5.10) |
relative paths of every indexed file in the collection (doublestar globs supported). top defaults to 50; pair with offset for paging. Response carries total / has_more so the LLM knows whether to ask for more |
brain_document (v0.5.10+) |
collection, path |
full text of a single indexed document — the tool-form equivalent of the brain://document/{collection}/{+path} resource template, for MCP hosts that don't surface resource reads to the LLM |
brain_document_search (v0.5.4+) |
collection, path, query, top?, rerank? (v0.5.10) |
ranked chunks inside one document, scored by vector similarity to the query. Surgical retrieval — use it once you know which document to look in; bypasses corpus-level disambiguation tradeoffs. Pass rerank: true to run the cross-encoder over the returned chunks for lexical-match precision |
brain_status (v0.5.8+) |
— | index-health snapshot: schema version, recall version, collection / document / chunk / embedding counts, embedding-coverage ratio, latest indexing activity, and a per-collection breakdown with last_modified timestamps. embed_coverage > 1.0 is the signal that brain cleanup needs to run |
Each registered collection becomes a resource at brain://collection/<name>. Reading the resource returns the concatenated full text of every document in that collection, with a # <collection>/<path> header before each document so the calling model can cite precisely. Heavy on large collections — use brain_search for targeted queries when the full dump isn't needed.
Concrete per-doc reads via the URI template brain://document/{collection}/{+path} (v0.5.2+). The MCP client lists candidate paths with brain_files (or pulls them from brain_search output) and reads any one of them with a single resources/read call — far lighter than dumping the whole collection. Nested paths like Reforge/Product/Strategy/file.txt travel unencoded thanks to RFC 6570 {+path} expansion; percent-encoded characters (commas, parentheses, non-ASCII) are decoded on the server before lookup. Response shape matches the bulk resource (# <collection>/<path> header + body) so client rendering is uniform.
Autocomplete (v0.5.3+): the server implements MCP's completion/complete extension so the template's collection and path variables get dropdown suggestions in MCP Inspector and any other host that surfaces template-driven autocomplete. Type a partial value (case-insensitive substring) to narrow; the path dropdown is scoped to whichever collection you already picked. No context collection → empty path list, so you fill in the collection first.
User > what did my notes say about activation energy?
Claude → brain_ask(question="activation energy", collections=["chem", "physics"])
← 3 sources: chem/kinetics.md (0.81), physics/thermo.md (0.74), chem/notes.md (0.62)
Claude > Activation energy is the minimum energy reactants need to start a reaction.
Your notes [chem/kinetics.md] frame it as the gap between the resting state
and the transition state on the reaction-coordinate diagram. The thermo notes
[physics/thermo.md] add the Arrhenius form k = A·exp(−E_a / RT)…
stdout is reserved for the MCP protocol stream; structured logs (JSON-lines) go to stderr.
brain eval turns retrieval quality from a vibe into a number. Define a set of queries, snapshot the ranked results, diff snapshots over time. The motivating use case was the v0.4.11 floor-inflation A/B — that whole workflow now lives inside the binary.
brain eval run --name baseline-v0.4.13 # snapshot today's behavior
brain eval baseline baseline-v0.4.13 # pin it for auto-diff
brain eval run --name with-rerank # snapshot the change
# → auto-diffs against the baseline
brain eval diff baseline-v0.4.13 with-rerank --json | jq '.regression_count'
# v0.5.15+: also exercise citation coverage by running the LLM per
# query. Opt-in, never on by default — needs an LLM backend and is
# slower / less reproducible than retrieval-only. CI + scheduled runs
# stay retrieval-only.
brain eval run --with-answers --name baseline-v0.5.15-with-answersFirst run with no query file at ~/.brain/eval/queries.json creates a sample for you to edit (three queries: one literal, one conceptual, one vague) and exits with a hint.
Every snapshot records two retrieval-time metrics per query (since v0.5.15):
| Metric | Computed from | Always populated? |
|---|---|---|
source_diversity |
distinct collections in top-K / min(len(chunks), total_collections) |
yes, whenever the query returned chunks |
citation_coverage |
distinct [filename.ext] citations in the answer that match a retrieved chunk / len(chunks) |
only under --with-answers |
Both surface in brain eval diff (mean + delta lines) and brain telemetry eval (per-run div + cov columns). Pointer-typed in the on-disk JSON so an absent metric reads as missing, not as 0.000 — important because the CI gate would otherwise mis-fire on every retrieval-only snapshot.
Snapshots written by brain ≤ v0.5.14 (schema_version: 1) load unchanged in v0.5.15+; the new metric fields just stay empty. Snapshots from a newer brain are refused with a clear "upgrade" message.
{
"schema_version": 1,
"queries": [
{
"id": "q1",
"tag": "literal",
"query": "product market fit signals",
"collections": ["YCombinator", "PaulGraham"],
"flags": {"top": 10, "expand": true, "rerank": true, "floor": -1}
}
]
}flags.floor mirrors the --floor sentinel: <0 means "use recall's default 0.4", 0 disables the adaptive floor, positive values pass through.
For each query the diff surfaces:
- counts: how many sources baseline / current returned
- shared / added / dropped DocID sets
- rank changes sorted by absolute delta — biggest movers first
- regression candidates — DocIDs that sat in baseline top-3 but vanished from the current ranking (file these as bugs)
- mean Jaccard across the full query set
--json emits the full EvalDiffSummary payload for CI / scripted comparisons.
brain ships a ready-to-use workflow at .github/workflows/eval.yml (v0.5.11+) that exercises retrieval quality against the fixture corpus in testdata/eval-fixture/ on every pull request. Two builds, two snapshots, one diff:
- builds
brainat the PR base SHA and the PR HEAD - bootstraps a temp index against the fixture corpus from each binary
- snapshots both with the same
queries.json - diffs them with
brain eval diff pr-base pr-head --json - fails the job when
mean_jaccard < 0.70ORregression_count > 0
Mock embedder drives the vector half so no llama-server subprocess or API key is needed on the CI runner. The fixture lives in-repo so anyone forking can audit / extend the queries by editing testdata/eval-fixture/queries.json.
For a hand-rolled equivalent (or a different repo that wants this pattern):
brain eval run --name "ci-$(git rev-parse --short HEAD)"
brain eval diff prev-baseline "ci-$(git rev-parse --short HEAD)" --json \
| jq -e '.regression_count == 0'Fail the build when a known-good document falls out of the top-3.
Snapshot from 2026-05-16 (v0.4.13 release) → snapshot 12h later (same brain, same recall, same queries, same collections): mean Jaccard 0.86, zero regression candidates, but five of six conceptual queries that previously hit the v0.4.11 floor inflation now return full width on default settings. Same code, different corpus shape (Founders grew to 1094 docs) — a wider tail dilutes the top RRF score's relative inflation. This is exactly what brain eval exists to surface: behavior shifts driven by data, not code.
brain eval schedule install [--cadence daily|weekly|monthly] [--hour <0-23>] installs a native recurring trigger that runs brain eval run automatically. The platform dispatch picks the right scheduler at compile time:
- macOS → launchd LaunchAgent at
~/Library/LaunchAgents/com.ugurcan-aytar.brain.eval.plist, registered vialaunchctl bootstrap gui/<uid>. - Linux → systemd user unit pair at
~/.config/systemd/user/com.ugurcan-aytar.brain.eval.{service,timer}, armed viasystemctl --user enable --now …timer. No sudo: the user manager runs everything under your UID. The timer carriesPersistent=trueso a missed firing fires on next boot.
brain eval schedule status shows whether the trigger is loaded (and on Linux, when it next fires via systemctl --user list-timers); brain eval schedule uninstall tears it down. Other operating systems return an explicit not-supported error rather than writing a half-broken artefact.
Each scheduled run records to ~/.brain/telemetry/eval-history.jsonl. If the auto-diff against the pinned baseline shows mean Jaccard < 0.70 OR any regression candidate, the run flips an alert flag in the telemetry record and emits an ALERT: line to stderr — visible in the scheduler log at ~/.brain/eval/launch.log. brain telemetry eval renders the most recent N runs with their drift signals.
brain records three local-only JSONL streams under ~/.brain/telemetry/ so you can chart drift over time, audit how MCP tools are being used, and see per-backend latency / token usage for your own chat turns. No network, no third-party SaaS — everything lives on disk in append-only JSON lines you can grep / jq directly. Kill switch: BRAIN_TELEMETRY=off to disable writes (read-side queries still work on existing records).
| File | Source | One record per |
|---|---|---|
eval-history.jsonl |
brain eval run |
each invocation (label, mean Jaccard vs baseline, regression count, alert flag, plus mean source diversity + optional mean citation coverage from v0.5.15) |
mcp-events-YYYY-MM-DD.jsonl |
brain mcp tool calls |
each tool call (tool name, args summary, query body, latency ms, error, result size). Rotates daily |
chat-turns-YYYY-MM-DD.jsonl |
brain ask + brain chat |
each answer turn (mode, question, backend, model, collections, source count, latency ms, input / output tokens, error). Rotates daily. Added v0.5.14 |
Queries:
brain telemetry mcp # last 7d histogram, all tools
brain telemetry mcp --tool brain_ask # one-tool latency breakdown
brain telemetry mcp --last 30 --json # 30d window, machine output
brain telemetry eval # last 20 eval runs + drift signals
brain telemetry eval --last 5 --json # newest 5 as JSON
brain telemetry chat # last 7d per-backend latency + tokens
brain telemetry chat --backend anthropic_api # one backend (anthropic_api | openai | claude_cli)
brain telemetry chat --mode chat --last 30 # 30d, chat-only (ask filtered out)
brain telemetry chat --json # machine-readableDefaults live in internal/config/config.go. The interesting knobs:
| Setting | Default | Purpose |
|---|---|---|
Model |
claude-sonnet-4-6 |
Default Claude model |
MaxTokens |
16384 |
Response length cap |
TopK |
20 |
Chunks to retrieve per question |
MinScore |
0.05 |
Minimum relevance floor (adaptive filter uses 40% of top score) |
MinChunksToCallLLM |
1 |
Grounding gate threshold |
MaxConversationTurns |
10 |
Chat history cap (user + assistant per turn) |
DefaultMask |
**/*.{txt,md} |
Files to index when adding a collection |
Environment variables:
| Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY |
Use the native Anthropic API (highest priority backend) |
OPENAI_API_KEY |
Use any OpenAI-compatible /v1/chat/completions endpoint |
OPENAI_BASE_URL |
Override the OpenAI endpoint — point this at Ollama, OpenRouter, LM Studio, LiteLLM, etc. Defaults to https://api.openai.com/v1 |
OPENAI_MODEL |
Model name to send to the OpenAI-compatible endpoint. Defaults to gpt-4o. Also honors -m / /model when the value doesn't look like a Claude alias, so brain ask -m llama3.1 "…" works on Ollama |
BRAIN_CLAUDE_BIN |
Name of the Claude CLI binary brain shells out to. Defaults to claude. Set to opencode (or another fork that speaks the same stream-json protocol) to reuse the CLI fallback without rebuilding |
BRAIN_HISTORY_DIR |
Override where Q&A history is written (defaults to ~/.brain/history) |
BRAIN_PINNED_PATH |
Override the pinned-entry sidecar file (defaults to $BRAIN_HISTORY_DIR/.pinned, then ~/.brain/pinned.txt) |
BRAIN_EXPORT_DIR |
Override where brain ask --export and chat's /export drop files (defaults to ~/Downloads/) |
BRAIN_CLASSIFY_LLM |
Set to 0 / false / off to skip the LLM fallback when the regex classifier doesn't match a query. Default behaviour: one short LLM call on the miss path, never on the hit path |
RECALL_RERANK_PARALLEL |
Override how many slots the reranker subprocess exposes (default 4; clamped to [1, 16]). Larger values overlap more cross-encoder forward passes at the cost of ~150–250 MB of KV cache per slot. Set to 1 on small machines |
Ollama (local, free, offline-capable):
export OPENAI_API_KEY=ollama # any non-empty string works
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_MODEL=llama3.1
brain ask "what did I write about activation energy?"OpenRouter (one key, every model):
export OPENAI_API_KEY=sk-or-…
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
brain ask -m meta-llama/llama-3.1-70b-instruct "…"OpenAI proper:
export OPENAI_API_KEY=sk-…
brain ask "…" # uses gpt-4o by defaultopencode instead of claude:
export BRAIN_CLAUDE_BIN=opencode
brain ask "…"Note: brain's adaptive prompts and thinking-mode directives are tuned for Claude. Non-Claude models will work — the retrieval gate ("no chunks → no LLM call") is model-agnostic — but response quality, especially for
synthesisanddecisionmodes, varies. Multi-query expansion, citation verification, adaptive scoring, and TopK 20 work on all backends. Prompt caching is Anthropic-only (OpenAI auto-caches shared prefixes without explicit markup). Runbrain doctorto see which backend is active.
cmd/brain/ # Cobra entry point + subcommand wiring
internal/
├── config/ # runtime defaults (model, TopK, min-score, mask)
├── engine/ # recall.Engine + embedder lifecycle wrapper
├── retriever/ # adaptive filter, grounding gate, full-doc enrichment, deep filter
├── prompt/ # query classifier + adaptive system prompt builder (static/dynamic split for caching)
├── llm/ # Anthropic REST/SSE (prompt caching), OpenAI-compat, claude CLI fallback
├── markdown/ # streaming terminal markdown renderer
├── history/ # timestamped Q&A archive
├── picker/ # interactive collection multi-select (charmbracelet/huh)
├── ui/ # logo, colors, source bars (charmbracelet/lipgloss)
└── commands/ # one file per CLI subcommand
Retrieval is powered by recall,
brain's sibling search engine. brain imports it as a Go library
(pkg/recall) — there's no subprocess, no separate binary, no second
install step. recall in turn was inspired by qmd
by Tobi Lütke, which brain used to shell out to in earlier versions.
question ──▶ [--expand] expansion LLM → lex / vec / hyde variants
──▶ recall hybrid search (BM25 + vector + RRF fusion) for the
original query + every lex/vec variant, merged by docid
──▶ [--hyde] embed each hypothetical passage → extra vector probe,
merged into the same pool
──▶ [--rerank] cross-encoder rerank the top-30 (bge-reranker-v2-m3)
→ min-max normalise logits → position-aware blend with RRF rank
(top-3: 75/25, ranks 4-10: 60/40, ranks 11+: 40/60)
──▶ adaptive min-score filter (40% of top score)
──▶ grounding gate (skip LLM if no chunks)
──▶ enrich top results with full documents (recall.Engine.Get)
──▶ [--deep] post-retrieval LLM chunk filter (20 → 8-10)
──▶ classify query → pick mode directive
──▶ build adaptive system prompt (static/dynamic split)
──▶ stream response (prompt-cached on Anthropic)
──▶ verify citations against retrieved set
──▶ print sources + save history with metadata
All bracketed stages are opt-in flags. With none of them set the pipeline is a plain BM25 + vector + RRF hybrid — no subprocess boot, no extra LLM call beyond the final synthesis.
recall is a Go search
engine (BM25 via SQLite FTS5, vector via sqlite-vec, RRF fusion,
cross-encoder reranker, query expansion, HyDE) purpose-built to be
imported, not shelled out to. brain links it directly: one binary
for the retrieval primitives, typed return values, no cross-tool
JSON contract to keep in sync, no separate server to install or keep
running. Local embedding and generation run through recall's
llama.cpp-subprocess backend (auto-downloaded on first use, lives
under ~/.recall/bin/llamacpp/) — no build tags, no CGo on the
inference hot path, fully offline once the model + llama-server
prebuilt are cached locally.
Earlier brain versions (v0.2.x) shelled out to qmd, a Node.js retrieval engine. That migration motivated recall's library-first design — the friction of requiring Node + npm alongside brain, the per-query subprocess overhead, and the JSON-over-stdin contract all went away once retrieval moved in-process.
The official Go SDK is still in beta and its public API shape changes across minor releases. The REST + SSE surface is stable and documented, so we talk to api.anthropic.com directly with net/http. ~200 lines, no dependencies, no version churn. The direct HTTP approach also lets us use content-block arrays with cache_control breakpoints for prompt caching — something the SDK doesn't always expose cleanly.
# Build
go build ./...
# Run the full test suite
go test ./...
# Smoke test the binary
./brain --helpEach package that has non-trivial logic ships with its own _test.go file — see internal/config, internal/retriever, internal/prompt, internal/llm, internal/markdown, and internal/history.
PRs welcome. The code is deliberately straightforward — one file per command, one responsibility per package, no clever abstractions. Please keep it that way.
Before opening a PR:
go build ./...must succeed.go test ./...must pass.- New behavior should come with a test.
brain's retrieval layer is powered by recall, a local-first hybrid search engine. recall's architecture was originally inspired by qmd by Tobi Lütke.
What recall brings to brain:
- BM25 + vector + RRF hybrid fusion — single Go binary, zero runtime dependencies
- Local GGUF embedding via nomic-embed-text-v1.5 (~146 MB, Apache 2.0, 768-dim), served through llama.cpp's
llama-serversubprocess - Cross-encoder reranking via bge-reranker-v2-m3 (continuous 0.0-1.0 scoring through llama-server's
/v1/rerank) blended with RRF rank by position-aware bands - Query expansion via qmd-query-expansion-1.7B (parallel multi-query search: lex + vec variants merged by docid)
- HyDE — hypothetical document embedding: the LLM writes an "ideal answer," recall embeds it as a document and adds that vector as an extra probe
- Smart markdown chunking (900 estimated tokens, break-point scoring) + AST-aware code chunking via tree-sitter for Go, Python, TypeScript, Java, Rust
- Incremental embedding — only modified chunks are re-embedded;
chunks.content_hashgates the re-run - Adaptive min-score floor — 40% of the top result's score as a dynamic threshold, so noisy queries degrade gracefully instead of silently dropping the best available result
The LLM layer is pluggable. brain talks directly to the Anthropic REST API (native streaming + prompt caching — no SDK dependency), to any OpenAI-compatible endpoint (OpenAI proper, Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together, Fireworks — wired through OPENAI_API_KEY + OPENAI_BASE_URL), and to the Claude Code CLI when neither API key is set (override the binary name with BRAIN_CLAUDE_BIN to point at a fork like opencode). Prompt caching only activates on the Anthropic native path; the other two backends stream responses without caching. See Configuration → Using a different backend for concrete examples.
The terminal UI is built on charmbracelet/huh (pickers), charmbracelet/lipgloss (styling), and chzyer/readline (REPL).
MIT. See LICENSE.








