GitHub - ugurcan-aytar/brain: TUI-first conversational knowledge base with Anthropic prompt caching. Talk to your notes in the terminal, pay for tokens once, cache handles the rest.

Conversational knowledge base over your local notes

brain turns a folder of markdown and text files into a queryable knowledge base. Ask it a question and it retrieves the most relevant chunks from your notes, then streams a grounded answer with citations back to your terminal. When your notes don't cover the question, it tells you — no hallucinated filler. It also runs as a Model Context Protocol server so Claude Desktop and Claude Code can query your notes directly: see MCP server.

It's a TUI-first app, not a thin CLI wrapper around an API call. You get an interactive multi-select collection picker, a readline REPL with tab-completion and unique-prefix slash commands, a streaming markdown renderer that colors headings/code/lists live as tokens arrive, mid-response Ctrl+C cancellation, and model/mode pickers you can invoke mid-session. Built on Cobra, charmbracelet/huh (pickers), charmbracelet/lipgloss (styling), and chzyer/readline (REPL). See Chat mode for the full slash command surface.

Demo

One-shot Q&A — retrieval spinner, streaming markdown answer, cited sources, closing logo:

Interactive REPL — slash commands, /mode switching, grounded answer, clean exit:

Thinking modes — same topic asked four different ways. The structure of the response changes with the mode: recall → direct answer, analysis → findings/connections/gaps, decision → frameworks/recommendation, synthesis → building blocks/action plan.

/challenge — re-score an answer against a different set of sources. brain rebuilds the system prompt around the new chunks and streams a re-grounded response that contrasts the two. Great for stress-testing a conclusion before you commit to it.

TUI pickers — the huh multi-select collection picker that fires on brain chat startup, plus the /model picker for switching Claude models mid-session:

brain search — raw retrieval, no LLM. Lands in a few hundred milliseconds with scored chunks and inline previews. This is what powers the "no context → no LLM → no hallucination" principle:

brain history — every answer is archived on disk with model/collections/elapsed metadata. The default list is scriptable; search and view drill in:

brain history browse — interactive TUI picker: / fuzzy-filters your past questions, f kicks a full-text search across answer bodies, enter opens the viewer, esc pops back to the list, d deletes, c copies the selected answer to your clipboard, x exports it to ~/Downloads/ (override with $BRAIN_EXPORT_DIR), p toggles a ★ pin (also reachable from chat as /pin). Built on bubbles/list + viewport so it pages, scrolls, and stays out of your scrollback:

Core principle

No context → no LLM call → no hallucination.

Every answer brain gives is grounded in chunks retrieved from your own notes. If the retrieval step returns nothing relevant, the LLM call is skipped entirely and you get a clear "nothing found" message instead of a confident-sounding fabrication.

Features

brain ask "<question>" — one-shot Q&A, cited sources, streaming answer
brain chat — interactive multi-turn REPL with slash commands, tab completion, and mid-response cancellation
brain search "<query>" — raw retrieval, no LLM, for verifying your index; -n caps the result count, --expand / --rerank / --rerank-n / --hyde / --explain mirror the ask enhancements so you can tune flags against a concrete output before spending LLM tokens
--effort fast|balanced|thorough (v0.6.0) — opt-in retrieval preset on ask / search / chat. Default: none — brain ask "x" without --effort behaves as v0.5.x did (plain hybrid retrieval, no enhancement). Pass --effort balanced for the recommended retrieval shape (= --expand --rerank); fast is an explicit name for the v0.5.x default (hybrid only); thorough = --expand --rerank --hyde --deep with wider RerankTopN. When a preset is set, explicit flags on the same command win per-field. In chat, /effort fast|balanced|thorough switches preset mid-session (clears prior subprocess toggles)
brain history — every Q&A is saved as a timestamped markdown file with model/collections/elapsed metadata; brain history browse opens an interactive TUI picker with / filter on questions, f for full-text search across answers, enter to view, d to delete, c to copy the answer to clipboard (pbcopy/wl-copy/xclip), x to export to ~/Downloads/, p to pin (★) for later
/challenge — re-score an answer against a different set of sources to check it
Adaptive prompt system — questions are classified into recall, analysis, decision, or synthesis modes, each with a different response structure
Full document enrichment — top results are re-fetched as complete documents so the LLM sees the full source, not just the highest-scoring chunk; long transcripts with detail buried past the intro are handled correctly
brain add --context "description" — tells the search engine what a collection is about, dramatically improving retrieval quality for domain-specific content
--index <name> (global) — keep multiple isolated knowledge bases under one binary (e.g. brain --index work vs brain --index personal). Backed by recall's named-index convention at ~/.recall/indexes/<name>.db
--expand — query expansion via recall's local expansion LLM: generates lex/vec query variants (the retriever runs each and merges) and hypothetical passages that can be combined with --hyde
--hyde — Hypothetical Document Embedding: the LLM produces a short "ideal answer" passage, recall embeds it as if it were a real document, and that vector joins the candidate search as an extra probe
--rerank — cross-encoder rerank the top fused candidates via recall's bundled bge-reranker-v2-m3 (continuous 0.0-1.0 relevance score), then blend with RRF rank using the position-aware 75/25 → 60/40 → 40/60 bands. Window size defaults to 30; tune with --rerank-n
--rerank-n <int> — how many candidates the cross-encoder sees when --rerank is on. Default 30; widen (e.g. 60) when generic queries return too few sources, narrow (e.g. 10) when you want a fast top-of-funnel signal
--first-query-bonus <f> — only consulted with --expand / --hyde. Tunes the first-list weight in recall's RRF cross-variant merge (MergeRankLists). 0 (default) keeps the qmd-style 1.0 (user's literal query counts twice); 0.5 dampens to 1.5×; negative disables. Use for ranking tie-breaks between expansion variants — not for tail-width recovery: an internal A/B (see CHANGELOG v0.4.13) showed the bonus affects same-side ordering but not the final fused score the floor cuts against. Reach for --top-rank-bonus or --floor when the --expand tail trim is biting
--top-rank-bonus <f> — overrides recall's rank-1 FuseRRF bonus (default 0.05). <0 (default sentinel) = use recall's default; 0 disables the bonus entirely; positive values pass through. This is the v0.4.11 floor-inflation counter-knob the A/B identified: --top-rank-bonus 0 shrinks the inflated top score so the adaptive floor cuts less, fully recovering v0.4.10 tail widths on conceptual queries. Equivalent end-state to --floor 0 via a different mechanism (shrink the peak vs. disable the cut). Pair with --floor for two independent dials
--explain — surface a per-chunk score trace (bm25@1 (4.21) vec@3 (0.81) bonus=+0.05 rrf=0.082 hyde@2 rerank=0.83 on the enhanced path, the same shape minus hyde@K / rerank=… on the plain hybrid path) so you can see why a document landed where it did. Works on ask, search, and chat (via /explain toggle)
--deep / /deep — post-retrieval LLM chunk filter (20 → 8-10). Independent of --expand/--rerank/--hyde; sits after retrieval and reduces the working set handed to the answer-generation prompt. Combine freely with the recall enhancements.
Cross-source tension detection — the system prompt forces the model to identify disagreements between sources before synthesizing, pushing beyond shallow summary
Adaptive scoring — instead of a hard relevance cutoff that silently drops chunks, brain uses 40% of the top chunk's score as a dynamic floor; on difficult queries where all scores are low, weak-but-best results survive instead of returning "nothing found"
Citation verification — after every answer, [filename.md] citations are checked against the retrieved sources; fabricated filenames get a ⚠ warning
Prompt caching — on the Anthropic backend, system directives and conversation history are structured for prompt caching, reducing latency and cost on multi-turn chat sessions
brain doctor — checks pipeline health (recall index reachable, embedder loadable, LLM backend configured), prints index sanity (docs / embeddings / chunks-per-doc), surfaces the most recent collection-update timestamp, and reports disk usage for the recall DB / models / llama-server / brain history directories. Actionable fix commands on every failed check
Token telemetry — every LLM call's input / output token counts are surfaced after the answer (one line on brain ask, in the chat footer per turn + cumulative session total in /help) and persisted to history. brain history list shows N→M tok next to each entry. Anthropic SDK + OpenAI-compat backends populate this; Claude CLI fallback reports zero
Collection picker — multi-select UI to scope a question to specific note folders
Model switching — swap between sonnet (default), opus, and haiku mid-session
Ctrl+C everywhere — cancel retrieval or streaming at any time without leaving your terminal in a broken state
Pluggable backend — native Anthropic API, any OpenAI-compatible endpoint (OpenAI, Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together…), or the local claude CLI as a fallback
brain mcp — Model Context Protocol server mode so Claude Desktop and Claude Code can query your notes directly via tools (brain_ask / brain_search / brain_collections / brain_files / brain_document / brain_document_search / brain_status) and resources (brain://collection/<name>, brain://document/{collection}/{+path}). See MCP server
brain eval — retrieval quality harness: snapshot a query set, diff snapshots over time, catch retrieval regressions before they ship. See Eval harness

Requirements

macOS (arm64) or Linux (amd64). Windows isn't a release target — brain depends on recall's CGo SQLite stack and on llama.cpp's prebuilt llama-server, neither of which we currently build Windows artefacts for.
Linux runtime dep: minimal container bases (ubuntu without build-essential, Alpine, …) need libgomp1 / libgomp installed — recall's llama-server subprocess dlopens OpenMP-linked CPU backend plugins. A normal workstation already has it via gcc/g++.
At least one LLM backend. brain picks the first one it finds, in this order:
1. ANTHROPIC_API_KEY — native Claude API, the fastest and cheapest path (recommended).
2. OPENAI_API_KEY — any OpenAI-compatible endpoint. Works out of the box with OpenAI, and via OPENAI_BASE_URL also with Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together, Fireworks, etc. See Configuration for examples.
3. The Claude Code CLI on your PATH — useful if you have a Claude subscription but no API key. Override the binary name with BRAIN_CLAUDE_BIN to point at a fork (e.g. opencode).
Go 1.26+ (per go.mod) — only needed if you're building from source.

Install

Homebrew (macOS & Linux)

brew install ugurcan-aytar/brain/brain

No sudo needed — Homebrew manages its own prefix. Works on macOS (arm64) and on Linux (amd64) via Linuxbrew. Every release auto-publishes a formula to the homebrew-brain tap.

One-liner (any POSIX shell)

curl -sSfL https://raw.githubusercontent.com/ugurcan-aytar/brain/main/install.sh | sh

The script downloads the right prebuilt binary for your OS/arch, verifies its SHA-256 against checksums.txt, drops it into /usr/local/bin (or ~/.local/bin as a fallback), and runs brain doctor at the end to confirm your LLM backend is wired up. Retrieval is handled by the embedded recall library — no Node.js, no npm, no second binary to chase. Local embedding and generation use llama.cpp's official llama-server prebuilt as a subprocess (auto-downloaded by recall on first use, lives in ~/.recall/bin/llamacpp/).

Environment overrides: BRAIN_VERSION=v1.2.3 to pin a release, BRAIN_PREFIX=$HOME/.local to change the install prefix.

From source

git clone https://github.com/ugurcan-aytar/brain.git
cd brain
go build -o brain ./cmd/brain
sudo mv brain /usr/local/bin/

With `go install`

go install github.com/ugurcan-aytar/brain/cmd/brain@latest

After any install path, run brain doctor to check that the index is reachable and a Claude (or other) backend is wired up.

Quick start

Kick the tires against the included sample notes:

# 1. Register the example notes folder (auto-runs `brain index` afterward)
brain add ./examples --name examples

# 2. Ask a question
brain ask "What did the team decide about authentication?"

# 3. Or start an interactive conversation
brain chat

Point brain add at your own folder when you're ready — ~/Documents/my-notes, ~/Obsidian/vault, whatever. The examples/ directory ships three small markdown files (meeting notes, a technical doc, a journal entry) chosen to show off retrieval, cross-source connections, and thinking-mode responses on a realistic tiny corpus.

Commands

Query

Command	Description
`brain ask "<question>"`	One-shot Q&A with cited sources
`brain chat`	Interactive multi-turn conversation
`brain search "<query>"`	Raw retrieval results (no LLM)

Global flag (works on every subcommand):

--index <name> — use a named isolated recall index at ~/.recall/indexes/<name>.db instead of the default DB. Lets you keep e.g. brain --index work … separated from brain --index personal … under one binary. Ignored (with a warning) when $RECALL_DB_PATH is set.

Flags on ask:

-c, --collection <name> — scope to one or more collections (skips the picker). Repeatable (-c Lenny -c Reforge) or comma-separated (-c Lenny,Reforge)
-m, --model <model> — sonnet (default), opus, haiku, or a full Anthropic model ID
-M, --mode <mode> — override the auto-detected thinking mode: auto, recall, analysis, decision, synthesis
-n, --top <int> — candidate pool size (default: config TopK)
--expand — query expansion (lex/vec variants + HyDE passages); costs a one-shot expansion-LLM call
--rerank — cross-encoder rerank the top candidates via recall's bge-reranker-v2-m3; continuous 0-1 scores blended with RRF rank
--rerank-n <int> — how many candidates the cross-encoder sees when --rerank is on (default 30)
--floor <pct> — override the adaptive 40%-of-top fusion floor on a per-query basis. 0 shows everything (no cut); 0.25 loosens (more sources); 0.6 tightens. Negative or unset keeps recall's default
--first-query-bonus <f> — first-list weight in recall's --expand / --hyde RRF merge (ranking tie-breaker between expansion variants). 0 (default) = qmd 1.0; 0.5 dampens; negative disables. Not a tail-width knob — see --top-rank-bonus / --floor for that
--top-rank-bonus <f> — overrides recall's rank-1 FuseRRF bonus (default 0.05). <0 = use default; 0 disables → no top-score inflation → adaptive floor cuts less → v0.4.10 tail widths recovered. Pair with --floor for two independent floor-inflation dials
--hyde — Hypothetical Document Embedding: LLM-generated answer passages embedded as extra vector probes
--explain — print a per-chunk score trace (BM25 + vector ranks, RRF score, HyDE rank, reranker score). Renders under each source in the answer block
--deep — post-retrieval LLM chunk filter (20 → 8-10). Independent of the three flags above; combinable.
--copy — after the answer streams, pipe it to the OS clipboard (pbcopy / wl-copy / xclip / xsel auto-detected)
--export — write the answer to ~/Downloads/<auto-name> (override the directory with $BRAIN_EXPORT_DIR). Boolean; no value.
--export-to <PATH> — write the answer to <PATH> exactly. If <PATH> is an existing directory, drops the auto-named file inside; otherwise creates the file (and parent dirs on demand). Wins over --export when both are set. Both forms run regardless of stdout being a TTY, so they work in pipelines

Flags on search: same as ask minus the LLM-specific ones (-m, -M, --deep). --index works here too (it's global).

Flags on chat:

-c, --collection <name> — scope the whole session to one or more collections (skips the startup picker). Repeatable or comma-separated, same as ask
-m, --model <model> — same model aliases as ask; can also be swapped mid-session with /model
--rerank, --expand, --hyde, --explain, --rerank-n <int>, --floor <pct>, --first-query-bonus <f>, --top-rank-bonus <f> — start the session with the matching retrieval enhancement on. Every one of them also has an in-session toggle: /rerank, /expand, /hyde, /explain, /rerank-n <int>, /floor <pct>, /first-query-bonus <f>, /top-rank-bonus <f>

Collections

Command	Description
`brain add <path>`	Register a folder as a collection (runs `brain index` after)
`brain remove <name>`	Remove a collection and clean up its embeddings
`brain collections`	List registered collections
`brain files [-c name]`	List indexed files, optionally filtered by collection

Flags on add:

--name <name> — override the default collection name (folder basename)
--mask <glob> — override the default file glob (**/*.{txt,md})
--context <description> — describe what this collection contains (e.g. "Podcast transcripts about product management"); improves search quality significantly

History

Every answer is archived as a timestamped markdown file (default ~/.brain/history, overridable via BRAIN_HISTORY_DIR). Each file captures the question, answer, sources, and the mode, model, collections, and elapsed time. Browse from the terminal:

Command	Description
`brain history`	List the 10 most recent entries, newest first
`brain history browse`	Interactive TUI picker — `/` to filter by question, `f` to full-text search bodies, `enter` to view, `d` to delete, `c` to copy the answer to clipboard, `x` to export to `~/Downloads/`, `p` to pin (★), `esc` back
`brain history search <query>`	Non-interactive full-text search
`brain history view <id>`	Render an entry (id from the list, 1-based)
`brain history rm <id>`	Delete an entry
`brain history path`	Print the history directory path

Flags on history:

-n, --limit <N> — number of entries to show in the default list (default 10)

Maintenance

Command	Description
`brain index`	Re-scan files and regenerate embeddings
`brain status`	Show index health and brain config
`brain doctor`	Verify recall index + embedder + LLM backend are wired up
`brain cleanup`	Drop orphan chunks + stale embeddings, run SQLite VACUUM. Idempotent — run when `brain_status` reports `embed_coverage > 1.0` (typically a pre-v0.4.0 index leaking vec0 rows on re-index)

Chat mode

brain chat is a full REPL with slash commands, rolling conversation history, and Tab-to-complete.

Slash command	Description
`/help`	Show command list, current model/mode/collections, and the on/off state of every retrieval enhancement
`/mode [name]`	View or change thinking mode (`auto`, `recall`, `analysis`, `decision`, `synthesis`)
`/model [name]`	View or switch Claude model; bare `/model` opens a picker
`/rerank`	Toggle cross-encoder reranking (start state set by `--rerank` startup flag)
`/rerank-n [N]`	View or set the reranker window size. `0` resets to default (30)
`/floor [pct]`	View or set the fusion floor. `0` shows every candidate; `0.25` loosens; `0.6` tightens. `/floor reset` drops the override
`/first-query-bonus [f]`	View or set the first-list weight on `--expand`/`--hyde` RRF merge (ranking tie-breaker). `0` (default) = qmd 1.0; `0.5` dampens; negative disables. `/first-query-bonus reset` drops the override
`/top-rank-bonus [f]`	View or set the rank-1 FuseRRF bonus (default 0.05). `0` disables; positive value passes through. `/top-rank-bonus reset` drops the override. Real counter-knob to v0.4.11 floor inflation — pairs with `/floor`
`/copy`	Copy the last answer to the clipboard (pbcopy/wl-copy/xclip/xsel auto-detected)
`/export`	Write the last answer to `~/Downloads/<filename>` (override with `$BRAIN_EXPORT_DIR`). Filename mirrors the saved-history entry so `brain history browse` and the export agree on names
`/pin`	Pin / unpin the last answer. Pinned entries get a ★ in `brain history browse`; press `p` there to toggle from the list, too
`/diff`	Compare the last two answers in the current session: side-by-side metadata table, source set diff (`−` prev-only / `+` curr-only / shared count), and both bodies. Useful after a `/challenge` or after toggling `--rerank` / `--floor` between turns
`/expand`	Toggle query expansion (lex/vec variants)
`/hyde`	Toggle HyDE — hypothetical-doc embedding as extra vector probes
`/explain`	Toggle per-chunk score traces under the source block
`/deep`	Toggle two-pass deep retrieval (LLM filters chunks 20 → 8-10)
`/collections`	Re-run the collection picker
`/sources`	Show the sources from the last answer
`/challenge`	Re-score the last Q&A against a different set of collections
`/clear`	Reset conversation history
`/quit`	Exit chat (also: `Ctrl+C` twice)

Slash commands support unique-prefix matching: typing /col resolves to /collections. Disambiguation note: /rerank vs /rerank-n, /expand vs /explain, and /floor vs /first-query-bonus all need their full word (or /expa, /expl, /rerank-, /fl, /fi) — /r, /exp, and /f are ambiguous and the resolver prints the candidates. /t resolves uniquely to /top-rank-bonus. Tab auto-completes partial commands.

Press Ctrl+C once during streaming to cancel the in-flight request. Press it twice within two seconds on an empty prompt to exit.

Multi-line input

The chat input is a multi-line textarea that auto-grows as content wraps. To insert a newline without submitting:

Shift + Enter — works on iTerm2, alacritty, kitty, foot, wezterm, Ghostty, and any other terminal that supports xterm's modifyOtherKeys mode 2 or the Kitty progressive keyboard protocol. brain enables both on chat startup (and restores defaults on exit). The modes also re-encode Ctrl+C, Ctrl+D, Alt+letter and friends; brain ships its own translator so every standard shortcut keeps working.
\ + Enter — cross-terminal fallback, works everywhere (the trailing backslash is consumed).
Ctrl + J — universal alternative. Ctrl+J sends LF (\n), distinct from Enter's CR (\r) on every terminal.
Alt + Enter — works when your terminal sends Option as Esc+ (iTerm2 default; for macOS Terminal.app, enable Preferences → Profiles → Keyboard → "Use Option as Meta key").

macOS Terminal.app doesn't implement modifyOtherKeys, so Shift+Enter there falls back to plain Enter. Use \+Enter or Ctrl+J on that one terminal, or add a per-profile keybinding (Preferences → Profiles → Keyboard → + → Key Shift-Return, Action Send Text, Value \033\r).

Pasting multi-line content

When you paste 5+ lines or 280+ characters, the input collapses the blob into a [Pasted text #N +M lines] placeholder. The placeholder is what shows in the box, scrollback, and /sources history; the original text is preserved and handed to the LLM unchanged. This keeps the terminal clean without hiding context from the model.

Thinking modes

Every question gets a system prompt with one of four response structures. Auto-classification picks one based on regex heuristics (English + Turkish), or you can force one with -M / /mode.

Mode	Trigger examples	Response structure
recall	"what did my notes say about…", "list…", "what is…"	Direct answer → Related context
analysis	"why…", "compare…", "how does X relate to Y", "explain…"	Key findings → Connections → Gaps → Synthesis
decision	"should I…", "pros and cons", "recommend…", "worth it"	Relevant frameworks → Arguments → Blind spots → Recommendation
synthesis	"plan…", "how can I build/scale/launch…", "roadmap"	Building blocks → Integration → Action plan → Assumptions & gaps

Analysis is the default when nothing matches — it's the most generally useful.

MCP server

brain mcp runs brain as a Model Context Protocol server on stdio so Claude Desktop and Claude Code can hit your notes the same way they hit any other MCP source — no separate CLI invocation, no copy-paste. The MCP server is a thin retrieval surface: it never calls brain's own answer LLM. The calling model (Claude) reads the retrieved chunks and writes the answer itself, which means no API key has to live inside the MCP server config.

Claude Desktop / Claude Code config

Add an mcpServers entry to ~/Library/Application Support/Claude/claude_desktop_config.json (Desktop, macOS) or your project's .mcp.json (Claude Code):

{
  "mcpServers": {
    "brain": {
      "command": "brain",
      "args": ["mcp"]
    }
  }
}

If you keep multiple isolated indexes, point at one with --index:

{
  "mcpServers": {
    "brain-work": { "command": "brain", "args": ["--index", "work", "mcp"] }
  }
}

Restart the host app — Claude will list brain_* tools and the brain://collection/<name> resources next time you open a thread.

Claude Code CLI can also wire it up without editing JSON:

claude mcp add brain brain mcp
claude mcp list         # confirm brain is registered
claude                  # start a session, ask "what brain tools are available?"
claude mcp remove brain # uninstall

claude mcp add accepts the same flag forms, e.g. claude mcp add brain-work brain -- --index work mcp for a named index.

Debugging with MCP Inspector — official tool that surfaces every tool / resource / schema in a browser UI, so you can exercise brain interactively without a host LLM in the loop:

npx @modelcontextprotocol/inspector brain mcp

Exposed tools

Tool	Args	Returns
`brain_ask`	`question`, `collections?`, `top?`, `rerank?`, `expand?`, `hyde?`, `floor_pct?`	ranked source chunks (title, display_path, snippet, citation) — enriched with full document text on the top 3 hits
`brain_search`	`query`, `collections?`, `top?`, `expand?`, `rerank?`, `floor_pct?`	raw ranked sources from BM25 + vector + RRF (no LLM, no enrichment)
`brain_collections`	—	every registered collection with name, path, doc count, and the human-written context blurb
`brain_files`	`collection`, `glob?`, `top?` (v0.5.10), `offset?` (v0.5.10)	relative paths of every indexed file in the collection (doublestar globs supported). `top` defaults to 50; pair with `offset` for paging. Response carries `total` / `has_more` so the LLM knows whether to ask for more
`brain_document` (v0.5.10+)	`collection`, `path`	full text of a single indexed document — the tool-form equivalent of the `brain://document/{collection}/{+path}` resource template, for MCP hosts that don't surface resource reads to the LLM
`brain_document_search` (v0.5.4+)	`collection`, `path`, `query`, `top?`, `rerank?` (v0.5.10)	ranked chunks inside one document, scored by vector similarity to the query. Surgical retrieval — use it once you know which document to look in; bypasses corpus-level disambiguation tradeoffs. Pass `rerank: true` to run the cross-encoder over the returned chunks for lexical-match precision
`brain_status` (v0.5.8+)	—	index-health snapshot: schema version, recall version, collection / document / chunk / embedding counts, embedding-coverage ratio, latest indexing activity, and a per-collection breakdown with `last_modified` timestamps. `embed_coverage > 1.0` is the signal that `brain cleanup` needs to run

Exposed resources

Each registered collection becomes a resource at brain://collection/<name>. Reading the resource returns the concatenated full text of every document in that collection, with a # <collection>/<path> header before each document so the calling model can cite precisely. Heavy on large collections — use brain_search for targeted queries when the full dump isn't needed.

Exposed resource templates

Concrete per-doc reads via the URI template brain://document/{collection}/{+path} (v0.5.2+). The MCP client lists candidate paths with brain_files (or pulls them from brain_search output) and reads any one of them with a single resources/read call — far lighter than dumping the whole collection. Nested paths like Reforge/Product/Strategy/file.txt travel unencoded thanks to RFC 6570 {+path} expansion; percent-encoded characters (commas, parentheses, non-ASCII) are decoded on the server before lookup. Response shape matches the bulk resource (# <collection>/<path> header + body) so client rendering is uniform.

Autocomplete (v0.5.3+): the server implements MCP's completion/complete extension so the template's collection and path variables get dropdown suggestions in MCP Inspector and any other host that surfaces template-driven autocomplete. Type a partial value (case-insensitive substring) to narrow; the path dropdown is scoped to whichever collection you already picked. No context collection → empty path list, so you fill in the collection first.

Example flow

User    > what did my notes say about activation energy?
Claude  → brain_ask(question="activation energy", collections=["chem", "physics"])
        ← 3 sources: chem/kinetics.md (0.81), physics/thermo.md (0.74), chem/notes.md (0.62)
Claude  > Activation energy is the minimum energy reactants need to start a reaction.
          Your notes [chem/kinetics.md] frame it as the gap between the resting state
          and the transition state on the reaction-coordinate diagram. The thermo notes
          [physics/thermo.md] add the Arrhenius form k = A·exp(−E_a / RT)…

stdout is reserved for the MCP protocol stream; structured logs (JSON-lines) go to stderr.

Eval harness

brain eval turns retrieval quality from a vibe into a number. Define a set of queries, snapshot the ranked results, diff snapshots over time. The motivating use case was the v0.4.11 floor-inflation A/B — that whole workflow now lives inside the binary.

Quickstart

brain eval run --name baseline-v0.4.13     # snapshot today's behavior
brain eval baseline baseline-v0.4.13       # pin it for auto-diff
brain eval run --name with-rerank          # snapshot the change
                                           # → auto-diffs against the baseline
brain eval diff baseline-v0.4.13 with-rerank --json | jq '.regression_count'

# v0.5.15+: also exercise citation coverage by running the LLM per
# query. Opt-in, never on by default — needs an LLM backend and is
# slower / less reproducible than retrieval-only. CI + scheduled runs
# stay retrieval-only.
brain eval run --with-answers --name baseline-v0.5.15-with-answers

First run with no query file at ~/.brain/eval/queries.json creates a sample for you to edit (three queries: one literal, one conceptual, one vague) and exits with a hint.

Metrics

Every snapshot records two retrieval-time metrics per query (since v0.5.15):

Metric	Computed from	Always populated?
`source_diversity`	distinct collections in top-K / `min(len(chunks), total_collections)`	yes, whenever the query returned chunks
`citation_coverage`	distinct `[filename.ext]` citations in the answer that match a retrieved chunk / `len(chunks)`	only under `--with-answers`

Both surface in brain eval diff (mean + delta lines) and brain telemetry eval (per-run div + cov columns). Pointer-typed in the on-disk JSON so an absent metric reads as missing, not as 0.000 — important because the CI gate would otherwise mis-fire on every retrieval-only snapshot.

Snapshots written by brain ≤ v0.5.14 (schema_version: 1) load unchanged in v0.5.15+; the new metric fields just stay empty. Snapshots from a newer brain are refused with a clear "upgrade" message.

Query file format

{
  "schema_version": 1,
  "queries": [
    {
      "id": "q1",
      "tag": "literal",
      "query": "product market fit signals",
      "collections": ["YCombinator", "PaulGraham"],
      "flags": {"top": 10, "expand": true, "rerank": true, "floor": -1}
    }
  ]
}

flags.floor mirrors the --floor sentinel: <0 means "use recall's default 0.4", 0 disables the adaptive floor, positive values pass through.

Diff output

For each query the diff surfaces:

counts: how many sources baseline / current returned
shared / added / dropped DocID sets
rank changes sorted by absolute delta — biggest movers first
regression candidates — DocIDs that sat in baseline top-3 but vanished from the current ranking (file these as bugs)
mean Jaccard across the full query set

--json emits the full EvalDiffSummary payload for CI / scripted comparisons.

CI integration

brain ships a ready-to-use workflow at .github/workflows/eval.yml (v0.5.11+) that exercises retrieval quality against the fixture corpus in testdata/eval-fixture/ on every pull request. Two builds, two snapshots, one diff:

builds brain at the PR base SHA and the PR HEAD
bootstraps a temp index against the fixture corpus from each binary
snapshots both with the same queries.json
diffs them with brain eval diff pr-base pr-head --json
fails the job when mean_jaccard < 0.70 OR regression_count > 0

Mock embedder drives the vector half so no llama-server subprocess or API key is needed on the CI runner. The fixture lives in-repo so anyone forking can audit / extend the queries by editing testdata/eval-fixture/queries.json.

For a hand-rolled equivalent (or a different repo that wants this pattern):

brain eval run --name "ci-$(git rev-parse --short HEAD)"
brain eval diff prev-baseline "ci-$(git rev-parse --short HEAD)" --json \
  | jq -e '.regression_count == 0'

Fail the build when a known-good document falls out of the top-3.

Real-world example

Snapshot from 2026-05-16 (v0.4.13 release) → snapshot 12h later (same brain, same recall, same queries, same collections): mean Jaccard 0.86, zero regression candidates, but five of six conceptual queries that previously hit the v0.4.11 floor inflation now return full width on default settings. Same code, different corpus shape (Founders grew to 1094 docs) — a wider tail dilutes the top RRF score's relative inflation. This is exactly what brain eval exists to surface: behavior shifts driven by data, not code.

Scheduling + drift alerts (v0.5.6+, Linux added v0.5.13)

brain eval schedule install [--cadence daily|weekly|monthly] [--hour <0-23>] installs a native recurring trigger that runs brain eval run automatically. The platform dispatch picks the right scheduler at compile time:

macOS → launchd LaunchAgent at ~/Library/LaunchAgents/com.ugurcan-aytar.brain.eval.plist, registered via launchctl bootstrap gui/<uid>.
Linux → systemd user unit pair at ~/.config/systemd/user/com.ugurcan-aytar.brain.eval.{service,timer}, armed via systemctl --user enable --now …timer. No sudo: the user manager runs everything under your UID. The timer carries Persistent=true so a missed firing fires on next boot.

brain eval schedule status shows whether the trigger is loaded (and on Linux, when it next fires via systemctl --user list-timers); brain eval schedule uninstall tears it down. Other operating systems return an explicit not-supported error rather than writing a half-broken artefact.

Each scheduled run records to ~/.brain/telemetry/eval-history.jsonl. If the auto-diff against the pinned baseline shows mean Jaccard < 0.70 OR any regression candidate, the run flips an alert flag in the telemetry record and emits an ALERT: line to stderr — visible in the scheduler log at ~/.brain/eval/launch.log. brain telemetry eval renders the most recent N runs with their drift signals.

Telemetry

brain records three local-only JSONL streams under ~/.brain/telemetry/ so you can chart drift over time, audit how MCP tools are being used, and see per-backend latency / token usage for your own chat turns. No network, no third-party SaaS — everything lives on disk in append-only JSON lines you can grep / jq directly. Kill switch: BRAIN_TELEMETRY=off to disable writes (read-side queries still work on existing records).

File	Source	One record per
`eval-history.jsonl`	`brain eval run`	each invocation (label, mean Jaccard vs baseline, regression count, alert flag, plus mean source diversity + optional mean citation coverage from v0.5.15)
`mcp-events-YYYY-MM-DD.jsonl`	`brain mcp` tool calls	each tool call (tool name, args summary, query body, latency ms, error, result size). Rotates daily
`chat-turns-YYYY-MM-DD.jsonl`	`brain ask` + `brain chat`	each answer turn (mode, question, backend, model, collections, source count, latency ms, input / output tokens, error). Rotates daily. Added v0.5.14

Queries:

brain telemetry mcp                   # last 7d histogram, all tools
brain telemetry mcp --tool brain_ask  # one-tool latency breakdown
brain telemetry mcp --last 30 --json  # 30d window, machine output

brain telemetry eval                  # last 20 eval runs + drift signals
brain telemetry eval --last 5 --json  # newest 5 as JSON

brain telemetry chat                          # last 7d per-backend latency + tokens
brain telemetry chat --backend anthropic_api  # one backend (anthropic_api | openai | claude_cli)
brain telemetry chat --mode chat --last 30    # 30d, chat-only (ask filtered out)
brain telemetry chat --json                   # machine-readable

Configuration

Defaults live in internal/config/config.go. The interesting knobs:

Setting	Default	Purpose
`Model`	`claude-sonnet-4-6`	Default Claude model
`MaxTokens`	`16384`	Response length cap
`TopK`	`20`	Chunks to retrieve per question
`MinScore`	`0.05`	Minimum relevance floor (adaptive filter uses 40% of top score)
`MinChunksToCallLLM`	`1`	Grounding gate threshold
`MaxConversationTurns`	`10`	Chat history cap (user + assistant per turn)
`DefaultMask`	`*/.{txt,md}`	Files to index when adding a collection

Environment variables:

Variable	Purpose
`ANTHROPIC_API_KEY`	Use the native Anthropic API (highest priority backend)
`OPENAI_API_KEY`	Use any OpenAI-compatible `/v1/chat/completions` endpoint
`OPENAI_BASE_URL`	Override the OpenAI endpoint — point this at Ollama, OpenRouter, LM Studio, LiteLLM, etc. Defaults to `https://api.openai.com/v1`
`OPENAI_MODEL`	Model name to send to the OpenAI-compatible endpoint. Defaults to `gpt-4o`. Also honors `-m` / `/model` when the value doesn't look like a Claude alias, so `brain ask -m llama3.1 "…"` works on Ollama
`BRAIN_CLAUDE_BIN`	Name of the Claude CLI binary brain shells out to. Defaults to `claude`. Set to `opencode` (or another fork that speaks the same `stream-json` protocol) to reuse the CLI fallback without rebuilding
`BRAIN_HISTORY_DIR`	Override where Q&A history is written (defaults to `~/.brain/history`)
`BRAIN_PINNED_PATH`	Override the pinned-entry sidecar file (defaults to `$BRAIN_HISTORY_DIR/.pinned`, then `~/.brain/pinned.txt`)
`BRAIN_EXPORT_DIR`	Override where `brain ask --export` and chat's `/export` drop files (defaults to `~/Downloads/`)
`BRAIN_CLASSIFY_LLM`	Set to `0` / `false` / `off` to skip the LLM fallback when the regex classifier doesn't match a query. Default behaviour: one short LLM call on the miss path, never on the hit path
`RECALL_RERANK_PARALLEL`	Override how many slots the reranker subprocess exposes (default 4; clamped to `[1, 16]`). Larger values overlap more cross-encoder forward passes at the cost of ~150–250 MB of KV cache per slot. Set to `1` on small machines

Using a different backend

Ollama (local, free, offline-capable):

export OPENAI_API_KEY=ollama          # any non-empty string works
export OPENAI_BASE_URL=http://localhost:11434/v1
export OPENAI_MODEL=llama3.1
brain ask "what did I write about activation energy?"

OpenRouter (one key, every model):

export OPENAI_API_KEY=sk-or-…
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
brain ask -m meta-llama/llama-3.1-70b-instruct "…"

OpenAI proper:

export OPENAI_API_KEY=sk-…
brain ask "…"                         # uses gpt-4o by default

opencode instead of claude:

export BRAIN_CLAUDE_BIN=opencode
brain ask "…"

Note: brain's adaptive prompts and thinking-mode directives are tuned for Claude. Non-Claude models will work — the retrieval gate ("no chunks → no LLM call") is model-agnostic — but response quality, especially for synthesis and decision modes, varies. Multi-query expansion, citation verification, adaptive scoring, and TopK 20 work on all backends. Prompt caching is Anthropic-only (OpenAI auto-caches shared prefixes without explicit markup). Run brain doctor to see which backend is active.

Architecture

cmd/brain/            # Cobra entry point + subcommand wiring
internal/
├── config/           # runtime defaults (model, TopK, min-score, mask)
├── engine/           # recall.Engine + embedder lifecycle wrapper
├── retriever/        # adaptive filter, grounding gate, full-doc enrichment, deep filter
├── prompt/           # query classifier + adaptive system prompt builder (static/dynamic split for caching)
├── llm/              # Anthropic REST/SSE (prompt caching), OpenAI-compat, claude CLI fallback
├── markdown/         # streaming terminal markdown renderer
├── history/          # timestamped Q&A archive
├── picker/           # interactive collection multi-select (charmbracelet/huh)
├── ui/               # logo, colors, source bars (charmbracelet/lipgloss)
└── commands/         # one file per CLI subcommand

Retrieval is powered by recall, brain's sibling search engine. brain imports it as a Go library (pkg/recall) — there's no subprocess, no separate binary, no second install step. recall in turn was inspired by qmd by Tobi Lütke, which brain used to shell out to in earlier versions.

Retrieval → grounding → synthesis

question ──▶ [--expand] expansion LLM → lex / vec / hyde variants
         ──▶ recall hybrid search (BM25 + vector + RRF fusion) for the
             original query + every lex/vec variant, merged by docid
         ──▶ [--hyde] embed each hypothetical passage → extra vector probe,
             merged into the same pool
         ──▶ [--rerank] cross-encoder rerank the top-30 (bge-reranker-v2-m3)
             → min-max normalise logits → position-aware blend with RRF rank
             (top-3: 75/25, ranks 4-10: 60/40, ranks 11+: 40/60)
         ──▶ adaptive min-score filter (40% of top score)
         ──▶ grounding gate (skip LLM if no chunks)
         ──▶ enrich top results with full documents (recall.Engine.Get)
         ──▶ [--deep] post-retrieval LLM chunk filter (20 → 8-10)
         ──▶ classify query → pick mode directive
         ──▶ build adaptive system prompt (static/dynamic split)
         ──▶ stream response (prompt-cached on Anthropic)
         ──▶ verify citations against retrieved set
         ──▶ print sources + save history with metadata

All bracketed stages are opt-in flags. With none of them set the pipeline is a plain BM25 + vector + RRF hybrid — no subprocess boot, no extra LLM call beyond the final synthesis.

Why recall as a library?

recall is a Go search engine (BM25 via SQLite FTS5, vector via sqlite-vec, RRF fusion, cross-encoder reranker, query expansion, HyDE) purpose-built to be imported, not shelled out to. brain links it directly: one binary for the retrieval primitives, typed return values, no cross-tool JSON contract to keep in sync, no separate server to install or keep running. Local embedding and generation run through recall's llama.cpp-subprocess backend (auto-downloaded on first use, lives under ~/.recall/bin/llamacpp/) — no build tags, no CGo on the inference hot path, fully offline once the model + llama-server prebuilt are cached locally.

Earlier brain versions (v0.2.x) shelled out to qmd, a Node.js retrieval engine. That migration motivated recall's library-first design — the friction of requiring Node + npm alongside brain, the per-query subprocess overhead, and the JSON-over-stdin contract all went away once retrieval moved in-process.

Why direct HTTP instead of the Anthropic SDK?

The official Go SDK is still in beta and its public API shape changes across minor releases. The REST + SSE surface is stable and documented, so we talk to api.anthropic.com directly with net/http. ~200 lines, no dependencies, no version churn. The direct HTTP approach also lets us use content-block arrays with cache_control breakpoints for prompt caching — something the SDK doesn't always expose cleanly.

Development

# Build
go build ./...

# Run the full test suite
go test ./...

# Smoke test the binary
./brain --help

Each package that has non-trivial logic ships with its own _test.go file — see internal/config, internal/retriever, internal/prompt, internal/llm, internal/markdown, and internal/history.

Contributing

PRs welcome. The code is deliberately straightforward — one file per command, one responsibility per package, no clever abstractions. Please keep it that way.

Before opening a PR:

go build ./... must succeed.
go test ./... must pass.
New behavior should come with a test.

Credits

brain's retrieval layer is powered by recall, a local-first hybrid search engine. recall's architecture was originally inspired by qmd by Tobi Lütke.

What recall brings to brain:

BM25 + vector + RRF hybrid fusion — single Go binary, zero runtime dependencies
Local GGUF embedding via nomic-embed-text-v1.5 (~146 MB, Apache 2.0, 768-dim), served through llama.cpp's llama-server subprocess
Cross-encoder reranking via bge-reranker-v2-m3 (continuous 0.0-1.0 scoring through llama-server's /v1/rerank) blended with RRF rank by position-aware bands
Query expansion via qmd-query-expansion-1.7B (parallel multi-query search: lex + vec variants merged by docid)
HyDE — hypothetical document embedding: the LLM writes an "ideal answer," recall embeds it as a document and adds that vector as an extra probe
Smart markdown chunking (900 estimated tokens, break-point scoring) + AST-aware code chunking via tree-sitter for Go, Python, TypeScript, Java, Rust
Incremental embedding — only modified chunks are re-embedded; chunks.content_hash gates the re-run
Adaptive min-score floor — 40% of the top result's score as a dynamic threshold, so noisy queries degrade gracefully instead of silently dropping the best available result

The LLM layer is pluggable. brain talks directly to the Anthropic REST API (native streaming + prompt caching — no SDK dependency), to any OpenAI-compatible endpoint (OpenAI proper, Ollama, OpenRouter, LM Studio, LiteLLM, Groq, Together, Fireworks — wired through OPENAI_API_KEY + OPENAI_BASE_URL), and to the Claude Code CLI when neither API key is set (override the binary name with BRAIN_CLAUDE_BIN to point at a fork like opencode). Prompt caching only activates on the Anthropic native path; the other two backends stream responses without caching. See Configuration → Using a different backend for concrete examples.

The terminal UI is built on charmbracelet/huh (pickers), charmbracelet/lipgloss (styling), and chzyer/readline (REPL).

License

MIT. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 159 Commits
.github		.github
assets		assets
cmd/brain		cmd/brain
examples		examples
internal		internal
testdata/eval-fixture		testdata/eval-fixture
.editorconfig		.editorconfig
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
SECURITY.md		SECURITY.md
go.mod		go.mod
go.sum		go.sum
install.sh		install.sh

Folders and files

Latest commit

History

Repository files navigation

Conversational knowledge base over your local notes

Demo

Table of contents

Core principle

Features

Requirements

Install

Homebrew (macOS & Linux)

One-liner (any POSIX shell)

From source

With go install

Quick start

Commands

Query

Collections

History

Maintenance

Chat mode

Multi-line input

Pasting multi-line content

Thinking modes

MCP server

Claude Desktop / Claude Code config

Exposed tools

Exposed resources

Exposed resource templates

Example flow

Eval harness

Quickstart

Metrics

Query file format

Diff output

CI integration

Real-world example

Scheduling + drift alerts (v0.5.6+, Linux added v0.5.13)

Telemetry

Configuration

Using a different backend

Architecture

Retrieval → grounding → synthesis

Why recall as a library?

Why direct HTTP instead of the Anthropic SDK?

Development

Contributing

Credits

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 67

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

With `go install`

Packages