Koda is your ideal development partner.
Koda is a fully autonomous software engineering agent that runs as a native desktop application. No IDE extensions, no cloud servers, no clipboard gymnastics — it reads your codebase, edits files, runs commands, and ships code directly in your local environment.
Features • Modes • Tools • Providers • Installation • Build • Architecture
- Autonomous pair-programming — Koda reads your project structure, understands the architecture, and edits files directly. No copy-paste required.
- Snapshot & rollback — before every message, Koda captures a full in-memory snapshot of all workspace files. Hover any user message and click `↺` to restore both files and agent memory to that exact point.
- Persistent project sessions — conversation history and pinned files are saved to disk per working directory (MD5-hashed path in `userData/sessions/`) and restored automatically when you switch back to a project.
- Task queue — send the next task while the agent is still working. Koda queues it and fires it automatically when the current task finishes.
- Real PTY terminal — native shell integration via `node-pty`. Koda spawns background processes, waits for output patterns, sends stdin (passwords, `y/n` prompts), and kills processes by PID — all autonomously.
- Interactive terminal panel — a full `xterm.js` terminal for you to use directly, independent of the agent, with resize support and ANSI rendering.
- Split-view panels — browser preview and terminal panel share the left side with a draggable vertical divider. The main horizontal divider is also resizable.
- Built-in browser preview — a `<webview>`-based browser panel with navigation controls, defaulting to `localhost:5173`. Useful for inspecting running apps without leaving Koda.
- Web navigation agent — via `operantid.js`, Koda can spawn a sub-agent that controls a real browser to navigate, interact with UI elements, and extract data from websites.
- Shell approval system — non-read-only commands pause for your approval inline, with three levels: once, base command (session), or full string (session). Allowlists persist in `localStorage`.
- 4 operation modes — Fast, Planner, Colab, and Teach & Code, selectable from a dropdown in the TitleBar.
- Planner mode — for complex tasks, Koda enters a read-only exploration cycle, writes a detailed Markdown plan, and waits for your explicit approval before touching any file.
- Collaborative mode — Koda can open a multi-turn session with a second LLM (the "advisor") for architectural brainstorming, then proceed with the implementation.
- Teach & Code mode — Koda acts as a technical mentor, explaining every non-obvious decision with trade-offs and alternatives as it codes.
- Skills system — Markdown-based skill files inject specialized instructions into the agent's context on demand. Invoke with `/skill-name [message]` or let the agent load them autonomously via `load_skill`. Global skills live in `~/.koda/skills/`, project-local in `.koda/skills/`.
- Dynamic slash menu — typing `/` in the chat input opens a live-filtered dropdown listing all native commands and available skills, with keyboard navigation identical to `@` file mentions.
- Inline diff viewer — `file_edit` outputs render as a side-by-side visual diff with line numbers, additions in cyan and deletions in rose, grouped by hunk.
- System notifications — a native OS notification fires when a long task (>3s) completes and the window is not in focus.
- Remote Control API — built-in HTTP server (default port `3141`) exposes `POST /task`, `GET /status`, `POST /reset`, and `GET /messages`. Pair with Tailscale for secure remote access from any device on your network. Tasks appear in chat with a 🌐 Remote badge.
- MCP support — connect any Model Context Protocol server (local process or external SSE endpoint). Tools are discovered at runtime via a JSON-RPC handshake and injected into the agent's arsenal dynamically.
- LSP integration — semantic queries via `typescript-language-server`: hover types, go-to-definition, and symbol resolution without reading entire files.
- 13 LLM providers — dynamic model listing via API. Switch providers and models from the UI without restarting.
- File tracker — every file the agent reads or modifies is tracked in-session and surfaced in the context panel.
- At-mentions (`@`) — type `@` in the chat input to open a file selector and inject file context directly into your message.
- Drag & drop — drop image files to attach them to the next message; drop code files to inject an `@[path]` mention automatically.
- Configurable verbosity — toggle output visibility per tool type (`shell`, `file_read`, `file_edit`, `search`, LSP, browser, etc.) without affecting agent context.
- 4 built-in themes — Tokyo Night, GitHub Dark, Cyberpunk Neon, Monokai. Live preview, JSON-based, fully customizable.
- Context-aware system prompt — the system prompt is rebuilt dynamically on every session, injecting the current working directory, OS, shell, project name, framework, and available tools.
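The per-project session persistence described above can be sketched as a small helper. This is an illustrative sketch only — the function name and the plain-string `userDataDir` parameter are assumptions, not Koda's actual API; it simply mirrors the documented `userData/sessions/<md5-of-path>.json` scheme:

```typescript
import { createHash } from "node:crypto";
import { join } from "node:path";

// Hypothetical helper: map a working directory to its session file,
// following the MD5-hashed-path scheme described above.
function sessionFileFor(userDataDir: string, cwd: string): string {
  const hash = createHash("md5").update(cwd).digest("hex");
  return join(userDataDir, "sessions", `${hash}.json`);
}
```

Because the path is derived from a hash of the working directory, switching back to a project always resolves to the same session file, with no index to maintain.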
Modes are switchable from the TitleBar via a dropdown selector. The active mode is enforced at the API level — tools that don't belong to the current mode are completely hidden from the LLM.
- **Fast** — immediate autonomous execution. The agent acts on your request without any planning step. `enter_plan_mode` and `exit_plan_mode` are removed from the tool list entirely — the LLM cannot see or invoke them.
- **Planner** — before writing any code, Koda enters a read-only exploration cycle using only `file_read`, `search`, `list_dir`, `file_find`, and `lsp_query`. It then calls `exit_plan_mode` with a complete Markdown plan. A modal appears in the UI for you to Approve or Reject. Destructive tools (`file_write`, `file_edit`, `shell`) are blocked at the registry level until approval is granted.
- **Colab** — activates three additional tools: `start_collaboration`, `send_to_advisor`, and `end_collaboration`. Koda can open a multi-turn conversation with a second model instance (configured as `advisorModel` in settings) to brainstorm architecture before implementing.
- **Teach & Code** — Koda acts as a technical mentor. For every non-obvious change, it explains why that approach was chosen over common alternatives, using code comparisons when helpful. Ideal for learning a codebase or understanding architectural decisions as they happen.
All tools extend `BaseTool` and are registered in `ToolRegistry`. The registry enforces mode restrictions and plan-mode write locks before every execution.
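The mode enforcement can be sketched as a simple filter before tools are handed to the LLM. The `Mode` union, the optional `modes` field, and the "no `modes` means every mode" convention below are illustrative assumptions, not Koda's actual types:

```typescript
type Mode = "fast" | "planner" | "colab" | "teach";

interface ToolSpec {
  name: string;
  modes?: Mode[]; // undefined → visible in every mode (assumed convention)
}

// Tools outside the active mode are simply never offered to the model.
function visibleTools(all: ToolSpec[], mode: Mode): ToolSpec[] {
  return all.filter((t) => !t.modes || t.modes.includes(mode));
}

const tools: ToolSpec[] = [
  { name: "shell" },
  { name: "exit_plan_mode", modes: ["planner"] },
  { name: "send_to_advisor", modes: ["colab"] },
];
// In "fast" mode only "shell" survives the filter.
```

Filtering at this boundary means a hidden tool cannot even be hallucinated into a valid call — the schema is never sent.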
| Tool | Description |
|---|---|
| `shell` | Spawns a PTY process via `node-pty`. Always runs in the background. Returns the PID immediately. Requires user approval for non-read-only commands (configurable allowlist). |
| `shell_wait` | Polls a background PTY's output buffer for a regex/string pattern, or waits for process exit. Configurable timeout (default 30s). |
| `shell_input` | Writes raw stdin to a running PTY process. Used for interactive prompts, REPL inputs, and password fields. |
| `kill_pty` | Sends SIGINT (Ctrl+C) or SIGKILL to a background PTY by PID. |
| `list_pty` | Lists all active background PTY PIDs. |
| `file_read` | Reads file content with an optional `start_line`/`end_line` range. Returns numbered lines and language detection. |
| `file_write` | Creates or overwrites a file. Auto-creates parent directories. |
| `file_edit` | Replaces an exact string match within a file. Returns a colored unified diff. Safe: only replaces the first occurrence if multiple matches exist. |
| `file_find` | Glob-pattern file search via `globby`. Respects `.gitignore`. |
| `list_dir` | Lists directory contents with file sizes. Supports recursive mode (max depth 3) and a hidden-file toggle. |
| `search` | Regex search across files. Uses ripgrep when available; falls back to a manual Node.js walker. Supports glob include filters and case-insensitive mode. |
| `lsp_query` | Semantic queries via `typescript-language-server`: `hover` (types/JSDoc) and `goToDefinition`. Lazy-initializes a singleton LSP client on first use. |
| `browser_agent` | Spawns `operant-runner.js` as a child process. Passes the task and URL via stdin as JSON. The sub-agent controls a real browser and returns a report. |
| `enter_plan_mode` | Transitions the agent to read-only plan mode. (Planner mode only) |
| `exit_plan_mode` | Presents the Markdown plan to the user and awaits approval via a Promise that resolves through IPC. (Planner mode only) |
| `start_collaboration` | Initializes an advisor LLM session with a dedicated system prompt. (Colab mode only) |
| `send_to_advisor` | Sends a message to the advisor and streams back the response. (Colab mode only) |
| `end_collaboration` | Terminates the advisor session and clears its conversation state. (Colab mode only) |
| `load_skill` | Loads a skill by name from `~/.koda/skills/` or `.koda/skills/` and injects its instructions into context. The agent calls this autonomously when it detects a matching task domain. |
When the agent calls `shell`, Koda checks the command against:
- A read-only safelist (e.g. `ls`, `cat`, `git status`) — auto-approved.
- A base command allowlist (e.g. `npm`) — persisted in `localStorage` and synced to the main process.
- A full command allowlist (e.g. `npm install`) — same persistence.
If none match, the UI shows an approval prompt with three options: Accept Once, Accept Base Command (session), or Accept Full Command (session). The agent's PTY execution is suspended via a Promise until the user responds.
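The three-tier check can be sketched as a pure function. This is a simplified sketch: the tier data is passed in as sets, the read-only tier here matches only the base command (the real safelist also holds multi-word entries like `git status`), and the verdict names are made up:

```typescript
type Verdict = "auto" | "ask";

// Sketch of the allowlist cascade described above.
function approvalFor(
  command: string,
  readOnly: Set<string>,   // read-only safelist (base commands only here)
  baseAllow: Set<string>,  // base-command allowlist
  fullAllow: Set<string>   // full-string allowlist
): Verdict {
  const base = command.trim().split(/\s+/)[0];
  if (readOnly.has(base)) return "auto";
  if (baseAllow.has(base)) return "auto";
  if (fullAllow.has(command.trim())) return "auto";
  return "ask"; // otherwise the UI shows the approval prompt
}
```

Approving "base command" adds to the second set, approving "full command" to the third — which is why a session-approved `npm` covers `npm install` but a session-approved `npm install` covers only that exact string.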
Models are fetched dynamically via each provider's API. Just enter your key and select from the dropdown — no hardcoded model lists.
| Provider | Notes |
|---|---|
| OpenRouter | Hundreds of models via a single key |
| OpenAI | GPT-4o, o1, o3 families; filtered from /v1/models |
| Anthropic | All Claude models; falls back to a curated list if the models API is unavailable |
| Google Gemini | All Gemini models from /v1beta/models |
| Groq | All models from the Groq platform |
| DeepSeek | All DeepSeek models |
| Mistral AI | All Mistral models including Codestral |
| Together AI | All models from the Together platform |
| xAI | Grok family |
| Zhipu AI | GLM family; falls back to a curated list |
| Maritaca AI | Sabiá family; falls back to a curated list |
| Ollama | Local models via /v1/models or legacy /api/tags |
| Llama.cpp | Local inference via HTTP server on port 8080 |
Provider auto-detection: when you call `/model --<name>`, Koda infers the provider from the model name string (e.g. `claude` → Anthropic, `gemini` → Google, `grok` → xAI).
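The inference can be sketched as an ordered substring-rule scan. Only the three example mappings come from the text above; the `gpt`/`o1`/`o3` rule and the exact matching behavior are assumptions:

```typescript
// Illustrative name→provider inference; Koda's actual rules may differ.
function inferProvider(model: string): string | undefined {
  const rules: Array<[RegExp, string]> = [
    [/claude/i, "anthropic"],
    [/gemini/i, "google"],
    [/grok/i, "xai"],
    [/gpt|o[13]/i, "openai"], // assumed rule, not from the text
  ];
  // First matching rule wins; unknown names return undefined.
  return rules.find(([re]) => re.test(model))?.[1];
}
```

Ordering matters: more specific family names should be tested before catch-all patterns.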
- Node.js 20 or higher
- Git in your `PATH`
- An API key from any supported provider
```bash
git clone https://github.com/antojunimaia-ui/Koda.git
cd Koda
npm install
npm run dev:clean
```

Use `dev:clean` to wipe `dist-electron/` before starting. Stale build artifacts cause subtle IPC failures.
API keys are stored in `localStorage` via the settings panel (⚙️ in the TitleBar). No `.env` file is required for the UI — but you can use one for CLI/dev overrides.
```bash
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
MAX_TOKENS=8192
TEMPERATURE=0.3
```

Koda looks for `.env` in the current working directory, `~/.koda/.env`, and the build root — in that order.
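That lookup order can be sketched as a pure candidate list plus a first-match scan. The function names are made up, and `buildRoot` stands in for wherever the packaged app resolves its build root:

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Candidate paths in the documented order: cwd, ~/.koda, build root.
function envCandidates(cwd: string, buildRoot: string): string[] {
  return [
    join(cwd, ".env"),
    join(homedir(), ".koda", ".env"),
    join(buildRoot, ".env"),
  ];
}

// First existing candidate wins; undefined when no .env is found.
function findEnvFile(cwd: string, buildRoot: string): string | undefined {
  return envCandidates(cwd, buildRoot).find((p) => existsSync(p));
}
```

Putting the working directory first means a project-local `.env` always shadows the user-global one.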
Type these directly in the chat input:
| Command | Description |
|---|---|
| `/help` | Shows the command reference |
| `/clear` or `/reset` | Clears conversation history and agent memory |
| `/tokens` or `/cost` | Displays estimated token usage for the current context |
| `/model --<name>` | Switches the active model |
| `/apikey <key>` | Sets the API key inline |
| `/<skill-name> [message]` | Activates a skill and optionally sends a task in the same message |
Koda includes a built-in HTTP server for remote task execution. Enable it in Settings → Remote Control.
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/status` | Public | Agent status, busy state, current project and model |
| `POST` | `/task` | Token | Send a task — body: `{ "message": "..." }` |
| `POST` | `/reset` | Token | Reset conversation remotely |
| `GET` | `/messages` | Token | Retrieve full conversation history as JSON |
The server listens on `0.0.0.0:3141` (configurable). Pair with Tailscale to access it securely from any device on your private network — GitHub Actions, bots, scripts, other agents.
```bash
curl -X POST http://100.x.x.x:3141/task \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"message": "run the test suite and fix any failures"}'
```

Tasks sent via the API appear in chat with a 🌐 Remote badge.
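The same call from TypeScript, as a minimal client sketch. The endpoint, auth header, and body shape follow the table above; the host and token are placeholders you supply, and the split into a pure request builder plus a thin `fetch` wrapper is just a convenience:

```typescript
// Build the request for POST /task; pure, so it is easy to test.
function taskRequest(host: string, token: string, message: string) {
  return {
    url: `http://${host}:3141/task`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ message }),
    },
  };
}

// Fire the task and return the parsed JSON response (Node 18+ fetch).
async function sendTask(host: string, token: string, message: string) {
  const { url, init } = taskRequest(host, token, message);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`task failed: ${res.status}`);
  return res.json();
}
```

Usage would be e.g. `await sendTask("100.x.x.x", token, "run the test suite")` from a script or bot on the same tailnet.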
```bash
# Windows — NSIS installer (.exe)
npm run dist

# Linux — AppImage
npm run dist:linux

# macOS — DMG
npm run dist:mac
```

Output goes to `release-build/`. `node-pty` and `operantid.js` are unpacked from the ASAR archive since they require native binaries.
```
src/
├── main/                      # Electron Main Process (Node.js)
│   ├── index.ts               # App bootstrap + all IPC handlers
│   ├── core/
│   │   ├── agent.ts           # Agent class: provider lifecycle, message loop, tool orchestration
│   │   ├── conversation.ts    # Message history, microCompact, trimIfNeeded, rollback
│   │   ├── prompt-builder.ts  # Dynamic system prompt assembly (env + project + tools)
│   │   └── context.ts         # Project detection (language, framework, package manager)
│   ├── providers/             # 13 LLM provider implementations (all extend BaseProvider)
│   │   └── base.ts            # BaseProvider, Message, StreamChunk, ToolCall interfaces
│   ├── tools/                 # 18 agent tools (all extend BaseTool)
│   │   ├── index.ts           # ToolRegistry: registration, mode filtering, plan-mode lock, format adapters
│   │   ├── shell.ts           # ShellTool + PTY registry + KillPty/ListPty/ShellInput/ShellWait
│   │   ├── file-edit.ts       # String-replace edit with unified diff output
│   │   ├── collaborate.ts     # Advisor LLM session (StartColab/SendColab/EndColab)
│   │   ├── plan.ts            # Plan mode state machine + approval Promise
│   │   └── mcp-tool.ts        # Dynamic MCP tool wrapper
│   ├── services/
│   │   ├── snapshot.ts        # In-memory workspace snapshots (create/restore/list)
│   │   ├── session-manager.ts # Disk-persisted project sessions (userData/sessions/<md5>.json)
│   │   ├── mcp-manager.ts     # MCP server lifecycle + JSON-RPC tool discovery + callTool
│   │   ├── lsp-client.ts      # typescript-language-server client (hover, goToDefinition)
│   │   ├── file-tracker.ts    # In-session file access tracker (read/modified)
│   │   ├── skill-manager.ts   # Loads .md skills from ~/.koda/skills/ and .koda/skills/
│   │   └── webhook-server.ts  # HTTP remote control server (0.0.0.0, token auth)
│   ├── config/
│   │   └── settings.ts        # AppSettings, .env loading, provider defaults
│   └── utils/
│       ├── diff.ts            # Unified diff generation + string-replace logic
│       ├── tokens.ts          # Token estimation (~4 chars/token heuristic)
│       ├── syntax.ts          # Language detection from file extension
│       └── logger.ts          # Logging utilities
├── preload/
│   └── index.ts               # contextBridge: exposes window.koda API to renderer
└── renderer/                  # React 19 + Tailwind CSS 4
    ├── App.tsx                # Main UI: chat, virtualized message list, slash commands, @ mentions
    ├── components/
    │   ├── TitleBar.tsx       # Mode switcher (Fast/Planner/Colab) + panel toggles + window controls
    │   ├── TerminalPanel.tsx  # xterm.js terminal connected to a live PTY
    │   ├── BrowserPreview.tsx # Electron <webview> browser panel with navigation controls
    │   ├── MCPSettings.tsx    # MCP server configuration UI (add/edit/delete/enable)
    │   └── BrailleSpinner.tsx # Animated thinking indicator
    └── themes/                # JSON theme definitions (Tokyo Night, GitHub Dark, Cyberpunk, Monokai)
```
```
User sends message
  → IPC: renderer → main
  → createSnapshot(messageId, conversationLength)   // full workspace captured in memory
  → agent.processMessage()
    → conversation.addUser(message, images?)
    → loop:
      → conversation.trimIfNeeded()                 // microCompact + token-limit trim
      → provider.chat(messages, tools)              // streaming
        → StreamChunk: text → onText() → IPC → UI
        → StreamChunk: tool_call_start → onToolStart() → IPC → UI
        → StreamChunk: tool_call_end → pendingToolCalls[]
      → conversation.addAssistant(text, toolCalls)
      → for each toolCall:
        → toolRegistry.execute(name, args)          // mode check + plan-mode lock
        → onToolEnd(name, result) → IPC → UI
        → conversation.addToolResult(id, output)
      → if no tool calls: break
```
Conversation maintains a `messages[]` array with a 100k token soft limit (estimated at ~4 chars/token). Before trimming, `microCompact` scans old tool results for compactable tools (file reads, shell output, etc.) and replaces their content with a placeholder, preserving the structural history. If still over the limit, old messages are dropped and replaced with a summary notice, always keeping the system message and the last 10 turns.
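The two heuristics above can be sketched in a few lines. The ~4 chars/token estimate comes from the text; which tools count as compactable, the placeholder string, and the `keepLast` cutoff are illustrative assumptions:

```typescript
interface ToolResult { tool: string; content: string }

// Assumed set — the real list of compactable tools lives in conversation.ts.
const COMPACTABLE = new Set(["file_read", "shell", "search"]);

// The documented ~4 chars/token heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Replace old compactable outputs with a placeholder, keeping the
// last `keepLast` results (and the message structure) intact.
function microCompact(results: ToolResult[], keepLast: number): ToolResult[] {
  const cutoff = results.length - keepLast;
  return results.map((r, i) =>
    i < cutoff && COMPACTABLE.has(r.tool)
      ? { ...r, content: "[output compacted]" }
      : r
  );
}
```

Compacting before dropping is the key design choice: the model still sees *that* a file was read or a command ran, even after the bulky output is gone.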
Snapshots are stored in a `Map<messageId, WorkspaceSnapshot>` in the main process memory. Each snapshot captures all non-ignored text files under 2MB. On rollback, files are restored from the snapshot and the agent's `conversation.messages` array is truncated to the saved length — keeping files and memory perfectly in sync. Snapshots are cleared forward from the restored point to prevent branching inconsistencies.
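The bookkeeping can be sketched as a small store. The class and method names are made up, and real snapshots also restore file contents to disk; this sketch keeps only the map, the capture order, and the forward-clearing rule described above:

```typescript
interface WorkspaceSnapshot {
  files: Map<string, string>;  // path → content at capture time
  conversationLength: number;  // messages[] length to truncate back to
}

class SnapshotStore {
  private snaps = new Map<string, WorkspaceSnapshot>();
  private order: string[] = []; // capture order, for forward clearing

  create(id: string, files: Map<string, string>, convLen: number): void {
    this.snaps.set(id, { files: new Map(files), conversationLength: convLen });
    this.order.push(id);
  }

  has(id: string): boolean {
    return this.snaps.has(id);
  }

  // Return the snapshot and drop everything captured after it, so
  // files and agent memory cannot diverge into branches.
  restore(id: string): WorkspaceSnapshot | undefined {
    const snap = this.snaps.get(id);
    if (!snap) return undefined;
    const idx = this.order.indexOf(id);
    for (const later of this.order.splice(idx + 1)) this.snaps.delete(later);
    return snap;
  }
}
```

Clearing forward on restore is what prevents "branching": after rolling back, later snapshots would reference conversation states that no longer exist.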
Read `CONTRIBUTING.md` before opening PRs. Key points: strict TypeScript (avoid `any`), focused PRs (one thing per PR), Conventional Commits, and never commit API keys.
Distributed under the BSD 3-Clause License.