This playbook is for agentic contributors working in this repository. Follow these rules unless a task explicitly overrides them. Keep edits in ASCII.
- Scope: entire repository (root + `examples/*`).
- Runtime: Python 3.12, uv-managed `.venv` (do not swap toolchains).
- Workspace: uv workspace members include `examples/*` (demo apps and notebooks).
- Platform: macOS; SurrealDB used for examples; keep commands portable.
- No Cursor rules (`.cursor/rules`, `.cursorrules`) or Copilot instructions found.
- Create/activate env with uv: `uv sync --all-packages` (or `uv sync` inside the venv).
- If env already present, prefer `uv run ...` to ensure deps resolve via uv.
- Lockfile: `uv.lock` is source of truth; avoid hand-editing.
- Additional sources: `pyproject.toml` pins workspace extras (logfire from Git, surrealfs-py via Git in knowledge-graph example).
- Format imports: `uv run ruff check --fix --select I`.
- Format code: `uv run ruff format` (80 cols, 4-space indent, double quotes, trailing commas on multi-line).
- Lint: `uv run ruff check` (keep clean; prefer fixing warnings rather than ignoring).
- Type check all: `uv run basedpyright src/ examples/ --level error`.
- Package-specific type check: `uv run basedpyright -p examples/<pkg>` (e.g., `examples/knowledge-graph`).
- Tests (all): `uv run pytest`.
- Tests (single file): `uv run pytest src/kaig/tests/test_db.py` (adjust path).
- Tests (single test): `uv run pytest src/kaig/tests/test_db.py -k test_name`.
- Keep tests deterministic; avoid hitting external services. Mock SurrealDB/LLM when possible.
- `format`: runs import sort then formatter.
- `lint`: ruff check + basedpyright for root and selected examples.
- `knowledge-graph-db` / `kg-db`: start SurrealDB locally for the knowledge-graph example (`surreal start -u root -p root rocksdb:databases/knowledge-graph`).
- `knowledge-graph DB` / `kg DB`: run FastAPI ingestion server with `DB_NAME` env, e.g. `just kg test_db` (wraps `uv run --env-file .env -- fastapi run examples/knowledge-graph/src/knowledge_graph/server.py --port 8080`).
- `knowledge-graph-agent DB` / `kg-agent DB`: start chat agent UI, e.g. `just kg-agent test_db` (wraps `uv run --env-file .env uvicorn knowledge_graph.agent:app --host 127.0.0.1 --port 7932`).
- `src/kaig`: core library (DB, embeddings, LLM utilities, definitions, tests under `src/kaig/tests`).
- `docs/`: assets and SurrealQL references.
- `examples/knowledge-graph`: ingestion + chat example (FastAPI + data-flow executor + PydanticAI agent).
- `examples/demo-graph`, `examples/demo-simple`, `examples/demo-ingest-throttled`, `examples/notebooks`: other workspace members; follow their local pyproject settings.
- `Justfile`: authoritative task runner; prefer recipes over custom scripts when available.
- Start DB: `just knowledge-graph-db` (or `surreal start -u root -p root rocksdb:databases/knowledge-graph`).
- Run server + ingestion worker: `DB_NAME=test_db just knowledge-graph test_db` or the underlying uv command from README.
- Run chat agent: `DB_NAME=test_db just knowledge-graph-agent test_db`.
- SurrealQL helper queries live in `examples/knowledge-graph/surql/` (schema + retrieval).
- Flow-based ingestion uses stamp fields; updating handler code changes hashes, so watch flow tables to gauge progress.
- To inspect flow status, run the SurrealQL snippet from `examples/knowledge-graph/README.md` under a Surreal client.
- Local tmp directories (`examples/knowledge-graph/tmp`, root `tmp/`) may hold artifacts; do not commit transient outputs.
- Imports: absolute only; no wildcards; group stdlib/third-party/local (ruff handles ordering).
- Prefer `from collections.abc import Iterable, Callable, ...` over `typing.List` etc.
- Typing: be explicit; avoid `Any`; use `| None` for optional; mark callables precisely; consider Protocols instead of duck-typing comments.
- Public interfaces: annotate return types; keep function signatures narrow and clear.
- Dataclasses vs Pydantic: favor Pydantic `BaseModel` for structured data exchange and validation; keep validators minimal and explicit.
- Error handling: raise specific exceptions with actionable messages; avoid bare `except`; never silently `pass`; when useful, log via `logfire`. Re-raise with `from` to preserve context.
- Logging/tracing: prefer structured logs; avoid print; ensure secrets are redacted.
- Formatting: f-strings; double quotes; trailing commas on multiline collections/calls; keep lines ≤ 80 chars.
- Naming: snake_case for variables/functions; PascalCase for classes; UPPER_SNAKE for constants; module filenames snake_case.
- Mutability: prefer immutable data where practical; copy before mutating shared inputs.
- File paths: use `pathlib.Path`; avoid hard-coded relative string paths.
- Timeouts/retries: set explicit timeouts for network/DB calls; avoid unbounded loops.
- Async: when adding async code, use `asyncio` best practices; avoid blocking calls; close sessions/clients.
- Database (SurrealDB): parameterize NS/DB/user/pass via env; never hard-code credentials; keep schema changes in `.surql` files.
- Vector/LLM: prefer dependency injection for models/embedders to keep tests hermetic; mock external calls in tests.
- CLI/servers: guard entrypoints with `if __name__ == "__main__"` when adding scripts; ensure uvicorn/fastapi commands reference module paths (not local file assumptions).
- Documentation: update README snippets when changing APIs; keep docstrings concise and informative.
- Comments: add only when behavior is non-obvious; keep them up to date.
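
A minimal sketch of how several of the style rules above can combine in one helper: precise `collections.abc` callables, `| None` for optionals, an explicit per-attempt timeout, a bounded retry loop, and a specific exception re-raised with `from`. The names `call_with_retry` and `FetchError` are hypothetical and do not exist in this repository.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class FetchError(RuntimeError):
    """Raised when every attempt fails or times out."""


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    *,
    timeout_s: float,
    attempts: int = 3,
    backoff_s: float = 0.1,
) -> T:
    """Await `operation` with a per-attempt timeout and bounded retries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Exception | None = None
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < attempts:
                # Linear backoff; the loop is bounded by `attempts`.
                await asyncio.sleep(backoff_s * attempt)
    raise FetchError(
        f"Gave up after {attempts} attempts (timeout {timeout_s}s each)",
    ) from last_error
```

Only timeouts and connection errors are retried here; any other exception propagates immediately so that programming errors are not masked.
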
- Add tests under `src/kaig/tests` or relevant example packages; name files `test_*.py`.
- Structure tests for determinism; seed randomness; isolate filesystem writes under `tmp/` or pytest `tmp_path`.
- Avoid networked dependencies; mock SurrealDB/LLM/embedding calls.
- Use pytest markers sparingly; default to unit-style tests; no skipped tests unless necessary.
- When debugging flows, prefer targeted tests with small fixtures rather than full DB spins.
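
The testing rules above might look like this in practice: a hermetic test that injects a fake embedder, seeds its randomness, and writes only under pytest's `tmp_path`. The `Embedder` protocol, `FakeEmbedder`, and `index_file` are hypothetical stand-ins for illustration, not the library's actual interfaces.

```python
import random
from collections.abc import Sequence
from pathlib import Path
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical interface; the real embedder API may differ."""

    def embed(self, text: str) -> Sequence[float]: ...


class FakeEmbedder:
    """Deterministic stand-in that never touches the network."""

    def __init__(self, seed: int = 0, dims: int = 4) -> None:
        self._seed = seed
        self._dims = dims

    def embed(self, text: str) -> list[float]:
        # Seed from the text so equal inputs give equal vectors.
        rng = random.Random(f"{self._seed}:{text}")
        return [rng.random() for _ in range(self._dims)]


def index_file(path: Path, embedder: Embedder) -> dict[str, list[float]]:
    """Hypothetical unit under test: embed each non-empty line."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line: list(embedder.embed(line)) for line in lines if line}


def test_index_file_is_deterministic(tmp_path: Path) -> None:
    source = tmp_path / "doc.txt"
    source.write_text("alpha\n\nbeta\n", encoding="utf-8")

    first = index_file(source, FakeEmbedder(seed=1))
    second = index_file(source, FakeEmbedder(seed=1))

    assert list(first) == ["alpha", "beta"]
    assert first == second
    assert all(len(vector) == 4 for vector in first.values())
```

Because the embedder is passed in rather than constructed inside `index_file`, the test needs no patching and no API key.
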
- Do not rewrite existing user changes. Avoid destructive git commands (no hard reset/checkout).
- Only commit when explicitly asked. Keep branches clean; respect repo commit style by inspecting `git log` if needed.
- Secrets: never commit `.env`, credentials, tokens, or SurrealDB passwords. Use env vars and `.env` locally only.
- logfire is available (via dev deps). If adding logs, use structured logging and keep PII out.
- Prefer surfacing operational metrics through structured events rather than ad-hoc prints.
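
One way to keep secrets out of structured events is to mask sensitive attributes before they reach the logger. The sketch below is logger-agnostic; `redact` and `SENSITIVE_FRAGMENTS` are hypothetical names, and the fragment list is an assumption to adapt to your own fields.

```python
from collections.abc import Mapping

# Assumed list of sensitive key fragments; extend it for your own fields.
SENSITIVE_FRAGMENTS = ("password", "secret", "token", "api_key")


def redact(attributes: Mapping[str, object]) -> dict[str, object]:
    """Return a copy with sensitive values masked; input is not mutated."""
    return {
        key: "***" if _is_sensitive(key) else value
        for key, value in attributes.items()
    }


def _is_sensitive(key: str) -> bool:
    lowered = key.lower()
    return any(fragment in lowered for fragment in SENSITIVE_FRAGMENTS)
```

The returned dict can then be passed as keyword attributes to a structured log call, for example `logfire.info("ingest finished", **redact(attrs))`.
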
- Surreal queries live in `docs/surql-intro.surql` and `examples/knowledge-graph/surql/*`. Keep queries in `.surql` files and load them via helpers (`DB.execute`/`async_execute`).
- When modifying schema or retrieval queries, keep them idempotent and documented in README sections.
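
A minimal sketch of reading a query from a `.surql` file instead of inlining it. `load_query` and `QueryNotFoundError` are hypothetical helpers; the exact signatures of `DB.execute` and `async_execute` are not shown in this playbook, so check `src/kaig` before wiring the result into them.

```python
from pathlib import Path


class QueryNotFoundError(FileNotFoundError):
    """Raised when a named .surql file is missing."""


def load_query(base_dir: Path, name: str) -> str:
    """Return the text of `<base_dir>/<name>.surql`."""
    path = base_dir / f"{name}.surql"
    if not path.is_file():
        raise QueryNotFoundError(f"No query file at {path}")
    return path.read_text(encoding="utf-8")
```

Taking `base_dir` as a `Path` argument keeps the helper free of hard-coded relative paths and lets tests point it at a temporary directory.
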
- Knowledge-graph server requires `DB_NAME` env; it defaults Surreal connection to `ws://localhost:8000/rpc`, user `root`, pass `root`, namespace `kaig`.
- LLM defaults (knowledge-graph): provider `openai`, model `gpt-5-mini-2025-08-07`, temperature 1; embedder uses OpenAI `text-embedding-3-small` with vector type `F32`.
- Supply API keys via env (e.g., `OPENAI_API_KEY`); do not hard-code credentials. Use `.env` locally and keep it untracked.
- Keep Surreal schema definitions in `.surql` files; re-run init if tables/edges change.
- Avoid clearing DB automatically unless you know it is safe (`db.clear()` is commented out by default).
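
The connection defaults above could be gathered in one place along these lines. This is a sketch, not the server's actual configuration code: only `DB_NAME` is named in this playbook, the `SURREAL_*` variable names are hypothetical, and a frozen dataclass is used only to keep the example stdlib-only (a Pydantic `BaseModel` would fit the repository's stated preference).

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent."""


@dataclass(frozen=True)
class SurrealSettings:
    url: str
    user: str
    password: str
    namespace: str
    database: str


def load_settings(env: Mapping[str, str] = os.environ) -> SurrealSettings:
    """Build settings from env; only DB_NAME is required."""
    database = env.get("DB_NAME")
    if not database:
        raise MissingSettingError("Set DB_NAME, e.g. DB_NAME=test_db")
    return SurrealSettings(
        # SURREAL_* names are hypothetical; the fallbacks mirror the
        # documented local defaults and are not production credentials.
        url=env.get("SURREAL_URL", "ws://localhost:8000/rpc"),
        user=env.get("SURREAL_USER", "root"),
        password=env.get("SURREAL_PASS", "root"),
        namespace=env.get("SURREAL_NS", "kaig"),
        database=database,
    )
```

Passing the environment as a mapping lets tests supply a plain dict instead of mutating `os.environ`.
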
- Flow executor registers handlers via `@exe.flow(...)` and writes handler hashes into stamp fields to avoid re-processing.
- Current flows: `chunk` operates on `document` rows lacking a `flow_chunked` stamp; `infer_concepts` operates on `chunk` rows lacking `concepts_inferred`.
- Flow eligibility: stamp is `NONE` and dependencies satisfied; this makes runs restart-safe and incremental.
- When updating a flow handler, expect hash changes; use the flow status SurrealQL snippet (README) to see processed vs pending records.
- Background ingestion loop starts with FastAPI server startup (`ingestion_loop` task) and stops gracefully on shutdown.
- Upload route (`/upload`) schedules ingestion via background task; keep handlers async-safe and best-effort idempotent.
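
The stamp mechanism described above can be illustrated with an in-memory toy model. This is not the executor's implementation: `Record`, `handler_hash`, and `run_flow` are hypothetical, hashing the handler's bytecode is an assumption about how the hash might be derived, and dependency checks are left out.

```python
import hashlib
from dataclasses import dataclass, field
from types import FunctionType


@dataclass
class Record:
    id: str
    text: str
    stamps: dict[str, str] = field(default_factory=dict)


def handler_hash(handler: FunctionType) -> str:
    # Assumption: hash the compiled bytecode; the real executor may
    # hash something else (for example the handler source).
    return hashlib.sha256(handler.__code__.co_code).hexdigest()[:12]


def run_flow(
    records: list[Record],
    stamp: str,
    handler: FunctionType,
) -> int:
    """Process records lacking `stamp`; return how many were handled."""
    digest = handler_hash(handler)
    pending = [r for r in records if r.stamps.get(stamp) is None]
    for record in pending:
        handler(record)
        # Stamp only after the handler succeeds, so a crash leaves
        # the record eligible for the next run.
        record.stamps[stamp] = digest
    return len(pending)


def chunk(record: Record) -> None:
    record.text = record.text.strip()
```

Running `run_flow(records, "flow_chunked", chunk)` twice over the same list handles every record on the first call and none on the second, which is the restart-safe, incremental behavior the stamp is meant to provide.
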
- Images and docs live under `docs/assets/`; keep large binaries out of git unless essential.
- Temp artifacts: `tmp/`, `examples/knowledge-graph/tmp`, `databases/`, `uploads/` should stay untracked; clean them before sharing branches.
- `.surql` files are canonical for DB schema and queries; do not duplicate queries inline unless necessary for tests.
- Root `pyproject.toml` defines workspace members; use uv workspace semantics rather than ad-hoc `pip install -e .`.
- Dev dependencies live under `[dependency-groups].dev`; run `uv sync --all-packages` to pull workspace extras.
- Example packages (e.g., `knowledge-graph`) depend on the root package via workspace source; avoid circular imports across packages.
- Lint + format + type-check before sharing changes: run `just format` then `just lint` or the underlying uv commands.
- Confirm tests relevant to your changes pass (single file/test commands above).
- Validate knowledge-graph flows still stamp correctly after handler edits (check flow status query if unsure).
- Ensure new endpoints/CLI entrypoints include parameter validation and do not hard-code secrets or ports.
- Update documentation (README/AGENTS/SurrealQL files) when adjusting APIs, env vars, or schema.
- Clean temp outputs before PRs; ignore transient folders (`tmp/`, `uploads/`, `databases/`).
- If running servers, stop them after tests to avoid port conflicts (8080 for FastAPI, 7932 for agent UI by default).
- When adding new commands, prefer `just` recipes or document the uv invocation clearly.
- Install deps: `uv sync --all-packages`
- Format imports: `uv run ruff check --fix --select I`
- Format code: `uv run ruff format`
- Lint: `uv run ruff check`
- Type check (all): `uv run basedpyright src/ examples/ --level error`
- Type check (pkg): `uv run basedpyright -p examples/knowledge-graph`
- Tests (all): `uv run pytest`
- Tests (file): `uv run pytest src/kaig/tests/test_db.py`
- Tests (single): `uv run pytest src/kaig/tests/test_db.py -k test_name`
- Start KG DB: `just kg-db`
- Run KG server: `DB_NAME=test_db just kg test_db`
- Run KG agent: `DB_NAME=test_db just kg-agent test_db`