16 changes: 9 additions & 7 deletions CLAUDE.md
@@ -95,7 +95,7 @@ This is a Model Context Protocol (MCP) server that provides AI clients with acce
### Core Components

- **`codealive_mcp_server.py`**: Main entry point — bootstraps logging, tracing, registers tools and middleware
- **Eight tools**: `get_data_sources`, `semantic_search`, `grep_search`, `fetch_artifacts`, `get_artifact_relationships`, `chat`, `codebase_search`, `codebase_consultant`
- **Eleven tools**: `get_data_sources`, `semantic_search`, `grep_search`, `get_repository_ontology`, `get_file_tree`, `read_file`, `fetch_artifacts`, `get_artifact_relationships`, `get_artifact_query_schema`, `query_artifact_metadata`, `chat`
- **`core/client.py`**: `CodeAliveContext` dataclass + `codealive_lifespan` (httpx.AsyncClient lifecycle, `_server_ready` flag)
- **`core/logging.py`**: loguru structured JSON logging + PII masking + OTel context injection
- **`core/observability.py`**: OpenTelemetry TracerProvider setup with OTLP export
@@ -106,7 +106,7 @@ This is a Model Context Protocol (MCP) server that provides AI clients with acce
1. **FastMCP Framework**: Uses FastMCP 3.x with lifespan context, middleware hooks, and built-in `Client` for testing
2. **HTTP Auth via `get_http_headers`**: FastMCP 3.x strips the `authorization` header by default (to prevent accidental credential forwarding to downstream services). Our `get_api_key_from_context()` in `core/client.py` must use `get_http_headers(include={"authorization"})` to read Bearer tokens from HTTP/streamable-http clients. **Do not remove the `include=` parameter** — without it, all HTTP-transport clients (LibreChat, n8n, etc.) will fail with a misleading STDIO-mode error.
3. **HTTP Client Management**: Single persistent `httpx.AsyncClient` with connection pooling, created in lifespan
3. **Streaming Support**: `chat` and the deprecated `codebase_consultant` alias use SSE streaming (`response.aiter_lines()`) for chat completions
3. **Tool API v3 Backend Contract**: every MCP tool delegates to `POST /api/tools/{name}` and requests `output_format=agentic`
4. **Environment Configuration**: Supports both .env files and command-line arguments with precedence
5. **Error Handling**: Centralized in `utils/errors.py` — all tools use `handle_api_error()` with `method=` prefix
6. **N8N Middleware**: Strips extra parameters (sessionId, action, chatInput, toolCallId) from n8n tool calls before validation
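
To illustrate item 6, the stripping step can be sketched as a pure function. This is a minimal sketch, not the project's middleware: `N8N_EXTRA_KEYS` and `strip_n8n_extras` are hypothetical names, and only the four parameter names come from the text above.

```python
from typing import Any

# Parameter names are taken from the text; the constant and function
# names are hypothetical and exist only for this illustration.
N8N_EXTRA_KEYS = frozenset({"sessionId", "action", "chatInput", "toolCallId"})


def strip_n8n_extras(arguments: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the tool arguments without the n8n-only keys."""
    return {k: v for k, v in arguments.items() if k not in N8N_EXTRA_KEYS}


call_args = {"query": "error handling", "sessionId": "abc", "toolCallId": "42"}
print(strip_n8n_extras(call_args))
```

Returning a new dict rather than mutating the input keeps the original call arguments available for logging or debugging.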
@@ -158,7 +158,7 @@ This project uses **loguru** for structured JSON logging. All logs go to **stder

2. **All logs go to stderr.** The stdio MCP transport uses stdout for protocol messages. Any stray `print()` or stdout write will corrupt the MCP protocol and break the client. If you add a new log sink, it must target `sys.stderr`.

3. **Never call `response.text` without a debug guard.** `log_api_response()` is protected by `_is_debug_enabled()` because reading `response.text` consumes the response body. The `chat` tool and deprecated `codebase_consultant` alias stream SSE via `response.aiter_lines()` — calling `.text` first would silently consume the stream and produce empty results. If you add new response logging, always check `_is_debug_enabled()` first:
3. **Never call `response.text` without a debug guard.** `log_api_response()` is protected by `_is_debug_enabled()` because reading `response.text` consumes the response body. If you add new response logging, always check `_is_debug_enabled()` first:
```python
if not _is_debug_enabled():
    return  # Do NOT touch response body at INFO level
@@ -264,8 +264,10 @@ Tools that return **structured metadata** (identifiers, match counts, line
numbers, relationship groups, data source listings) return a `dict` (or list of
dicts). FastMCP serializes it automatically via `pydantic_core.to_json`, which
preserves Unicode — no manual `json.dumps()` needed. Examples:
`semantic_search`, `grep_search`, `codebase_search`, `get_data_sources`,
`get_artifact_relationships`.
`semantic_search`, `grep_search`, `get_data_sources`,
`get_repository_ontology`, `get_file_tree`, `read_file`,
`get_artifact_relationships`, `get_artifact_query_schema`, and
`query_artifact_metadata`.

**Never call `json.dumps(...)` from a tool's return path.** Python's `json.dumps`
defaults to `ensure_ascii=True` and escapes Cyrillic/CJK/etc. to `\uXXXX`.
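
The escaping behaviour described above can be reproduced with the standard library alone. This is a minimal sketch with an illustrative sample string; it shows why the default is a problem and does not suggest calling `json.dumps` from a tool.

```python
import json

payload = {"title": "Привет"}

# Default: ensure_ascii=True, so every non-ASCII character becomes \uXXXX.
escaped = json.dumps(payload)

# Shown only for contrast: with ensure_ascii=False the text is preserved.
preserved = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(preserved)
```

Both strings decode to the same object, so the escaping is lossless for parsers but much harder for a model or a human to read.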
@@ -289,7 +291,7 @@ description alone — descriptions are not always re-read mid-conversation, but
the response is always in front of the model when it decides what to do next.

Examples in this repo:
- `codebase_search` returns a `hint` field telling the agent that `description`
- `semantic_search` and `grep_search` return a `hint` field telling the agent that `description`
is a triage pointer only and that real understanding must come from
`fetch_artifacts(identifier)` or a local `Read(path)`. Implementation:
`_SEARCH_HINT` in `src/utils/response_transformer.py`.
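
The idea of carrying guidance inside the response can be sketched as follows. Everything here is an assumption made for illustration: the hint wording, the `results` key, the identifier format, and the names `EXAMPLE_SEARCH_HINT` and `attach_hint` are hypothetical; the real text lives in `_SEARCH_HINT`.

```python
from typing import Any

# Hypothetical wording; the project's actual hint is defined in `_SEARCH_HINT`.
EXAMPLE_SEARCH_HINT = (
    "Descriptions are triage pointers only. Call fetch_artifacts(identifier) "
    "or read the file locally before reasoning about the code."
)


def attach_hint(results: list[dict[str, Any]]) -> dict[str, Any]:
    """Wrap search results in a dict that carries the guidance next to the data."""
    return {"results": results, "hint": EXAMPLE_SEARCH_HINT}


# The identifier below is made up and does not reflect the real format.
response = attach_hint([{"identifier": "example-id", "description": "Entry point"}])
print(response["hint"])
```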
@@ -352,7 +354,7 @@ Key points:
- Custom lifespan yields a real `CodeAliveContext` with a mock-backed httpx client
- `monkeypatch.setenv("CODEALIVE_API_KEY", ...)` for `get_api_key_from_context` fallback
- Use `raise_on_error=False` when testing error paths, then assert on `result.content[0].text`
- For SSE streaming (`chat` / `codebase_consultant`), return `httpx.Response(200, text=sse_body)` — `aiter_lines()` works on buffered responses
- For chat-style buffered responses, return `httpx.Response(200, json=payload)` and assert against the Tool API v3 envelope content

### Unit Test Patterns

25 changes: 16 additions & 9 deletions README.md
@@ -28,11 +28,14 @@ Once connected, you'll have access to these powerful tools:
1. **`get_data_sources`** - List your indexed repositories and workspaces
2. **`semantic_search`** - Canonical semantic search across indexed artifacts
3. **`grep_search`** - Exact literal or regex text search inside file content, plus literal file-name/path matching (returns files like `Form.xml` even when their content never mentions the name), with line-level previews for content matches
4. **`fetch_artifacts`** - Load the full source for relevant search hits (missing or inaccessible identifiers are reported back in a `<not_found>` block, not silently dropped)
5. **`get_artifact_relationships`** - Expand call graph, inheritance, and reference relationships for one artifact
6. **`chat`** - Slower synthesized codebase Q&A, typically only after search
7. **`codebase_search`** - Deprecated legacy semantic search alias kept for backward compatibility
8. **`codebase_consultant`** - Deprecated alias for `chat`
4. **`get_repository_ontology`** - Get repository-level orientation for one selected repository
5. **`get_file_tree`** - Inspect a bounded file tree for one repository
6. **`read_file`** - Read a repository-relative file path, optionally with a line range
7. **`fetch_artifacts`** - Load the full source for relevant search hits (missing or inaccessible identifiers are reported back, not silently dropped)
8. **`get_artifact_relationships`** - Expand call graph, inheritance, and reference relationships for one artifact
9. **`get_artifact_query_schema`** - Inspect supported ArtifactQuery entities, fields, and examples
10. **`query_artifact_metadata`** - Run read-only metadata analytics across selected repositories
11. **`chat`** - Stateless, slower synthesized codebase Q&A; call only when explicitly requested

## 🎯 Usage Examples

@@ -43,7 +46,7 @@ After setup, try these commands with your AI assistant:
- *"Find the exact regex that matches JWT tokens"* → Uses `grep_search`
- *"Explain how the payment flow works in this codebase"* → Usually starts with `semantic_search`/`grep_search`, then optionally uses `chat`

`semantic_search` and `grep_search` should be the default tools for most agents. `chat` is a slower synthesis fallback, can take up to 30 seconds, and is usually unnecessary when an agent can run a multi-step workflow with search, fetch, relationships, and local file reads. If your agent supports subagents, the highest-confidence path is to delegate a focused subagent that orchestrates `semantic_search` and `grep_search` first.
`semantic_search` and `grep_search` should be the default tools for most agents. `chat` is a slower stateless synthesis fallback, can take up to 30 seconds, and is usually unnecessary when an agent can run a multi-step workflow with ontology, search, fetch/read, relationships, ArtifactQuery, and local file reads. If your agent supports subagents, the highest-confidence path is to delegate a focused subagent that orchestrates `semantic_search` and `grep_search` first.

## 📚 Agent Skill

@@ -840,10 +843,14 @@ See [JetBrains MCP Documentation](https://www.jetbrains.com/help/ai-assistant/mc
- `get_data_sources` - List available repositories
- `semantic_search` - Search code semantically
- `grep_search` - Search by exact text or regex
- `get_repository_ontology` - Orient around one repository
- `get_file_tree` - Inspect repository files
- `read_file` - Read one repository-relative file
- `fetch_artifacts` - Fetch source for search result identifiers
- `get_artifact_relationships` - Expand relationships for one artifact
- `chat` - Slower synthesized codebase Q&A, usually after search
- `codebase_search` - Legacy semantic search alias
- `codebase_consultant` - Deprecated alias for `chat`
- `get_artifact_query_schema` - Inspect metadata query schema
- `query_artifact_metadata` - Run metadata analytics
- `chat` - Stateless synthesized codebase Q&A, only when explicitly requested

**Example Workflow:**
```
17 changes: 0 additions & 17 deletions integration_test.py
@@ -556,23 +556,6 @@ async def test_agent_workflow(s: ClientSession, target: str) -> None:
           len(text) > 100 and not r.isError,
           f"len={len(text)}")

    # 5. deprecated aliases
    r = await s.call_tool("codebase_consultant", {
        "question": "What testing patterns are used?",
        "data_sources": [target],
    })
    record("workflow: codebase_consultant (deprecated)",
           len(r.content[0].text) > 50 and not r.isError,
           f"len={len(r.content[0].text)}")

    r = await s.call_tool("codebase_search", {
        "query": "error handling",
        "data_sources": [target],
    })
    record("workflow: codebase_search (deprecated)",
           not r.isError,
           f"len={len(r.content[0].text)}")


# ── Main ─────────────────────────────────────────────────────────────────────

30 changes: 21 additions & 9 deletions manifest.json
@@ -2,7 +2,7 @@
"manifest_version": "0.4",
"name": "codealive-mcp",
"display_name": "CodeAlive",
"version": "2.0.4",
"version": "3.0.0",
"description": "Semantic code search and codebase Q&A for Claude Desktop using your CodeAlive account or self-hosted deployment.",
"long_description": "CodeAlive gives Claude Desktop access to semantic code search, artifact fetch, repository discovery, and architecture-aware codebase Q&A. This extension runs locally via MCP and supports both CodeAlive Cloud and self-hosted deployments.",
"author": {
@@ -53,10 +53,6 @@
"name": "get_data_sources",
"description": "List indexed repositories and workspaces that are ready for search and chat."
},
{
"name": "codebase_search",
"description": "Deprecated legacy semantic search tool kept for backward compatibility."
},
{
"name": "semantic_search",
"description": "Default discovery tool — search by meaning to find code by concepts, behavior, or architecture."
@@ -70,16 +66,32 @@
"description": "Synthesized codebase Q&A. Do NOT call unless the user explicitly names this tool (e.g. 'use chat'). 'Ask CodeAlive' means use search tools, not chat. Slow (up to 30 seconds)."
},
{
"name": "fetch_artifacts",
"description": "Fetch full source for specific search results when you need the underlying code."
"name": "get_repository_ontology",
"description": "Get repository-level ontology and orientation for a single selected repository."
},
{
"name": "get_file_tree",
"description": "List repository files and folders for a single selected repository."
},
{
"name": "read_file",
"description": "Read a repository-relative file path, optionally bounded by line range."
},
{
"name": "codebase_consultant",
"description": "Deprecated alias for chat kept for backward compatibility."
"name": "fetch_artifacts",
"description": "Fetch full source for specific search results when you need the underlying code."
},
{
"name": "get_artifact_relationships",
"description": "Inspect relationships between artifacts returned by CodeAlive search."
},
{
"name": "get_artifact_query_schema",
"description": "Return supported ArtifactQuery entities, fields, operators, and examples."
},
{
"name": "query_artifact_metadata",
"description": "Run read-only ArtifactQuery metadata analytics across selected repositories."
}
],
"user_config": {
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -37,7 +37,7 @@ packages = ["src"]
package-dir = {"" = "."}

[tool.setuptools_scm]
fallback_version = "2.0.4"
fallback_version = "3.0.0"

[tool.uv]
# Relative dates in exclude-newer (e.g. "7 days") require uv ≥ 0.11.
26 changes: 19 additions & 7 deletions server.json
@@ -1,7 +1,7 @@
{
"$schema": "https://static.modelcontextprotocol.io/schemas/2025-12-11/server.schema.json",
"name": "io.github.CodeAlive-AI/codealive-mcp",
"version": "2.0.4",
"version": "3.0.0",
"description": "Semantic code search and analysis from CodeAlive for AI assistants and agents.",
"keywords": [
"context-engineering",
@@ -54,10 +54,6 @@
"name": "get_data_sources",
"description": "Retrieve all available repositories and workspaces indexed in your CodeAlive account. Use this first to discover what codebases you can search and analyze."
},
{
"name": "codebase_search",
"description": "Deprecated legacy semantic search tool retained for backward compatibility."
},
{
"name": "semantic_search",
"description": "Default discovery tool — search by meaning to find code by concepts, behavior, or architecture."
@@ -71,8 +67,16 @@
"description": "Synthesized codebase Q&A. Do NOT call unless the user explicitly names this tool (e.g. 'use chat'). 'Ask CodeAlive' means use search tools, not chat. Slow (up to 30 seconds)."
},
{
"name": "codebase_consultant",
"description": "Deprecated alias for chat retained for backward compatibility."
"name": "get_repository_ontology",
"description": "Get repository-level ontology and orientation for a single selected repository."
},
{
"name": "get_file_tree",
"description": "List repository files and folders for a single selected repository."
},
{
"name": "read_file",
"description": "Read a repository-relative file path, optionally bounded by line range."
},
{
"name": "fetch_artifacts",
@@ -81,6 +85,14 @@
{
"name": "get_artifact_relationships",
"description": "Explore an artifact's relationships — call graph, inheritance hierarchy, or references. Drill down after search or fetch to understand how code connects across the codebase."
},
{
"name": "get_artifact_query_schema",
"description": "Return supported ArtifactQuery entities, fields, operators, and examples."
},
{
"name": "query_artifact_metadata",
"description": "Run read-only ArtifactQuery metadata analytics across selected repositories."
}
]
}
37 changes: 5 additions & 32 deletions smoke_test.py
@@ -135,12 +135,15 @@ async def test_list_tools(self) -> bool:

        expected_tools = {
            "chat",
            "codebase_consultant",
            "codebase_search",
            "fetch_artifacts",
            "get_artifact_relationships",
            "get_artifact_query_schema",
            "get_data_sources",
            "get_file_tree",
            "get_repository_ontology",
            "grep_search",
            "query_artifact_metadata",
            "read_file",
            "semantic_search",
        }
        actual_tools = {tool.name for tool in tools}
@@ -244,35 +247,6 @@ async def test_chat(self) -> bool:
            self.print_error(f"Tool execution failed: {str(e)}")
            return False

    async def test_codebase_consultant(self) -> bool:
        """Test the codebase_consultant tool (deprecated alias)."""
        self.print_test("codebase_consultant Tool (deprecated)")
        try:
            result = await self.session.call_tool("codebase_consultant", {
                "question": "test question",
                "data_sources": ["test-repo"]
            })

            if result.isError:
                # Error is expected if no valid API key
                error_str = str(result.content)
                if "API key" in error_str or "data source" in error_str or "authorization" in error_str.lower():
                    self.print_success("Tool responds correctly (API key/data source required)")
                    self.print_info("This is expected in smoke test without valid API key")
                    return True
                else:
                    self.print_error(f"Unexpected error: {result.content}")
                    return False

            # If we have a valid API key and data source, check response
            self.print_success("Tool executed successfully")
            self.print_info(f"Response: {str(result.content)[:100]}...")
            return True

        except Exception as e:
            self.print_error(f"Tool execution failed: {str(e)}")
            return False

    async def test_parameter_validation(self) -> bool:
        """Test that tools validate parameters correctly."""
        self.print_test("Parameter Validation")
@@ -316,7 +290,6 @@ async def run_all_tests(self):
            await self.test_get_data_sources()
            await self.test_semantic_search()
            await self.test_chat()
            await self.test_codebase_consultant()
            await self.test_parameter_validation()

        except Exception as e: