Commit 2148f2e

docs: polish READMEs, bump to v2.1.1
- clio-agentic-search README: add experimental disclaimer, badges, launcher commands, and tables for API/CLI/env vars; remove UV_CACHE_DIR cruft from all commands
- Root README: fix FastMCP badge version (2.13+ → 3.0+)
1 parent a75dbca commit 2148f2e

3 files changed: 76 additions and 69 deletions

README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -20,7 +20,7 @@
2020
[![License: BSD-3-Clause](https://img.shields.io/badge/License-BSD--3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
2121
[![PyPI version](https://img.shields.io/pypi/v/clio-kit.svg)](https://pypi.org/project/clio-kit/)
2222
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
23-
[![FastMCP](https://img.shields.io/badge/FastMCP-2.13%2B-purple)](https://github.com/jlowin/fastmcp)
23+
[![FastMCP](https://img.shields.io/badge/FastMCP-3.0%2B-purple)](https://github.com/jlowin/fastmcp)
2424
[![CI](https://github.com/iowarp/clio-kit/actions/workflows/quality_control.yml/badge.svg)](https://github.com/iowarp/clio-kit/actions/workflows/quality_control.yml)
2525
[![Coverage](https://codecov.io/gh/iowarp/clio-kit/branch/main/graph/badge.svg)](https://codecov.io/gh/iowarp/clio-kit)
2626

clio-agentic-search/README.md

Lines changed: 74 additions & 67 deletions
Original file line number | Diff line number | Diff line change
@@ -1,92 +1,99 @@
11
# clio-agentic-search
22

3-
`clio-agentic-search` is a hybrid retrieval engine for scientific computing corpora. It indexes
4-
documents into namespace-specific backends and supports lexical, vector, graph, metadata, and
5-
scientific-operator retrieval in one pipeline.
6-
7-
## Current scope
8-
9-
- Multi-namespace registry with runtime/auth config bundles.
10-
- Connectors:
11-
- `local_fs` (filesystem + DuckDB persistence)
12-
- `object_s3` (in-memory S3-compatible object store + DuckDB)
13-
- `vector_qdrant` (in-memory vector store)
14-
- `graph_neo4j` (in-memory graph traversal)
15-
- `kv_redis` (in-memory log stream retrieval)
16-
- Scientific retrieval operators:
17-
- numeric range (`unit`, `min`, `max`)
18-
- unit matching (`unit`, optional `value`)
19-
- formula targeting (normalized signatures)
20-
- Background indexing job API with cancellation tokens and per-namespace serialized execution.
21-
- Retry wrappers for connect/index operations with exponential backoff.
22-
- Telemetry:
23-
- tracing (`NoopTracer` by default, OpenTelemetry when enabled)
24-
- Prometheus-style metrics export at `/metrics`
3+
[![License: BSD-3-Clause](https://img.shields.io/badge/License-BSD--3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
4+
[![PyPI version](https://img.shields.io/pypi/v/clio-kit.svg)](https://pypi.org/project/clio-kit/)
5+
[![CI](https://github.com/iowarp/clio-kit/actions/workflows/quality_control.yml/badge.svg)](https://github.com/iowarp/clio-kit/actions/workflows/quality_control.yml)
6+
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue)](https://www.python.org/)
7+
8+
> **Status: Experimental** — API surface and storage format may change between minor releases. Suitable for research and evaluation; not yet recommended for production workloads.
9+
10+
Part of [**CLIO Kit**](https://github.com/iowarp/clio-kit) — the IoWarp platform's tooling layer for AI agents.
11+
12+
---
13+
14+
Hybrid retrieval engine for scientific computing corpora. Indexes documents into namespace-specific backends and supports lexical (BM25), vector, graph, metadata, and scientific-operator retrieval in one pipeline. DuckDB storage, FastAPI server, async job queue, OpenTelemetry tracing, Prometheus metrics.
2515

2616
## Quick start
2717

2818
```bash
29-
UV_CACHE_DIR=.uv-cache uv sync --all-groups
30-
UV_CACHE_DIR=.uv-cache uv run clio --help
31-
UV_CACHE_DIR=.uv-cache uv run clio index --namespace local_fs
32-
UV_CACHE_DIR=.uv-cache uv run clio query --namespace local_fs --q "pressure between 190 and 360 kPa"
33-
UV_CACHE_DIR=.uv-cache uv run uvicorn clio_agentic_search.api.app:app --reload
19+
# Via the CLIO Kit launcher (recommended)
20+
uvx clio-kit search serve # Start the API server
21+
uvx clio-kit search query --namespace local_fs --q "pressure between 190 and 360 kPa"
22+
uvx clio-kit search index --namespace local_fs
23+
uvx clio-kit search list --namespace local_fs
3424
```
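The quick-start query "pressure between 190 and 360 kPa" exercises the numeric range operator, which the README describes in terms of `unit`, `min`, and `max`. As a rough illustration of what turning such a phrase into a range constraint could look like, here is a minimal sketch. The regex, `NumericRange`, `parse_range`, and `in_range` are hypothetical names for illustration only; they are not taken from clio-agentic-search, whose actual parser may work quite differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumericRange:
    """Hypothetical container mirroring the (unit, min, max) triple."""
    unit: str
    min: float
    max: float


# Assumed phrasing: "between <low> and <high> <unit>". Other phrasings
# (e.g. "pressure > 200 kPa") are deliberately not handled in this sketch.
_RANGE_RE = re.compile(
    r"between\s+(?P<low>-?\d+(?:\.\d+)?)\s+and\s+"
    r"(?P<high>-?\d+(?:\.\d+)?)\s+(?P<unit>[A-Za-z%/]+)"
)


def parse_range(query: str) -> Optional[NumericRange]:
    """Extract a range constraint from free text, or None if absent."""
    match = _RANGE_RE.search(query)
    if match is None:
        return None
    low, high = float(match["low"]), float(match["high"])
    # Normalize so that min <= max even if the bounds were written reversed.
    return NumericRange(unit=match["unit"], min=min(low, high), max=max(low, high))


def in_range(value: float, unit: str, constraint: NumericRange) -> bool:
    """True when the unit matches exactly and the value lies in [min, max]."""
    return unit == constraint.unit and constraint.min <= value <= constraint.max
```

This sketch compares units by exact string match and performs no unit conversion, so a value in `Pa` would not match a `kPa` constraint.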
3525

36-
## API
26+
### Development mode
3727

38-
- `GET /health`: liveness probe.
39-
- `GET /version`: package version.
40-
- `GET /documents?namespace=<ns>`: list indexed documents and chunk counts.
41-
- `POST /query`: run retrieval and return citations + trace events.
42-
- `POST /jobs/index`: submit async index job (`namespace`, `full_rebuild`).
43-
- `GET /jobs/{job_id}`: fetch job status/result.
44-
- `DELETE /jobs/{job_id}`: request cancellation.
45-
- `GET /metrics`: Prometheus text exposition format.
28+
```bash
29+
cd clio-agentic-search
30+
uv sync --all-extras --dev
31+
uv run clio serve # Start dev server with hot reload
32+
uv run clio query --namespace local_fs --q "pressure > 200 kPa"
33+
uv run clio index --namespace local_fs
34+
```
35+
36+
## Features
37+
38+
- **Multi-namespace registry** with runtime/auth config bundles
39+
- **Connectors**: filesystem + DuckDB (`local_fs`), S3 object store, Qdrant vector store, Neo4j graph, Redis KV log
40+
- **Scientific retrieval operators**: numeric range (`unit`, `min`, `max`), unit matching, formula targeting (normalized signatures)
41+
- **Background indexing** job API with cancellation tokens and per-namespace serialized execution
42+
- **Retry/backoff** wrappers for connect/index operations
43+
- **Telemetry**: OpenTelemetry tracing (opt-in), Prometheus metrics at `/metrics`
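The feature list mentions retry/backoff wrappers for connect/index operations, and the earlier README text specifies exponential backoff. The general pattern can be sketched as follows. This is an illustrative stand-in, not the project's wrapper: `retry_with_backoff` and its parameters (`attempts`, `base_delay`, `factor`, `sleep`) are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.1,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` calls have failed.

    The wait between calls grows geometrically: base_delay, base_delay * factor, ...
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error unchanged
            sleep(delay)
            delay *= factor
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without waiting. A production wrapper would likely also restrict which exception types are retried and add jitter.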
44+
45+
## API endpoints
46+
47+
| Method | Path | Description |
48+
|--------|------|-------------|
49+
| `GET` | `/health` | Liveness probe |
50+
| `GET` | `/version` | Package version |
51+
| `GET` | `/documents?namespace=<ns>` | List indexed documents and chunk counts |
52+
| `POST` | `/query` | Run retrieval, return citations + trace events |
53+
| `POST` | `/jobs/index` | Submit async index job |
54+
| `GET` | `/jobs/{job_id}` | Fetch job status/result |
55+
| `DELETE` | `/jobs/{job_id}` | Request cancellation |
56+
| `GET` | `/metrics` | Prometheus text exposition format |
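As an illustration of calling the job endpoint in the table above, the sketch below builds a `POST /jobs/index` request with the standard library. The payload fields `namespace` and `full_rebuild` come from the previous README wording shown in this diff. The base URL and the `build_index_job_request` helper are assumptions, and the response schema is not documented here, so the sketch stops short of parsing a reply.

```python
import json
import urllib.request

# Assumption: adjust to wherever the server started by `clio serve` is listening.
BASE_URL = "http://localhost:8000"


def build_index_job_request(
    namespace: str,
    full_rebuild: bool = False,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for /jobs/index."""
    payload = json.dumps({"namespace": namespace, "full_rebuild": full_rebuild})
    return urllib.request.Request(
        url=f"{base_url}/jobs/index",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job; polling `GET /jobs/{job_id}` and cancelling with `DELETE /jobs/{job_id}` would follow the same pattern once the job identifier is known.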
4657

4758
## CLI commands
4859

49-
- `clio query`
50-
- `clio index`
51-
- `clio list`
52-
- `clio seed`
53-
- `clio serve`
60+
| Command | Description |
61+
|---------|-------------|
62+
| `clio query` | Run retrieval queries against a namespace |
63+
| `clio index` | Index documents into a namespace |
64+
| `clio list` | List indexed documents |
65+
| `clio seed` | Seed sample data for testing |
66+
| `clio serve` | Start the FastAPI server |
5467

5568
## Environment variables
5669

57-
- `CLIO_LOCAL_ROOT` (default `.`)
58-
- `CLIO_STORAGE_PATH` (default `.clio-agentic-search.duckdb`)
59-
- `CLIO_CORS_ORIGINS` (default `*`)
60-
- `CLIO_OTEL_ENABLED` (`1`/`true`/`yes` to enable OTel tracer)
61-
- `OTEL_EXPORTER_OTLP_ENDPOINT` (default `http://localhost:4317`)
62-
- `CLIO_ANN_BACKEND` (`exact` default, `hnsw` when `clio-agentic-search[ann]` installed)
63-
- `CLIO_CACHE_SHARDS` (default `16`, vector index shard count)
64-
- `CLIO_VECTOR_WARMUP_ASYNC` (default `1`, background vector index warmup on connect)
65-
- `CLIO_INDEX_DOCUMENT_BATCH_SIZE` (default `32`, batched document bundle writes per index pass)
66-
- `CLIO_LEXICAL_BATCH_SIZE` (default `50000`, lexical posting write batch size)
67-
- `CLIO_LEXICAL_DF_PRUNE_THRESHOLD` (default `0.98`, prune tokens above this chunk-frequency ratio)
68-
- `CLIO_LEXICAL_DF_PRUNE_MIN_CHUNKS` (default `200`, minimum indexed chunks before DF pruning applies)
69-
- `CLIO_LEXICAL_MAX_TOKENS_PER_CHUNK` (default `96`, keep top-frequency tokens per chunk)
70-
- `CLIO_LEXICAL_PRUNE_STOPWORDS` (default `1`, remove built-in stopwords from lexical postings)
71-
- `CLIO_LEXICAL_POSTINGS_COMPRESSION` (`none` default, `gzip` for compressed staging during indexing)
72-
- `CLIO_OBJECT_*`, `CLIO_VECTOR_*`/`CLIO_QDRANT_*`, `CLIO_GRAPH_*`/`CLIO_NEO4J_*`,
73-
`CLIO_KV_*`/`CLIO_REDIS_*` for namespace-specific connector config
70+
| Variable | Default | Description |
71+
|----------|---------|-------------|
72+
| `CLIO_LOCAL_ROOT` | `.` | Root directory for local filesystem connector |
73+
| `CLIO_STORAGE_PATH` | `.clio-agentic-search.duckdb` | DuckDB database path |
74+
| `CLIO_CORS_ORIGINS` | `*` | Allowed CORS origins |
75+
| `CLIO_OTEL_ENABLED` | `false` | Enable OpenTelemetry tracing (`1`/`true`/`yes`) |
76+
| `CLIO_ANN_BACKEND` | `exact` | ANN backend (`hnsw` when `[ann]` extra installed) |
77+
| `CLIO_CACHE_SHARDS` | `16` | Vector index shard count |
78+
| `CLIO_INDEX_DOCUMENT_BATCH_SIZE` | `32` | Documents per index batch |
79+
| `CLIO_LEXICAL_BATCH_SIZE` | `50000` | Lexical posting write batch size |
80+
81+
See source for additional `CLIO_LEXICAL_*`, `CLIO_OBJECT_*`, `CLIO_VECTOR_*`, `CLIO_GRAPH_*`, `CLIO_KV_*` variables.
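To make the defaults in the table concrete, here is a minimal sketch of reading a few of these variables into a typed settings object. `ClioSettings` and `load_settings` are hypothetical names and do not reflect how clio-agentic-search actually loads its configuration. Treating the `CLIO_OTEL_ENABLED` values case-insensitively is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Values the table lists as enabling OpenTelemetry tracing.
_TRUTHY = {"1", "true", "yes"}


@dataclass(frozen=True)
class ClioSettings:
    """Hypothetical settings object covering a subset of the documented variables."""
    local_root: str
    storage_path: str
    otel_enabled: bool
    ann_backend: str
    cache_shards: int


def load_settings(env: Mapping[str, str] = os.environ) -> ClioSettings:
    """Read settings from `env`, falling back to the documented defaults."""
    return ClioSettings(
        local_root=env.get("CLIO_LOCAL_ROOT", "."),
        storage_path=env.get("CLIO_STORAGE_PATH", ".clio-agentic-search.duckdb"),
        otel_enabled=env.get("CLIO_OTEL_ENABLED", "false").strip().lower() in _TRUTHY,
        ann_backend=env.get("CLIO_ANN_BACKEND", "exact"),
        cache_shards=int(env.get("CLIO_CACHE_SHARDS", "16")),
    )
```

Accepting any mapping rather than reading `os.environ` directly keeps the function easy to exercise with a plain dictionary.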
7482

7583
## Quality checks
7684

7785
```bash
78-
UV_CACHE_DIR=.uv-cache uv run ruff check .
79-
UV_CACHE_DIR=.uv-cache uv run ruff format --check .
80-
UV_CACHE_DIR=.uv-cache uv run mypy src/
81-
UV_CACHE_DIR=.uv-cache uv run pytest --ignore=tests/benchmarks
82-
UV_CACHE_DIR=.uv-cache uv run python -m clio_agentic_search.evals.quality_gate
86+
uv run ruff check .
87+
uv run ruff format --check .
88+
uv run mypy src/
89+
uv run pytest --ignore=tests/benchmarks -v
90+
uv run python -m clio_agentic_search.evals.quality_gate
8391
```
8492

85-
## Benchmark note
93+
## Benchmarks
8694

87-
`tests/benchmarks/test_throughput.py` enforces p95 latency for smaller corpora by default.
88-
For the 10k-chunk p95 assertion, enable hardware-specific enforcement with:
95+
`tests/benchmarks/test_throughput.py` enforces p95 latency for smaller corpora by default. For 10k-chunk SLO enforcement:
8996

9097
```bash
91-
CLIO_ENFORCE_LARGE_SLO=1 UV_CACHE_DIR=.uv-cache uv run pytest tests/benchmarks/ -v --benchmark-disable -k "10000_chunks"
98+
CLIO_ENFORCE_LARGE_SLO=1 uv run pytest tests/benchmarks/ -v --benchmark-disable -k "10000_chunks"
9299
```
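For readers unfamiliar with p95 assertions, the sketch below shows one common way to compute a 95th-percentile latency (the nearest-rank method) and compare it with a budget. It is not the logic in `tests/benchmarks/test_throughput.py`, whose percentile method and thresholds are not shown in this diff. `p95`, `meets_slo`, and `budget_ms` are hypothetical.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample with >= 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # ceil(0.95 * n) computed in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_slo(latencies_ms: Sequence[float], budget_ms: float) -> bool:
    """True when the p95 latency does not exceed the (hypothetical) budget."""
    return p95(latencies_ms) <= budget_ms
```

A percentile-based check tolerates a small number of slow outliers, which is why such assertions are usually less flaky across hardware than a check on the maximum latency.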

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
44

55
[project]
66
name = "clio-kit"
7-
version = "2.1.0"
7+
version = "2.1.1"
88
description = "CLIO Kit - MCP Servers, Clients, and Tools for AI Agents"
99
readme = "README.md"
1010
requires-python = ">=3.10"
