This project is a command-line research pipeline that turns a natural-language query and a local evidence corpus into a structured markdown report, critiques that report with a structured JSON evaluation, and optionally revises the draft based on that feedback. Retrieval is fully local (no vector DB required for the baseline); generation, evaluation, and revision use the Groq API (llama-3.3-70b-versatile).
You can run it from the CLI, a Streamlit upload UI (app.py), or a Next.js web app under web/ that calls the pipeline either via a local Next API route or a deployed Python API (e.g. Render + Vercel — see Deploy: Render + Vercel).
A hosted instance is available for trying the web UI against a remote API (same stack as in Deploy: Render + Vercel):
| Link | |
|---|---|
| Web app (Vercel) | autonomous-research-eval-agent.vercel.app |
| API health (Render) | research-eval-agent-api.onrender.com/health |
The API may be slow on the first request after idle time on a free tier (cold start). Replace these URLs in the README if you use different deployment domains.
| Layer | Technologies |
|---|---|
| Language (backend) | Python 3 |
| LLM / API | Groq (groq SDK, HTTP via httpx) — research, evaluation, revision, and grounding agents |
| Validation & schemas | Pydantic v2 (pydantic, pydantic_core) |
| CLI | Typer + Rich |
| Env | python-dotenv (.env at repo root) |
| Documents | pypdf for PDFs; plain text / markdown loaders in src/tools/ |
| Optional UI (Python) | Streamlit (streamlit run app.py) — single-file upload + query |
| Web UI | Next.js 14 (App Router), React 18, TypeScript, Tailwind CSS 3 |
| Web → Python bridge | Local: Next.js route web/app/api/evaluate spawns the repo venv Python and scripts/pipeline_json_stdout.py. Production: FastAPI app render_api/main.py exposes POST /api/evaluate; set NEXT_PUBLIC_API_URL on Vercel to the Render service URL |
| Other Python deps | See requirements.txt (e.g. tenacity, requests; some packages are shared/transitive dependencies) |
Model note: Default Groq chat model is set in src/agents/groq_client.py (MODEL, e.g. llama-3.3-70b-versatile); change it there if you switch models.
Prerequisites: Python 3 with pip (venv recommended). For the web app, Node.js 18+ and npm (or compatible package manager).
The design separates retrieval, research (drafting), evaluation, and revision into explicit stages so each step can be inspected, logged, and extended independently.
In addition to the standard CLI pipeline, the project includes an OpenClaw-oriented runner that presents the workflow as explicit agent roles:
- OpenClaw Research Agent
- OpenClaw Evaluation Agent
- OpenClaw Revision Agent
- OpenClaw Grounding Agent
Run it with:
python src/openclaw_runner.py "Analyze EV adoption trends in Nepal"- Local retrieval — Loads
.txt,.md, and.pdffrom the defaultdata/folder (recursive), or from--file/--data-dir; splits text into paragraph chunks; ranks by keyword overlap with the query; returns the top passages for grounding. PDFs use pypdf; encoding issues and bad PDFs are handled with warnings, not crashes. - Research agent — Produces a structured markdown draft (executive summary, findings, evidence, gaps, conclusion) using only the retrieved chunks.
- Evaluation agent — Returns structured JSON aligned with a Pydantic schema: overall score plus subscores (relevance, completeness, clarity, evidence usage on a 1–10 scale), issues, and suggested fixes. The overall score is normalized to match the mean of the subscores for internal consistency.
- Revision agent — Rewrites the draft using the evaluation JSON when revision runs.
- Conditional revision — If the evaluation score is ≥ 8.0 (configurable threshold in code), the revision step is skipped and the final report is the draft; otherwise the reviser runs.
- Traceable outputs — Each run writes numbered artifacts under
outputs/, including retrieved chunks, draft, evaluation, final report, grounding audit JSON, and a run summary (with optionalgrounding_score).
┌─────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ Local │ │ Research │ │ Evaluation │
│ retrieval │────▶│ agent (Groq) │────▶│ agent (Groq) │
│ (.txt/.md/.pdf)│ │ draft MD │ │ JSON + schema │
└─────────────┘ └────────┬────────┘ └────────┬─────────┘
│ │
│ score < 8? │
│ ▼
│ ┌───────────────┐
└──────────────│ Revision │
(skip) │ agent (Groq) │
└───────┬───────┘
▼
Final markdown + grounding audit + summary
src/main.py/src/openclaw_runner.py— CLI entrypoints; callsrc/pipeline.pyfor orchestration.src/pipeline.py— Retrieval → draft → evaluate → conditional revise → grounding audit → numbered outputs.src/agents/— Research, evaluation, revision, and grounding agents; sharedgroq_clientandprompt_loader.src/tools/retrieval_tool.py&document_loaders.py— Load.txt/.md/.pdf, paragraph chunking, keyword scoring.src/schemas/—evaluation_schema,grounding_audit_schema(Pydantic validation).
research-eval-agent/
├── app.py # Streamlit UI (upload + query → pipeline)
├── data/ # Default corpus (.txt, .md, .pdf)
├── outputs/ # Run artifacts (gitignored)
├── render_api/
│ └── main.py # FastAPI server for Render (POST /api/evaluate)
├── scripts/
│ └── pipeline_json_stdout.py # JSON-on-stdout entry for the local Next.js API
├── web/ # Next.js app (research evaluation results UI)
│ ├── app/ # App Router: page + /api/evaluate
│ ├── components/
│ ├── lib/
│ └── package.json
├── src/
│ ├── main.py # CLI
│ ├── openclaw_runner.py # OpenClaw-styled CLI
│ ├── pipeline.py # Shared orchestration (+ upload path for API/Streamlit)
│ ├── agents/
│ ├── prompts/
│ ├── schemas/
│ └── tools/
│ ├── retrieval_tool.py
│ └── document_loaders.py
├── .env
├── requirements.txt
└── README.md
-
Clone the repository and open a terminal at the project root.
-
Create a virtual environment (recommended):
python -m venv .venv
-
Activate the environment (examples):
- Windows (PowerShell):
.venv\Scripts\Activate.ps1 - macOS/Linux:
source .venv/bin/activate
- Windows (PowerShell):
-
Install dependencies:
pip install -r requirements.txt
-
Configure environment variables (see next section).
From the repo root, with the same Python venv and .env as above:
cd web
npm install
npm run devOpen the URL shown in the terminal (usually http://localhost:3000). The UI uploads a document, sends a query to POST /api/evaluate, and displays evaluation scores, report tabs, and actions.
Path resolution: The API looks for the repository root as the parent of web/ by default. If you run Next from a different layout, set RESEARCH_AGENT_ROOT to the absolute path of this repo so Python can find scripts/pipeline_json_stdout.py and .venv.
From the project root with venv activated:
streamlit run app.pyUpload a .txt, .md, or .pdf, enter a query, and run the same pipeline as the CLI (single-document flow).
| Variable | Where | Required | Description |
|---|---|---|---|
GROQ_API_KEY |
Repo .env, Render |
Yes (for LLM) | API key for Groq. |
ALLOWED_ORIGINS |
Render only | No | Comma-separated origins for CORS (e.g. your Vercel URL). Empty = allow any origin. |
NEXT_PUBLIC_API_URL or NEXT_PUBLIC_API_BASE_URL |
Vercel, web/.env.local |
No | Render service origin, no trailing slash (either name works). If unset, the Next app uses /api/evaluate locally. |
RESEARCH_AGENT_ROOT |
Local Next only | No | Absolute path to this repo when the default parent-of-web/ root is wrong. |
Create a .env file in the project root (same directory as requirements.txt):
GROQ_API_KEY=your_key_hereThe CLI loads .env automatically via python-dotenv.
Split deployment uses the FastAPI app in render_api/main.py on Render and the Next.js app in web/ on Vercel. The browser calls your Render URL directly (NEXT_PUBLIC_API_URL); the Next.js route web/app/api/evaluate is only used when that variable is unset (local development).
Render (Python web service)
- Create a Web Service, connect this repo, root directory = repository root.
- Build command:
pip install -r requirements.txt - Start command:
uvicorn render_api.main:app --host 0.0.0.0 --port $PORT
Do not useuvicorn app:app— there is noapp.pyASGI module; the FastAPI instance isappinsiderender_api/main.py. If deploy logs sayAttribute "app" not found in module "app", fix the start command here and redeploy. - Environment variables (dashboard):
GROQ_API_KEY— required (same as local).ALLOWED_ORIGINS— comma-separated origins allowed for CORS, e.g.https://your-app.vercel.appandhttp://localhost:3000for testing. If omitted, the API allows any origin (convenient for experiments; set explicitly for production).
- Optional: use
render.yamlin the repo as a Blueprint. Health check path:/health.
Vercel (Next.js)
- Import the repo; set Root Directory to
web. - Environment variable:
NEXT_PUBLIC_API_URLorNEXT_PUBLIC_API_BASE_URL= your Render service origin with no trailing slash, e.g.https://research-eval-api.onrender.com. The client calls{origin}/api/evaluate. - Redeploy after changing env vars.
See web/.env.example for the frontend variable. Keep GROQ_API_KEY only on Render (server-side), not in NEXT_PUBLIC_* on Vercel.
Note: Free Render instances may spin down when idle; the first request after sleep can take tens of seconds.
From the project root, pass the research question as positional arguments (words are joined into one query string).
Default corpus (recursive scan of data/ for .txt, .md, .pdf):
python src/main.py What are the main factors in Nepal electric vehicle adoption?Single file (mutually exclusive with --data-dir):
python src/main.py --file path/to/report.pdf Your question hereCustom directory (recursive scan; mutually exclusive with --file):
python src/main.py --data-dir path/to/corpus Your question hereThe pipeline prints five stages ([1/5] … [5/5]) and writes files under outputs/. If the evaluation score is below the revision threshold, the log shows whether the reviser changed the draft. openclaw_runner.py accepts the same --file / --data-dir options.
After a successful run, outputs/ typically contains:
| File | Description |
|---|---|
00_retrieved_chunks.json |
Query plus ranked chunks (filename, text, retrieval score). |
01_draft_report.md |
Markdown draft from the research agent. |
02_evaluation.json |
Structured JSON evaluation (scores, issues, suggested fixes). |
03_final_report.md |
Final report: either the revised draft or a copy of the draft if revision was skipped. |
04_run_summary.json |
Query, chunk counts, unique sources, evaluation score, revision flags, grounding_score (when audit ran), ISO timestamp. |
05_grounding_audit.json |
Grounding audit: supported/unsupported points, notes, grounding_score (1–10). |
Older filenames (draft_report.md, evaluation.json, etc.) are legacy; the numbered convention is what the current orchestrator writes.
- Role separation — Distinct agents with dedicated prompts and responsibilities (research vs. critique vs. edit), rather than a single monolithic prompt.
- Tool-augmented grounding — A retrieval step supplies evidence before generation, reducing unsupported generation relative to a bare LLM call.
- Structured critique — The evaluation agent emits machine-readable JSON (schema-validated), which can drive downstream automation or UI, not only human reading.
- Closed-loop improvement — The revision agent consumes evaluation feedback to refine the draft when scores indicate room for improvement.
- Governed autonomy — Conditional revision implements a policy: high-quality drafts skip an extra model call; lower scores trigger deliberate revision.
Together, these properties match a practical notion of agentic systems: modular actors, explicit state (artifacts), tools, and policy-controlled loops.
- Better retrieval — Embeddings, hybrid search, or citation-span extraction for finer-grained evidence.
- Configurable thresholds and models — CLI flags or config file for revision cutoff,
top_k, temperature, and model name. - Tests — Unit tests for retrieval scoring and evaluation JSON validation; integration tests with mocked LLM responses.
- Observability — Structured logging, optional export of token usage, and run IDs for comparing experiments.
- More formats — HTML or DOCX ingestion with preprocessing while keeping the same agent contract.
Portfolio note: This README describes the intended behavior of the repository as implemented in src/; extend the “Future Improvements” section as you ship features.