Skip to content

TheGreenCedar/codex-autoresearch

Repository files navigation

Codex Autoresearch

Measured improvement loops for Codex

Try it - Install - How it works - Dashboard - Docs - Changelog

Codex Autoresearch helps Codex turn "make this better" into a measured loop.

Give Codex a goal, a benchmark contract, and a safe edit scope. Codex can run small experiment packets, keep or discard changes with evidence, preserve ASI and metrics across context loss, and package useful work for review.

Codex Autoresearch live dashboard showing a demo runtime improvement

Inspired by the AI-focused karpathy/autoresearch and pi-autoresearch. Codex Autoresearch adapts the measured-loop idea for Codex plugin workflows, repo-local benchmarks, durable session files, an evidence trail, live dashboards, and reviewable finalization.

Try it

Ask Codex to use Codex Autoresearch.

Broad prompts work:

Use $Codex Autoresearch to improve the speed of my indexer's pipeline, while keeping it memory efficient.
Use $Codex Autoresearch to keep reducing bugs in the codebase, starting with
the most obvious low hanging fruits. Keep doing this 100 times.

You can also hand it a sharper investigation:

Use $Codex Autoresearch to figure out why my graphql service's p99 latency is so much higher
than its p90 latency at 1 minute metric resolution. I suspect: DNS lookup, event loop throttling,
memory spike, CPU spike. For each, run the 4-5 appropriate experiments @experiments.md and if the
results are promising keep iterating, otherwise stop and report back.

Or be exact about the benchmark and scope:

Use $Codex Autoresearch to optimize my unit tests' speed. different libraries are allowed, but try to avoid it.
Benchmark: npm test -- --runInBand
Metric: seconds, lower is better
Checks: npm test
Scope: test runner config and test helpers only

Codex should start by checking Git state, identifying the target package, creating or resuming the session, verifying the benchmark, starting the dashboard, running one packet, and logging the result with experiment details.

Install

This repository is a Codex plugin marketplace. Add the marketplace:

codex plugin marketplace add TheGreenCedar/codex-autoresearch

Then open Codex in the repo you want to improve:

/plugins

Choose:

TheGreenCedar Autoresearch -> codex-autoresearch -> Install plugin

Start a new Codex thread after installation.

How it works

A normal session follows this shape:

Target -> Onboard -> Setup -> Doctor -> Dashboard -> Packet -> Log -> Continue or Finalize

Codex Autoresearch helps Codex:

  1. identify the target repo or child package
  2. check for an existing session
  3. return a shared next-step contract with stage, reason, CLI command, safety, and missing essentials
  4. verify the benchmark contract
  5. run a measured packet with command identity, output tails, metrics, artifacts, checks, and a freshness fingerprint
  6. log the result as keep, discard, crash, or checks_failed
  7. preserve ASI, packet fingerprints, promotion labels, and metrics in durable files
  8. continue safely or preview finalization into reviewable branches

A packet is one measured experiment cycle: make a scoped change, run the benchmark, inspect the metric, and log the decision.

ASI means Accumulated Structured Intelligence. It is the structured memory attached to each packet decision: hypothesis, evidence, rollback reason, next action hint, and optional lane, family, or risk metadata. It tells the next Codex session what happened, what was learned, and which path deserves the next attempt.

When to use it

Use Codex Autoresearch when:

  • the goal can be measured
  • the benchmark is repeatable
  • correctness checks exist or can be added
  • the editable scope is small enough to review
  • kept work should become reviewable commits or branches

Use a regular Codex task when:

  • the work needs one careful edit
  • the goal is mainly taste or judgment
  • the benchmark is flaky or very expensive
  • the metric can improve by weakening the benchmark
  • secrets, deployment paths, or unrelated dirty files are in scope

Dashboard

Ask codex to boot up the dashboard if it hasn't already.

The dashboard shows:

  • baseline, latest, best, confidence, and weighted metric formulas
  • Codex brief and session memory
  • next safe action, evidence label, proof gaps, and why the action is safe
  • ledger entries, ASI, and handoff context
  • best kept change and recent failures
  • strategy lanes, scaffold health, research integrity, runtime drift, and finalization readiness
  • copyable status reports and agent handoff packets

Use the dashboard to inspect state. Talk to Codex for everything else.

Quality-gap loops

For product, docs, UX, or broad research, ask for a quality-gap loop:

Use Codex Autoresearch to study this project and improve the dashboard.
Turn accepted findings into a quality-gap loop, implement them, and keep the live dashboard open.

quality_gap=0 means the accepted checklist for that round is closed. It does not mean discovery is complete. Start another round if the question is still alive.

Finalization

Ask the plugin to finalize once a loop has useful kept work mixed with exploratory history.

Finalization should:

  1. select kept evidence
  2. exclude session artifacts from review branches unless requested
  3. block later-discarded, invalidated, or reverted keeps
  4. show dirty-tree, overlap, semantic-safety, and final-tree coverage warnings
  5. prepare clean review branches or a current-final-tree plan
  6. preserve metric evidence and verification commands
  7. leave cleanup until review branches are verified

Docs

The active package lives under:

plugins/codex-autoresearch

The plugin skill lives at:

plugins/codex-autoresearch/skills/codex-autoresearch/SKILL.md

Development

The plugin and dashboard source are written in TypeScript.

The package uses tsdown for Node builds, tsgo for typechecking, oxlint for linting, oxfmt for formatting, Vite for the dashboard, and npm-run-all2 for combined gates.

From plugins/codex-autoresearch:

npm install
npm run check
npm test
node scripts/autoresearch.mjs --help

Targeted checks:

npm run typecheck
npm run lint
npm run format:check
node scripts/autoresearch.mjs doctor --cwd . --check-benchmark --explain
git diff --check

Update or remove

Refresh the marketplace:

codex plugin marketplace upgrade thegreencedar-autoresearch

Remove the marketplace:

codex plugin marketplace remove thegreencedar-autoresearch

To uninstall the plugin, open Codex:

/plugins

Then choose:

codex-autoresearch -> Uninstall plugin

Changelog

User-facing changes are tracked in CHANGELOG.md.

License

This project is licensed under the terms of the Apache License 2.0. Copyright (c) 2026 Albert Najjar.

About

A codex plugin for running optimization loops inside a codebase. It is useful when you have a measurable target and many possible changes to try: test runtime, build speed, bundle size, model loss, Lighthouse scores, memory use, query latency, or any other metric you can print from a script.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors