Engram

AI-Driven Algorithm Design for Computer Systems

To appear at ACM CAIS 2026.

Engram is an agentic researcher architecture that autonomously designs algorithms for computer systems. Given a problem description and a simulation environment, Engram's agents iteratively write code, run experiments, analyze results, and refine their approach -- producing interpretable, human-expert-level algorithms.

Engram addresses two critical failure modes of prior approaches:

🔄 Evolutionary Neighborhood Bias -- Code-evolution systems propose variants and select them using scalar benchmark scores. This works for incremental refinements, but fails when improvements require coordinated multi-step changes: reformulating the problem, adding tractable relaxations, or accepting temporary regressions to reach a different algorithmic family.
⏳ Coherence Ceiling -- A single long-running agent context eventually suffers from degradation and "context rot," where attention becomes uneven. Meanwhile, independent parallel runs don't accumulate knowledge -- each must rediscover the same modeling insights from scratch.

Engram overcomes both by organizing exploration into a sequence of agents that accumulate knowledge through a persistent Archive and Research Digest.
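
As a toy illustration of the first failure mode (not taken from the paper), the sketch below compares two searches on a made-up one-dimensional score landscape. LANDSCAPE, greedy_climb, and climb_with_lookahead are hypothetical names. The lookahead search only stands in for any procedure that can cross a region of worse scores; it is not Engram's method.

```python
# Toy score landscape (made up for illustration): a local peak at index 2,
# a valley at indices 3-5, and the global peak at index 8.
LANDSCAPE = [0, 1, 3, 2, 1, 2, 5, 8, 10, 7]


def score(x: int) -> int:
    """Scalar benchmark score of candidate x."""
    return LANDSCAPE[x]


def climb_with_lookahead(start: int, horizon: int) -> int:
    """Jump to the best point within `horizon` steps while it strictly improves.

    A horizon larger than 1 lets the search cross a valley, i.e. the skipped
    intermediate points may score worse than the current one.
    """
    x = start
    while True:
        lo = max(0, x - horizon)
        hi = min(len(LANDSCAPE) - 1, x + horizon)
        best = max(range(lo, hi + 1), key=score)
        if score(best) <= score(x):
            return x  # no strictly better point in reach
        x = best


def greedy_climb(start: int) -> int:
    """Neighborhood search: only adjacent variants are ever considered."""
    return climb_with_lookahead(start, horizon=1)
```

Starting from index 0, the neighborhood search stops at the local peak, while a horizon of 4 is enough to cross the valley and reach the global peak.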

How It Works

Engram runs a sequence of agents on a systems problem. Each agent:

Reads the task description, prior Research Digest, and initial/best code
Implements candidate algorithms
Experiments by running simulations and collecting results
Analyzes outcomes and iterates on the approach

When an agent exhausts its context or reaches a timeout, its workspace (code snapshots, experiment logs, results) is archived. A Research Digest -- a compact summary of high-level modeling insights -- is distilled and passed to the next agent. The next agent begins with a fresh context window, reading the Digest to build on prior discoveries rather than starting from scratch.
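
The handoff described above can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the repository's implementation: Workspace, HandoffState, run_handoff, fake_agent, and fake_distill are hypothetical names, the agent is a stub rather than an LLM, and higher scores are assumed to be better.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Workspace:
    """What one agent leaves behind when it stops (hypothetical structure)."""
    agent_id: int
    best_code: str
    best_score: float
    insights: list[str] = field(default_factory=list)


@dataclass
class HandoffState:
    archive: list[Workspace] = field(default_factory=list)  # persistent Archive
    digest: str = ""                                          # Research Digest
    best_code: str = ""
    best_score: float = float("-inf")  # assumption: higher is better


def run_handoff(
    task: str,
    initial_code: str,
    run_agent: Callable[[int, str, str, str], Workspace],
    distill: Callable[[str, Workspace], str],
    max_agents: int,
) -> HandoffState:
    state = HandoffState(best_code=initial_code)
    for agent_id in range(max_agents):
        # Fresh context: the agent sees only the task, the digest, and the best code.
        workspace = run_agent(agent_id, task, state.digest, state.best_code)
        state.archive.append(workspace)  # archive the finished workspace
        if workspace.best_score > state.best_score:
            state.best_score = workspace.best_score
            state.best_code = workspace.best_code
        # Distill insights into the digest that the next agent will read.
        state.digest = distill(state.digest, workspace)
    return state


def fake_agent(agent_id: int, task: str, digest: str, best_code: str) -> Workspace:
    """Stub agent: appends a comment to the code and reports a rising score."""
    return Workspace(
        agent_id=agent_id,
        best_code=f"{best_code}\n# revision {agent_id}",
        best_score=float(agent_id),
        insights=[f"insight from agent {agent_id}"],
    )


def fake_distill(previous_digest: str, workspace: Workspace) -> str:
    """Stub digest: keeps earlier insights and appends the new ones."""
    lines = [line for line in previous_digest.splitlines() if line]
    return "\n".join(lines + workspace.insights)
```

Calling run_handoff("task", "def f(): pass", fake_agent, fake_distill, max_agents=3) returns a state whose archive holds three workspaces and whose digest carries one insight per agent.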

Engram is implemented using the DeepAgents library in LangChain. Each agent has access to DeepAgents tools (filesystem read/write, shell), plus a run_simulation tool that evaluates candidate algorithms against the problem's simulation environment.

Quick Start

Prerequisites

Python 3.11+
Docker (required -- agents execute code in Docker containers)
An OpenAI API key

Installation

git clone --recurse-submodules https://github.com/mit-nms/Engram.git
cd Engram
# If you cloned without --recurse-submodules, run:
#   git submodule update --init --recursive

# Create environment
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -r SystemBench/vidur/requirement.txt

# OpenEvolve baseline (only needed to run openevolve_example_usage.py)
pip install -e Architect/openevolve/

# Set your API key
export OPENAI_API_KEY="your-key-here"

# Verify Docker is running (agents execute their shell commands inside a container)
docker ps

Docker installation (Linux)

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo systemctl start docker && sudo systemctl enable docker
sudo usermod -aG docker $USER
# Log out and back in, then verify:
docker ps

Run Your First Experiment

Start with a quick smoke test to verify your install:

bash scripts/smoke_test.sh

If that succeeds, try a small end-to-end run on the Multi-Cloud Multicast problem:

python examples/handoff_example_usage.py \
    --problem_name cloudcast \
    --model o3 \
    --num_runs 1 \
    --max_agents 5 \
    --agent_timeout 30

Each run produces JSON logs with generated code, scores, and full reasoning traces under results/. Run python examples/handoff_example_usage.py --help to see all flags; --num_runs, --max_agents, and --agent_timeout in particular control how long a run takes.
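
To get a first look at those logs, a small helper like the one below may be useful. It is a sketch that assumes only that the files under results/ are valid JSON; the log schema is not described here, so the code inspects top-level structure and nothing more. load_json_logs and describe are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any


def load_json_logs(results_dir: str) -> dict[Path, Any]:
    """Recursively load every readable *.json file under results_dir."""
    logs: dict[Path, Any] = {}
    for path in sorted(Path(results_dir).rglob("*.json")):
        try:
            logs[path] = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # skip partially written or malformed files
    return logs


def describe(payload: Any) -> str:
    """Summarize the top-level shape of a parsed JSON value."""
    if isinstance(payload, dict):
        return f"object with keys {list(payload)[:5]}"
    if isinstance(payload, list):
        return f"array of {len(payload)} items"
    return type(payload).__name__


def summarize(results_dir: str = "results") -> None:
    """Print one line per log file: its path and top-level shape."""
    for path, payload in load_json_logs(results_dir).items():
        print(f"{path}: {describe(payload)}")
```

Calling summarize() after a run prints one line per log file, which is usually enough to decide which file to open in full.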

Benchmark Problems

Engram is evaluated on three core problems in the paper:

	Problem	Domain	Task
☁️	Multi-Cloud Multicast	Networking	Design a data transfer routing algorithm that optimizes cost and transfer time across cloud providers.
🤖	LLM Request Routing	ML Systems	Design a global request scheduler for distributing LLM inference requests across GPU replicas.
🔍	KV Cache Reuse	Databases	Optimize KV cache reuse strategy for databases with natural language SQL queries.

python examples/handoff_example_usage.py --problem_name cloudcast    # Multi-Cloud Multicast
python examples/handoff_example_usage.py --problem_name vidur        # LLM Request Routing
python examples/handoff_example_usage.py --problem_name llm_sql      # KV Cache Reuse
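
To run all three core problems with the same settings, the commands above can be generated from Python. This is a sketch, not a script shipped with the repository: build_command and run_core_problems are hypothetical names, the flags are the ones shown in the quick-start command, and the default values simply mirror that example. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess
import sys

# Problem names as used in the commands above.
CORE_PROBLEMS = ["cloudcast", "vidur", "llm_sql"]


def build_command(
    problem_name: str,
    model: str = "o3",
    num_runs: int = 1,
    max_agents: int = 5,
    agent_timeout: int = 30,
) -> list[str]:
    """Build the argument list for one handoff run."""
    return [
        sys.executable, "examples/handoff_example_usage.py",
        "--problem_name", problem_name,
        "--model", model,
        "--num_runs", str(num_runs),
        "--max_agents", str(max_agents),
        "--agent_timeout", str(agent_timeout),
    ]


def run_core_problems(dry_run: bool = True) -> list[list[str]]:
    """Print each command; execute it only when dry_run is False."""
    commands = [build_command(name) for name in CORE_PROBLEMS]
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # Run from the repository root; stop at the first failing run.
            subprocess.run(command, check=True)
    return commands
```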

Beyond the core problems, the repository supports additional benchmarks:

ADRS -- scheduling, load balancing, telemetry repair, and more. See the ADRS README and the upstream ADRS project.
FrontierCS -- 100+ problem instances across systems, ML, security, and scientific computing. See the FrontierCS README and the FrontierCS project.

See the SystemBench README for the full list of available problems.

Other Methods

This repository includes baseline and alternative optimization methods for comparison with Engram:

Agentic methods -- single-agent, tree search, curator+subagents
Evolutionary methods -- FunSearch, Evolution of Heuristics, OpenEvolve
Non-iterative baselines -- one-shot

See the examples README for how to run each one.

Adding Your Own Problem

To apply Engram to a new systems problem, you need:

An evaluator that scores candidate algorithms by running simulations
A task prompt that describes the problem to the agent

Step 1: Create an Evaluator

Write an evaluator class inheriting from SystemBench/evaluator.py. See the SystemBench README for the full interface specification and working examples.
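
To give a feel for the shape of an evaluator, here is a self-contained toy. Everything in it is an assumption made for illustration: BaseEvaluator is a hypothetical stand-in for the class in SystemBench/evaluator.py, and the evaluate method, the result keys, and the pick_server target function are invented. The real interface is the one documented in the SystemBench README.

```python
from abc import ABC, abstractmethod


class BaseEvaluator(ABC):
    """Hypothetical stand-in; the real base class lives in SystemBench/evaluator.py."""

    @abstractmethod
    def evaluate(self, candidate_code: str) -> dict:
        """Run the candidate in a simulation and return a result dictionary."""


class ToyLoadBalancerEvaluator(BaseEvaluator):
    """Scores a candidate pick_server(loads, cost) by negative makespan."""

    def __init__(self, request_costs: list[float], num_servers: int):
        self.request_costs = request_costs
        self.num_servers = num_servers

    def evaluate(self, candidate_code: str) -> dict:
        try:
            namespace: dict = {}
            # exec is tolerable only in a sketch; untrusted code belongs in a sandbox.
            exec(candidate_code, namespace)
            pick_server = namespace["pick_server"]
            loads = [0.0] * self.num_servers
            for cost in self.request_costs:
                index = pick_server(list(loads), cost)  # pass a copy of the loads
                if not 0 <= index < self.num_servers:
                    raise ValueError(f"invalid server index {index}")
                loads[index] += cost
        except Exception as exc:
            # Report failures as data so the caller can read the error and retry.
            return {"success": False, "score": float("-inf"), "error": repr(exc)}
        # Higher is better, so a smaller makespan yields a larger score.
        return {"success": True, "score": -max(loads), "error": None}


# Example candidate: send each request to the currently least-loaded server.
LEAST_LOADED = """
def pick_server(loads, cost):
    return loads.index(min(loads))
"""
```

Returning errors in the result dictionary rather than raising them is a design choice of this sketch: it lets a calling agent see why a candidate failed without the evaluation loop crashing.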

Step 2: Write a Task Prompt

Create a task prompt file (e.g., SystemBench/my_problem/deepagents_files/task_prompt.txt) describing:

What the system does
What the agent needs to optimize
Input/output format of the target function
Constraints and domain knowledge
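
One way to keep all four topics present in every prompt is to render the file from named fields. The helper below is a sketch: render_task_prompt, write_task_prompt, and the section titles are illustrative choices, since only the topics are specified above and not a file layout.

```python
from pathlib import Path

# (field name, section title) pairs; the titles are illustrative only.
SECTIONS = [
    ("system", "What the system does"),
    ("objective", "What to optimize"),
    ("io_format", "Input/output format of the target function"),
    ("constraints", "Constraints and domain knowledge"),
]


def render_task_prompt(**content: str) -> str:
    """Render the four sections in a fixed order, refusing empty ones."""
    missing = [key for key, _ in SECTIONS if not content.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    blocks = [f"{title}:\n{content[key].strip()}" for key, title in SECTIONS]
    return "\n\n".join(blocks) + "\n"


def write_task_prompt(path: str, text: str) -> Path:
    """Write the prompt to disk, creating parent directories if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

For example, the result of render_task_prompt(system=..., objective=..., io_format=..., constraints=...) can be passed to write_task_prompt with the path SystemBench/my_problem/deepagents_files/task_prompt.txt.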

Step 3: Run

Add your problem to an example script (see examples/handoff_example_usage.py for the pattern), or use the CLI:

python -m Architect.main \
    --method agentic_handoff \
    --model o3 \
    --task_prompt_path SystemBench/my_problem/deepagents_files/task_prompt.txt \
    --evaluator_path SystemBench/my_problem \
    --results_dir results/my_problem_run1 \
    --debug

Run python -m Architect.main --help for all available options.

Project Structure

├── Architect/          # Optimization framework and methods
├── SystemBench/        # Benchmark problems and evaluators
├── examples/           # Example scripts for running each method
├── scripts/            # Analysis and plotting utilities
└── docs/               # Paper and research notes

See the Architect README, SystemBench README, and examples README for details.

Experiment Viewer

A web UI for browsing results from Tree, Handoff, and OpenEvolve runs. No aggregated_results.json file is required.

pip install flask plotly  # one-time
python scripts/experiment_viewer/server.py
# open http://localhost:5005

In the UI, paste one or more result directories (one per line) and click Set & Refresh. The viewer auto-discovers runs, plots score evolution with baselines, and shows the best solution code and logs. Config is saved to scripts/experiment_viewer/config.json.

Citation

If you use Engram in your work, please consider citing our papers:

Pantea Karimi*, Kimia Noorbakhsh*, Mohammad Alizadeh, and Hari Balakrishnan. "Improving Coherence and Persistence in Agentic AI for System Optimization." arXiv preprint arXiv:2603.21321, 2026.

Pouya Hamadanian*, Pantea Karimi*, Arash Nasr-Esfahany*, Kimia Noorbakhsh*, Joseph Chandler, Ali ParandehGheibi, Mohammad Alizadeh, and Hari Balakrishnan. "Glia: A Human-Inspired AI for Automated Systems Design and Optimization." arXiv preprint arXiv:2510.27176, 2025.

* Equal contribution

Contact

For questions or feedback, reach out to Kimia Noorbakhsh or Pantea Karimi.
