15 changes: 6 additions & 9 deletions README.md
@@ -8,7 +8,7 @@ Made for biomedical data, Agentomics outperformed human experts and created new

How it works
1) Input is a CSV training dataset + optional data description
2) Agentomics autonomously experments with various ML models and strategies
2) Agentomics autonomously experiments with various ML models and strategies
3) Output is a trained model ready for inference and a detailed PDF report summarizing the development process and achieved metrics

For more details see: [preprint](https://www.biorxiv.org/content/10.64898/2026.01.27.702049v1)
@@ -25,7 +25,7 @@ For more details see: [preprint](https://www.biorxiv.org/content/10.64898/2026.0
git clone https://github.com/BioGeMT/agentomics-ml.git
cd agentomics-ml
cp .env.example .env
# Edit .env and set at least one API key (OPENROUTER_API_KEY or OPENAI_API_KEY)
# Edit .env and set at least one provider key, such as OPENROUTER_API_KEY or OPENAI_API_KEY

# Download example dataset
./scripts/download_example_dataset.sh
@@ -35,7 +35,7 @@ cp .env.example .env

Recommended model: `gpt-5.1-codex-max`

Outputs are saved to `outputs/<agent_id>/`, including PDF reports in `outputs/<agent_id>/pdf_reports`.
Outputs are saved to `outputs/<agent_id>/`, including PDF reports in `outputs/<agent_id>/reports/pdf/`.

### Installation Requirements

@@ -52,17 +52,16 @@ For more details visit **https://biogemt.github.io/agentomics-ml/**
- Generic: Agentomics can crunch any classification and regression datasets in CSV format.
- Secure: Agents execute code securely in Docker with read-only mounts to your file system and are only allowed to write in a Docker Volume.
- Reproducible: Outputs include models, scripts, and conda environments needed to run inference or re-train models with one bash command.
- Trustworthy: If you provide a test set, Agentomics fully abstracts LLMs from accessing it, allowing you to rely on programmaticly computed and reported test set metrics.
- Foundation models: Agentomics can leverage foundation models from huggingface for both embeddings and fine-tuning.
- Various LLM providers: OpenAI, OpenRouter, or local models via Ollama
- Trustworthy: If you provide a test set, Agentomics keeps it hidden from the LLM and reports programmatically computed test metrics.
- Foundation models: Agentomics can leverage foundation models from Hugging Face for both embeddings and fine-tuning.
- Various LLM providers: OpenAI, Anthropic, OpenRouter, Codex/ChatGPT OAuth, or local models via Ollama.
- Reliability: Thanks to our functional validators, Agentomics creates a working model 100% of the time (when using recommended settings).

## Roadmap
Agentomics is in active development. We welcome any raised Issues and suggestions. You can also [Email Us](mailto:martinekvlastimil95@gmail.com).

Features coming soon:
- Support for any data type (currently only CSV datasets)
- Run forking and continuing
- Better local model support and configuration
- Remote GPU support for GCP

@@ -80,5 +79,3 @@ bioRxiv (preprint) https://www.biorxiv.org/content/10.64898/2026.01.27.702049v1
## License

MIT. See `LICENSE`.


18 changes: 12 additions & 6 deletions docs/configuration/cli-options.md
@@ -7,10 +7,13 @@ Complete reference for `run.sh` command-line options.
| Option | Description | Default |
|--------|-------------|---------|
| `--model <name>` | LLM model to use | Interactive selection |
| `--provider <name>` | Provider to use when multiple providers are configured | Prompted if multiple providers are available in an interactive terminal |
| `--dataset <name>` | Dataset name | Interactive selection |
| `--iterations <n>` | Number of iterations | Prompted in interactive mode (default 5) |
| `--val-metric <metric>` | Validation metric to optimize | Task-based default (`AUROC` for classification, `MAE` for regression) |
| `--task-type <classification\|regression>` | Task type to use while preparing the selected raw dataset | Dataset config or interactive preparation prompt |
| `--timeout <seconds>` | Time limit for entire run | None |
| `--split-timeout <seconds>` | Time limit after which the agent may no longer change train/validation split | None |
| `--run-python-timeout <seconds>` | Timeout in seconds for each run_python tool execution - this will determine the maximum training time | `21600` (6 hours) |

The run stops when either the iteration count is reached or the timeout expires.
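The stopping rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's code: the function name `should_stop` and its parameters are hypothetical, and it assumes the timeout is measured from the start of the run.

```python
import time
from typing import Optional


def should_stop(
    completed_iterations: int,
    max_iterations: int,
    started_at: float,
    timeout: Optional[float],
    now: Optional[float] = None,
) -> bool:
    """Return True once either documented stopping condition is met.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    # Condition 1: the requested number of iterations has been completed.
    if completed_iterations >= max_iterations:
        return True
    # `--timeout` defaults to None, which means there is no time limit.
    if timeout is None:
        return False
    # Condition 2: the elapsed time has reached the timeout (in seconds).
    current = time.monotonic() if now is None else now
    return current - started_at >= timeout
```

With `--iterations 5` and no `--timeout`, only the first condition can ever trigger; adding `--timeout 3600` would make the run end after an hour even if fewer than five iterations have finished.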
@@ -22,7 +25,9 @@ The run stops when either the iteration count is reached or the timeout expires.
| `--build-images` | Build Docker images locally |
| `--local` | Run without Docker (uses conda) |
| `--cpu-only` | Disable GPU acceleration |
| `--ollama` | Use local Ollama for LLM |
| `--ollama` | Enable Docker host networking for a host Ollama server |
| `--test` | Run the integrated test suite in Docker mode |
| `--all-iterations-test` | Evaluate every archived iteration on the held-out test set after the run |

## Listing Options

@@ -39,14 +44,14 @@ The run stops when either the iteration count is reached or the timeout expires.
|--------|-------------|
| `--user-prompt <text>` | Custom prompt for the agent |
| `--iteration-plan-model <name>` | LLM model used for generating the iteration plan (defaults to `--model`) |
| `--foundation-model-type <type>` | Pre-download foundation models (`dna`, `rna`, `protein`, `molecule`, `all`) |
| `--foundation-models-type <type>` | Enable foundation models (`dna`, `rna`, `protein`, `molecule`, `all`) |
| `--use-provisioning-key` | Use OpenRouter temporary API key |
| `--spend-limit <n>` | Spend limit for provisioning key (requires `--use-provisioning-key`) |
| `--verbosity <summary\|full>` | How much agent interaction detail is printed during the run (default: `full`) |
| `--disable-training-reporting` | Disable the TrainingReporter helper that emits structured training progress updates from the agent's training script |
| `--split-allowed-iterations <n>` | Iterations that can modify train/val split (default 1) |
| `--exploration-iterations <n>` | Baseline exploration iterations (default 4) |
| `--run-python-timeout <seconds>` | Per-training timeout for `run_python` tool (default 21600) |
| `--tags <tag...>` | Space-separated tags for W&B logging |

## Forking

@@ -95,7 +100,8 @@ See [Forking a Run](../user-guide/forking.md) for a full guide and examples.
### Using Ollama

```bash
./run.sh --ollama
export OLLAMA_BASE_URL=http://localhost:11434/v1
./run.sh --ollama --provider ollama --model llama3.1 --dataset my_data
```

### CPU Only
@@ -104,10 +110,10 @@ See [Forking a Run](../user-guide/forking.md) for a full guide and examples.
./run.sh --cpu-only --model openai/gpt-4 --dataset my_data
```

### Pre-download Foundation Models
### Enable Foundation Models

```bash
./run.sh --foundation-model-type protein --model openai/gpt-4
./run.sh --foundation-models-type protein --model openai/gpt-4 --dataset my_data
```

### Run with locally built Docker images
5 changes: 3 additions & 2 deletions docs/configuration/custom-prompts.md
@@ -50,7 +50,7 @@ Without customization, the agent uses:

## What Custom Prompts Affect

The user prompt influences all agent steps:
The user prompt influences the agentic steps:

| Step | How It's Used |
|------|---------------|
@@ -61,7 +61,8 @@ The user prompt influences all agent steps:
| Model Architecture | Model selection and design |
| Model Training | Training approach and hyperparameters |
| Model Inference | Prediction pipeline design |
| Validation Evaluation | What success criteria matter most |

Validation evaluation itself is deterministic: it runs the generated inference script on train/validation data and scores the configured `--val-metric`.

## Prompt Tips

5 changes: 5 additions & 0 deletions docs/configuration/environment.md
@@ -30,6 +30,9 @@ At least one API key is required:
|----------|----------|---------|
| `OPENROUTER_API_KEY` | OpenRouter | [openrouter.ai](https://openrouter.ai/) |
| `OPENAI_API_KEY` | OpenAI | [platform.openai.com](https://platform.openai.com/) |
| `ANTHROPIC_API_KEY` | Anthropic | [console.anthropic.com](https://console.anthropic.com/) |

The Codex provider uses `codex login` and reads `~/.codex/auth.json`. Ollama does not use an API key, but set `OLLAMA_BASE_URL` so the provider is selectable.
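As a rough illustration of how these rules combine, the sketch below decides which providers would be selectable from the environment and the Codex auth file. It is an assumption-laden sketch, not the project's implementation: the function name, the `API_KEY_VARS` mapping, and the provider names `openrouter` and `openai` are hypothetical (`anthropic`, `codex`, and `ollama` appear as `--provider` values elsewhere in these docs).

```python
from pathlib import Path
from typing import Mapping

# Hypothetical mapping from provider name to the env variable that enables it.
API_KEY_VARS = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def selectable_providers(env: Mapping[str, str], codex_auth: Path) -> list[str]:
    """List providers that look configured (illustrative helper only)."""
    # Key-based providers are selectable when their API key is set and non-empty.
    found = [name for name, var in API_KEY_VARS.items() if env.get(var)]
    # Codex has no API key; `codex login` writes an auth file instead.
    if codex_auth.is_file():
        found.append("codex")
    # Ollama has no API key either; the base URL marks it as configured.
    if env.get("OLLAMA_BASE_URL"):
        found.append("ollama")
    return found
```

A caller could pass `os.environ` and `Path.home() / ".codex" / "auth.json"`; when more than one provider comes back, `--provider` is what disambiguates.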

### Provisioning Key (Optional)

@@ -115,6 +118,7 @@ export CUDA_VISIBLE_DEVICES=0,1
# LLM Provider (choose one or more)
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxx
# OPENAI_API_KEY=sk-xxxxxxxxxxxx
# ANTHROPIC_API_KEY=sk-ant-xxxxxxxxxxxx

# Weights & Biases (optional)
WANDB_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxx
@@ -128,6 +132,7 @@ WANDB_ENTITY=my-team

# Ollama (optional)
# Configure in src/utils/providers/configured_providers.yaml
# OLLAMA_BASE_URL=http://localhost:11434/v1
```

## Security Notes
31 changes: 24 additions & 7 deletions docs/configuration/providers.md
@@ -8,6 +8,7 @@ Agentomics-ML supports multiple LLM providers out of the box.
|----------|---------------------|--------|
| [OpenRouter](https://openrouter.ai/) | `OPENROUTER_API_KEY` | 100+ models |
| [OpenAI](https://openai.com/) | `OPENAI_API_KEY` | Use `--list-models` to see available models |
| [Anthropic](https://anthropic.com/) | `ANTHROPIC_API_KEY` | Claude models available to your account |
| OpenAI Codex | `codex login` | Uses your local Codex/ChatGPT login |
| [Ollama](https://ollama.ai/) | Local setup | Local models |

@@ -60,6 +61,21 @@ Use `./run.sh --list-models` to see what your API key can access.

---

## Anthropic

Direct access to Anthropic models.

### Setup

```bash
export ANTHROPIC_API_KEY="sk-ant-xxxxxxxxxxxx"
./run.sh --provider anthropic --list-models
```

Use `--provider anthropic` explicitly when other provider keys are also set.

---

## Codex (ChatGPT OAuth)

Experimental support for the local Codex CLI login flow.
@@ -76,7 +92,7 @@ Then run Agentomics with the `codex` provider:

```bash
./run.sh --provider codex --list-models
./run.sh --provider codex --model gpt-5.4
./run.sh --provider codex --model gpt-5.4 --dataset my_data
```

This provider reads your local Codex auth state from `~/.codex/auth.json` and
@@ -98,15 +114,16 @@ Run models locally for privacy or offline use.

### Docker Mode (Recommended)

Run with:
Set `OLLAMA_BASE_URL` so Agentomics considers Ollama available, then run with:

```bash
./run.sh --ollama
export OLLAMA_BASE_URL=http://localhost:11434/v1
./run.sh --ollama --provider ollama --model <ollama-model> --dataset <dataset>
```

Docker mode connects to the Ollama base URL defined in
`src/utils/providers/configured_providers.yaml`
(default: `http://host.docker.internal:11434/v1`).
(default: `http://localhost:11434/v1`) and uses host networking when `--ollama` is passed.
Ensure your Ollama server is reachable from the host at `:11434`.

### Local Mode
@@ -115,7 +132,8 @@ For local mode, set the Ollama base URL in `src/utils/providers/configured_provi
to `http://localhost:11434/v1`, then run:

```bash
./run.sh --local
export OLLAMA_BASE_URL=http://localhost:11434/v1
./run.sh --local --provider ollama --model <ollama-model> --dataset <dataset>
```

### Popular Models
@@ -197,5 +215,4 @@ Ensure Ollama is running:
ollama list # Should show pulled models
```

For Docker mode, verify that `host.docker.internal:11434` is reachable from
containers (run with `./run.sh --ollama`).
For Docker mode, run with `--ollama` so the container uses host networking, and verify the configured Ollama URL is reachable on the host.
6 changes: 4 additions & 2 deletions docs/developer/gpu-settings.md
@@ -122,12 +122,14 @@ Agentomics-ML supports multi-GPU training:
- Agent-generated scripts may use DataParallel or DistributedDataParallel
- All available GPUs are passed to containers by default

To limit GPUs:
To limit GPUs in local mode:

```bash
CUDA_VISIBLE_DEVICES=0,1 ./run.sh # Use only first 2 GPUs
CUDA_VISIBLE_DEVICES=0,1 ./run.sh --local # Use only first 2 GPUs
```

In Docker mode, Agentomics passes all available GPUs to the container; selecting a subset requires running Docker manually with custom GPU flags.

## Docker GPU Flags

When running containers manually, you can limit GPUs with Docker flags:
18 changes: 10 additions & 8 deletions docs/getting-started/installation.md
@@ -24,7 +24,7 @@ cd Agentomics-ML
```bash
# Create a .env file (required for Docker mode)
cp .env.example .env
# Edit .env and set at least one API key
# Edit .env and set at least one provider key

# Run with pre-built images
./run.sh
@@ -47,13 +47,13 @@ The images will be downloaded automatically on first run. All subsequent runs wi
```bash
# Create a .env file (required for Docker mode)
cp .env.example .env
# Edit .env and set at least one API key
# Edit .env and set at least one provider key

# Run while building images locally
./run.sh --build-images
```

On first run, you'll be prompted to build the Docker images. This takes a few minutes but only needs to be done once.
With `--build-images`, the Docker images are built locally before the run starts. This takes a few minutes but only needs to be repeated when dependencies or Dockerfiles change.

---

@@ -103,23 +103,25 @@ Run with local models using Ollama for privacy or offline use.

### Docker Mode Setup

1. Ensure Ollama listens on the host (e.g., `0.0.0.0:11434`).
2. Run with the `--ollama` flag:
1. Ensure Ollama is running on the host.
2. Make the Ollama provider selectable and choose it explicitly:

```bash
./run.sh --ollama
export OLLAMA_BASE_URL=http://localhost:11434/v1
./run.sh --ollama --provider ollama --model <ollama-model> --dataset <dataset>
```

Docker mode connects to the URL configured in `src/utils/providers/configured_providers.yaml`
(default: `http://host.docker.internal:11434/v1`).
(default: `http://localhost:11434/v1`) and uses host networking when `--ollama` is passed.

### Local Mode Setup

For local mode, set the Ollama base URL in `src/utils/providers/configured_providers.yaml`
to `http://localhost:11434/v1`, then run:

```bash
./run.sh --local
export OLLAMA_BASE_URL=http://localhost:11434/v1
./run.sh --local --provider ollama --model <ollama-model> --dataset <dataset>
```

---
9 changes: 5 additions & 4 deletions docs/getting-started/quick-start.md
@@ -5,7 +5,7 @@ Get Agentomics-ML running in under 5 minutes using pre-built Docker images.
## Prerequisites

- [Docker](https://docs.docker.com/get-docker/) installed and running
- An API key from [OpenRouter](https://openrouter.ai/) or [OpenAI](https://platform.openai.com/)
- An API key from a configured provider, such as [OpenRouter](https://openrouter.ai/) or [OpenAI](https://platform.openai.com/)

## Steps

@@ -22,8 +22,8 @@ Docker mode requires a `.env` file in the repo root.

```bash
cp .env.example .env
# Edit .env and set at least one API key:
# OPENROUTER_API_KEY or OPENAI_API_KEY
# Edit .env and set at least one provider key:
# OPENROUTER_API_KEY, OPENAI_API_KEY, or ANTHROPIC_API_KEY
```

### 3. Run the Agent
@@ -39,7 +39,8 @@ The agent will prompt you to:
1. **Select a model** - Choose from available LLMs
2. **Select a dataset** - Use your own or download examples
3. **Configure iterations** - How many optimization cycles to run
4. **Choose validation metric** - see `./run.sh --list-metrics`

The validation metric defaults to `AUROC` for classification and `MAE` for regression. To choose one explicitly, pass `--val-metric`; see `./run.sh --list-metrics`.
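The defaulting rule can be pictured with a short Python sketch. The helper name `resolve_val_metric` and the `DEFAULT_METRICS` mapping are hypothetical and only restate the documented defaults; the real lookup in the codebase may be organized differently.

```python
from typing import Optional

# Documented task-based defaults; the dictionary itself is an illustrative assumption.
DEFAULT_METRICS = {"classification": "AUROC", "regression": "MAE"}


def resolve_val_metric(task_type: str, val_metric: Optional[str] = None) -> str:
    """Pick the validation metric: an explicit choice wins, else the task default."""
    # An explicit `--val-metric` value always takes precedence.
    if val_metric is not None:
        return val_metric
    try:
        return DEFAULT_METRICS[task_type]
    except KeyError:
        raise ValueError(f"unknown task type: {task_type!r}") from None
```

The point of the design is that omitting `--val-metric` is still well defined, so a non-interactive run does not need an extra prompt.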

## Using Your Own Dataset

13 changes: 7 additions & 6 deletions docs/how-it-works/architecture.md
@@ -49,11 +49,12 @@ The agent analyzes the dataset:

### 3. Data Split

Creates or modifies train/validation split:
Creates, reuses, or modifies train/validation split files:

- Stratified splitting for classification
- Considers data distribution
- May adjust split based on previous iterations
- Reuses supplied `validation.csv` when present

**Output:** Captured in structured outputs and iteration reports

@@ -144,7 +145,7 @@ Each step uses an LLM agent that:
1. Receives context (data info, previous results, and the iteration plan when applicable)
2. Generates a structured output (validated by Pydantic)
3. Validates the output meets requirements
4. Retries if validation fails (up to 10 times)
4. Retries if validation fails (up to 5 times by default)
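The generate, validate, and retry loop described in the steps above might look roughly like the following. This is a minimal sketch under stated assumptions: `generate_with_retries`, `generate`, and `validate` are hypothetical names, plain callables stand in for the LLM call and the Pydantic validation, and "up to 5 retries" is read as one initial attempt plus five retries.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def generate_with_retries(
    generate: Callable[[str], T],
    validate: Callable[[T], None],
    max_validation_retries: int = 5,
) -> T:
    """Call `generate` until `validate` accepts the output or retries run out.

    Hypothetical helper: `generate` receives the previous validation error
    (empty on the first attempt) so it can correct itself.
    """
    feedback = ""
    # One initial attempt plus `max_validation_retries` retries (an assumption).
    for _ in range(max_validation_retries + 1):
        output = generate(feedback)
        try:
            validate(output)
        except ValueError as err:
            # Feed the validation error back into the next attempt.
            feedback = str(err)
            continue
        return output
    raise RuntimeError(
        f"output still invalid after {max_validation_retries} retries: {feedback}"
    )
```

Passing the error message back to the generator is one plausible way to make retries converge rather than repeat the same mistake; whether the project does exactly this is not stated in the docs.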

## Iteration Flow

@@ -186,11 +187,11 @@ Key architecture parameters in `src/utils/config.py`:

| Parameter | Default | Description |
|-----------|---------|-------------|
| `temperature` | 1.0 | LLM creativity level |
| `temperature` | 0.7 | LLM creativity level |
| `max_steps` | 100 | Max steps per agent |
| `max_validation_retries` | 10 | Output validation retries |
| `llm_response_timeout` | 900s | LLM response timeout |
| `bash_tool_timeout` | 300s | Bash command timeout |
| `max_validation_retries` | 5 | Output validation retries |
| `llm_response_timeout` | 600s | LLM response timeout |
| `bash_tool_timeout` | 180s | Bash command timeout |
| `run_python_tool_timeout` | 21600s | Training timeout (6 hours, configurable via `--run-python-timeout`) |
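For orientation, the updated defaults in this table could be grouped as a small frozen dataclass. This is a hypothetical sketch and not the contents of `src/utils/config.py`: the class name `ArchitectureDefaults` is invented, and only the field names and values are taken from the updated rows of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureDefaults:
    """Illustrative container for the documented defaults (timeouts in seconds)."""

    temperature: float = 0.7              # LLM creativity level
    max_steps: int = 100                  # max steps per agent
    max_validation_retries: int = 5       # output validation retries
    llm_response_timeout: int = 600       # LLM response timeout
    bash_tool_timeout: int = 180          # bash command timeout
    run_python_tool_timeout: int = 21600  # training timeout, 6 hours
```

Overriding a single value, for example `ArchitectureDefaults(run_python_tool_timeout=3600)`, mirrors what `--run-python-timeout 3600` does on the command line.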

## Next Steps
2 changes: 1 addition & 1 deletion docs/how-it-works/evaluation.md
@@ -8,7 +8,7 @@ Models are evaluated at multiple stages:

| Stage | Data Used | Purpose |
|-------|-----------|---------|
| **Dry Run** | Small sample | Validate inference script works |
| **Dry Run** | Prepared training data without labels | Validate inference script shape and metrics compatibility |
| **Validation** | Validation set | Guide optimization |
| **Train** | Training set | Detect overfitting |
| **Test** | Hidden test set | Final unbiased evaluation |