| 1 | +# SWE-agent for SWE-bench Pro |
| 2 | + |
| 3 | +This guide explains how to run SWE-agent on the SWE-bench Pro dataset using Modal. This setup supports both direct command-line execution and a dockerized wrapper. |
| 4 | + |
| 5 | +For details about SWE-agent implementation, please see https://github.com/SWE-agent/SWE-agent. |
| 6 | + |
| 7 | +## Prerequisites |
| 8 | + |
| 9 | +Before getting started, ensure you have the following: |
| 10 | + |
| 11 | +- **Python 3.8+** with pip installed |
| 12 | +- **Docker** (Recommended, for the dockerized wrapper setup) |
| 13 | +- **Just** command runner (if using the dockerized wrapper setup) |
| 14 | +- **Modal account** with access credentials (sign up at https://modal.com) |
| 15 | +- **API access** to a compatible LLM (e.g., OpenAI API, Anthropic Claude API, or hosted model endpoint) |
| 16 | +- **DockerHub username** for generating the instances.yaml file |
| 17 | + |
| 18 | +# Installation |
| 19 | + |
| 20 | +## Install SWE-agent |
| 21 | +```bash |
| 22 | +pip install -e . |
| 23 | +``` |
| 24 | + |
| 25 | +## Apply SWE-Rex Patches |
| 26 | +After installing SWE-agent, you need to apply custom patches to the SWE-Rex installation. These patches modify the installed SWE-Rex package to work properly with SWE-bench Pro. |
| 27 | + |
| 28 | +The `patch.py` script will: |
| 29 | +1. Locate your SWE-Rex installation (in your Python environment's site-packages) |
| 30 | +2. Back up the original files with a `.bak` extension |
| 31 | +3. Copy the patched versions over the installed files |
| 32 | + |
| 33 | +To apply the patches: |
| 34 | + |
| 35 | +```bash |
| 36 | +cd swerex_patches |
| 37 | +python patch.py |
| 38 | +``` |
| 39 | + |
| 40 | +You'll be prompted to confirm each patch. To skip prompts and apply all patches automatically: |
| 41 | + |
| 42 | +```bash |
| 43 | +python patch.py --yes |
| 44 | +``` |
| 45 | + |
| 46 | +**Note**: The script currently patches `swerex/deployment/modal.py` to customize Modal deployment behavior for SWE-bench Pro. |
| 47 | + |
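The three steps listed above (locate, back up, copy) can be sketched as follows. This is a minimal illustration, not the actual contents of `patch.py`; the helper names `locate_package` and `apply_patch` are hypothetical.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def locate_package(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def apply_patch(patched_file: Path, installed_file: Path) -> Path:
    """Back up installed_file as '<name>.bak', then overwrite it with patched_file."""
    backup = installed_file.with_name(installed_file.name + ".bak")
    # Keep the first backup so that re-running never overwrites the pristine original.
    if not backup.exists():
        shutil.copy2(installed_file, backup)
    shutil.copy2(patched_file, installed_file)
    return backup
```

Under these assumptions, restoring the original file amounts to copying the `.bak` file back over the installed one.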
| 48 | +# Generate Instances |
| 49 | +Before running SWE-agent, you must first generate the instance YAML file from the SWE-bench Pro dataset. This file contains all the necessary information for each instance including Docker image names, problem statements, and repository details. |
| 50 | + |
| 51 | +Run the generation script: |
| 52 | +```bash |
| 53 | +python helper_code/generate_sweagent_instances.py --dockerhub_username <your-dockerhub-username> |
| 54 | +``` |
| 55 | + |
| 56 | +This will create `SWE-agent/data/instances.yaml` with all instances from the SWE-bench Pro test split. You can customize the output path: |
| 57 | +```bash |
| 58 | +python helper_code/generate_sweagent_instances.py \ |
| 59 | + --dockerhub_username <your-dockerhub-username> \ |
| 60 | + --output_path path/to/custom_instances.yaml |
| 61 | +``` |
| 62 | + |
| 63 | +**Note**: The generated instances.yaml file is what you'll reference in the `--instances.path` parameter when running SWE-agent (see examples below). |
| 64 | + |
| 65 | +# Configure Environment Variables |
| 66 | + |
| 67 | +## For dockerized setup (recommended) |
| 68 | + |
| 69 | +Before running SWE-agent, create a `.env` file in the `SWE-agent/` directory to store your API credentials. These environment variables will be used in your configuration files. |
| 70 | + |
| 71 | +Create `SWE-agent/.env` with the following content: |
| 72 | +``` |
| 73 | +OPENAI_API_KEY=<your-api-key> |
| 74 | +OPENAI_BASE_URL=<your-api-base-url> # Optional, only if using a custom endpoint |
| 75 | +``` |
| 76 | + |
| 77 | +## For non-dockerized setup (not recommended) |
| 78 | + |
| 79 | +**Note**: Despite the variable names, these can be used with any LLM provider: |
| 80 | +- For standard API providers: set `OPENAI_API_KEY` to your API key |
| 81 | +- For custom endpoints: set `OPENAI_BASE_URL` to your API endpoint |
| 82 | +- For hosted models: set both variables to point to your hosted endpoint |
| 83 | + |
| 84 | +These variables are referenced in the config files as `$OPENAI_API_KEY` and `$OPENAI_BASE_URL`. |
| 85 | + |
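Before launching a run, it can help to confirm that the `.env` file actually defines the variables the config files reference. The sketch below is one way to do that with the standard library. The function names are hypothetical, and the handling of inline `# comments` is an assumption about how the file is read, not a description of SWE-agent's own loader.

```python
from typing import Dict, List, Sequence


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Assumption: a trailing " # comment" is not part of the value.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return values


def missing_variables(
    values: Dict[str, str], required: Sequence[str] = ("OPENAI_API_KEY",)
) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]
```

For example, `missing_variables(parse_env_text(open("SWE-agent/.env").read()))` returns an empty list when `OPENAI_API_KEY` is set.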
| 86 | +# Run without dockerized setup (not recommended) |
| 87 | +For batch runs, the scripts accept `json`, `jsonl`, or `yaml` input files. |
| 88 | + |
| 89 | +``` |
| 90 | +OUTPUT_PATH=xx |
| 91 | + |
| 92 | +sweagent run-batch \ |
| 93 | + --config config/tool_use.yaml \ |
| 94 | + --output_dir $OUTPUT_PATH \ |
| 95 | + --num_workers 30 \ |
| 96 | + --random_delay_multiplier 1 \ |
| 97 | + --instances.type file \ |
| 98 | + --instances.path data/instances.yaml \ |
| 99 | + --instances.slice :300 \ |
| 100 | + --instances.shuffle=False \ |
| 101 | + --instances.deployment.type=modal \ |
| 102 | + --instances.deployment.startup_timeout 1800 \ |
| 103 | + --instances.deployment.runtime_timeout 3600 \ |
| 104 | + --agent.model.name claude-3-7-sonnet-20250219 \ |
| 105 | + --agent.model.api_base $OPENAI_BASE_URL \ |
| 106 | + --agent.model.api_key $OPENAI_API_KEY |
| 107 | +``` |
| 108 | + |
| 109 | +This will generate the patches, which can then be evaluated using the same scripts we use for evaluating SWEAP. |
| 110 | + |
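The `--instances.slice :300` flag in the command above selects the first 300 instances. As a rough illustration, and assuming the value follows Python slice notation (`stop`, `start:stop`, or `start:stop:step`), the selection behaves like the hypothetical helper below. It is a sketch of the semantics, not SWE-agent's own parser.

```python
def parse_slice(spec: str) -> slice:
    """Turn a string such as ':300' or '10:20' into a Python slice object."""
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"invalid slice specification: {spec!r}")
    numbers = [int(part) if part.strip() else None for part in parts]
    if len(numbers) == 1:
        # A bare number is treated as the stop index.
        return slice(numbers[0])
    return slice(*numbers)


# Stand-in for the instances loaded from data/instances.yaml.
instances = [f"instance-{i}" for i in range(1000)]
selected = instances[parse_slice(":300")]
```

With this reading, `:300` keeps indices 0 through 299, and `300:600` would pick up the next 300 in a follow-up run.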
| 111 | +# Running with Dockerized Wrapper Setup (Recommended) |
| 112 | + |
| 113 | +To run SWE-agent in a Docker container that installs all dependencies and patches and exposes a single entrypoint script, follow these steps. |
| 114 | + |
| 115 | +## Setup Modal |
| 116 | + |
| 117 | +Run the following commands to store your Modal credentials (needed if you want to run SWE-agent with Modal): |
| 118 | +``` |
| 119 | +pip install modal |
| 120 | +modal setup # and follow the prompts to generate your token and secret |
| 121 | +``` |
| 122 | + |
| 123 | +After running these steps, you should see a token ID and secret in `~/.modal.toml`. |
| 124 | +For example: |
| 125 | +``` |
| 126 | +[your-workspace-name] |
| 127 | +token_id = <token id> |
| 128 | +token_secret = <token secret> |
| 129 | +active = true |
| 130 | +``` |
| 131 | + |
| 132 | +**Note**: If you use the dockerized setup and `~/.modal.toml` is not present, you may have to adjust the location from which it is copied in the justfile. |
| 133 | + |
| 134 | + |
| 135 | +## Create a .env file |
| 136 | +In the `SWE-agent/` directory, create a `.env` file and populate it with your OpenAI API key: |
| 137 | + |
| 138 | +``` |
| 139 | +OPENAI_API_KEY=<your API key> |
| 140 | +``` |
| 141 | +This `.env` file is mounted into the Docker container, so you can also use it to set any other relevant environment variables. |
| 142 | + |
| 143 | +## Create SWE-Agent Wrapper Config |
| 144 | + |
| 145 | +To make SWE-agent runs easy to execute, create a YAML config under the `sweagent_wrapper_configs/` directory. |
| 146 | +The config structure should look like: |
| 147 | + |
| 148 | +```yaml |
| 149 | +output_dir: sweagent_results/sweagent/test # REQUIRED: This writes results to sweagent_results/sweagent/test, can be changed to any path under sweagent_results/ |
| 150 | +sweagent_command: | |
| 151 | + sweagent run-batch \ |
| 152 | + --config config/tool_use.yaml \ |
| 153 | + --output_dir {output_dir} \ |
| 154 | + --num_workers 10 \ |
| 155 | + --random_delay_multiplier 1 \ |
| 156 | + --instances.type file \ |
| 157 | + --instances.path data/instances.yaml \ |
| 158 | + --instances.slice :10 \ |
| 159 | + --instances.shuffle=False \ |
| 160 | + --instances.deployment.type=modal \ |
| 161 | + --instances.deployment.startup_timeout 1800 \ |
| 162 | + --instances.deployment.runtime_timeout 3600 \ |
| 163 | + --agent.model.name anthropic/claude-3-7-sonnet-20250219 \ |
| 164 | + --agent.model.api_base $OPENAI_BASE_URL \ |
| 165 | + --agent.model.api_key $OPENAI_API_KEY # Make sure this is set in the .env file |
| 166 | +``` |
| 167 | + |
| 168 | +The `sweagent_command` section is the exact SWE-agent command that will be executed. Refer to https://swe-agent.com/latest/usage/batch_mode/ for the command-line arguments available in batch mode. |
| 169 | +The example above runs SWE-agent on 10 SWEAP instances. |
| 170 | + |
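The `{output_dir}` placeholder in `sweagent_command` is filled from the top-level `output_dir` key. How the wrapper performs this substitution is not shown here, so the snippet below is only an assumed, minimal version of the idea; `render_command` is a hypothetical name, and the config is passed as an already-parsed dictionary.

```python
def render_command(config: dict) -> str:
    """Insert the configured output_dir into the sweagent command template."""
    output_dir = config["output_dir"]
    command = config["sweagent_command"]
    # str.replace is used instead of str.format so that any other braces
    # in the command are left untouched.
    return command.replace("{output_dir}", output_dir)


config = {
    "output_dir": "sweagent_results/sweagent/test",
    "sweagent_command": (
        "sweagent run-batch \\\n"
        "  --config config/tool_use.yaml \\\n"
        "  --output_dir {output_dir} \\\n"
        "  --agent.model.api_key $OPENAI_API_KEY\n"
    ),
}
rendered = render_command(config)
```

Shell variables such as `$OPENAI_API_KEY` are deliberately left as-is in this sketch, so the shell expands them when the command runs.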
| 171 | +**For a working example**: See `sweagent_wrapper_configs/example_config.yaml`, which runs SWE-agent on the first 25 instances from the generated instances.yaml file using Modal deployment with 20 workers, a 30-minute startup timeout for cold starts, a 1-hour runtime timeout, and a limit of 3 LLM calls per instance for testing. Remove `instances.slice` from the config to run on all instances. |
| 172 | + |
| 173 | +### Configurable Agent Options |
| 174 | + |
| 175 | +You can set additional agent configuration options in your wrapper config or via command-line flags. Common options include: |
| 176 | + |
| 177 | +**Model Limits:** |
| 178 | +- `--agent.model.per_instance_cost_limit <value>` - Cost limit per instance in dollars (default: 3.0, set to 0 to disable) |
| 179 | +- `--agent.model.total_cost_limit <value>` - Total cost limit across all instances (default: 0, disabled) |
| 180 | +- `--agent.model.per_instance_call_limit <value>` - Maximum LLM calls per instance (default: 0, disabled) |
| 181 | + |
| 182 | +**Model Parameters:** |
| 183 | +- `--agent.model.temperature <value>` - Sampling temperature (default: 0.0) |
| 184 | +- `--agent.model.top_p <value>` - Sampling top-p (default: 1.0) |
| 185 | +- `--agent.model.max_input_tokens <value>` - Override max input tokens |
| 186 | +- `--agent.model.max_output_tokens <value>` - Override max output tokens |
| 187 | + |
| 188 | +For a complete list of all configuration options, refer to: |
| 189 | +- Model options: `sweagent/agent/models.py` - `GenericAPIModelConfig` class |
| 190 | +- Deployment options: Set via `--instances.deployment.*` flags |
| 191 | +- All SWE-agent options: See the [official SWE-agent documentation](https://github.com/SWE-agent/SWE-agent) |
| 192 | + |
| 193 | +## Build and Run Docker Container |
| 194 | +Now, to actually run the dockerized setup, run the following commands from the `SWE-agent/` directory: |
| 195 | + |
| 196 | +``` |
| 197 | +just build && just run |
| 198 | +``` |
| 199 | + |
| 200 | +This command builds a Docker image (`sweagent-image`) tagged with your username and runs the Docker container. |
| 201 | + |
| 202 | +**Note**: The Docker build process automatically applies the SWE-Rex patches from `swerex_patches/` during the image build, so you don't need to manually apply them inside the container. |
| 203 | + |
| 204 | +**Note**: You don't have to build the image each time. If your only changes are config changes, you can start the container with `just run` alone. See the "Config file or Command Changes" section. |
| 205 | + |
| 206 | +## Run sweagent to generate predictions! |
| 207 | +Finally, once inside the docker container you can run the wrapper script with your curated wrapper config which is under `sweagent_wrapper_configs/` (you don't need to pass in the full path, the CLI is configured to only check that directory): |
| 208 | + |
| 209 | +``` |
| 210 | +python sweagent_wrapper.py <your config.yaml> |
| 211 | +``` |
| 212 | + |
| 213 | +For example: |
| 214 | +``` |
| 215 | +python sweagent_wrapper.py wrapper_config.yaml |
| 216 | +``` |
| 217 | +Or, without the `.yaml` extension: |
| 218 | +``` |
| 219 | +python sweagent_wrapper.py wrapper_config |
| 220 | +``` |
| 221 | +Either form executes the sweagent and swebench commands defined in `sweagent_wrapper_configs/wrapper_config.yaml`. |
| 222 | + |
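The name handling described above (an optional `.yaml` extension, and lookup restricted to `sweagent_wrapper_configs/`) could be implemented along these lines. This is a guess at the behavior for illustration, not the code in `sweagent_wrapper.py`; `resolve_config` is a hypothetical name.

```python
from pathlib import Path

CONFIG_DIR = Path("sweagent_wrapper_configs")


def resolve_config(name: str, config_dir: Path = CONFIG_DIR) -> Path:
    """Map 'wrapper_config' or 'wrapper_config.yaml' to a path inside config_dir."""
    filename = name if name.endswith((".yaml", ".yml")) else name + ".yaml"
    # Only the file name is kept, so the lookup never leaves config_dir.
    return config_dir / Path(filename).name
```

Both `resolve_config("wrapper_config")` and `resolve_config("wrapper_config.yaml")` resolve to the same file under `sweagent_wrapper_configs/`.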
| 223 | +You should see the run logs live in your console, and the final predictions will be written to `{--output_dir}/preds.json`. |
| 224 | + |
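Once a run finishes, a quick sanity check on `preds.json` can show how many instances produced a patch. The sketch below assumes the file is a JSON object keyed by instance ID whose entries carry a `model_patch` field. That layout is an assumption here, so verify it against your own output before relying on it. `summarize_predictions` is a hypothetical helper.

```python
import json


def summarize_predictions(preds_text: str) -> dict:
    """Count predictions and list the instance IDs whose patch is missing or empty."""
    preds = json.loads(preds_text)
    empty = sorted(
        instance_id
        for instance_id, entry in preds.items()
        if not (entry.get("model_patch") or "").strip()
    )
    return {"total": len(preds), "empty": empty}


# Made-up sample data in the assumed layout.
sample = json.dumps(
    {
        "instance-a": {"model_patch": "diff --git a/x b/x\n"},
        "instance-b": {"model_patch": None},
        "instance-c": {"model_patch": "   "},
    }
)
summary = summarize_predictions(sample)
```

Instances listed under `empty` are typically the first ones worth inspecting in the run logs.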
| 225 | +### Config file or Command Changes |
| 226 | + |
| 227 | +The files under `sweagent_wrapper_configs/` and `config/` are synced into the **running docker container** automatically, and changes take effect as soon as the files are saved. Configs and sweagent commands can therefore be changed without rebuilding the image. |
| 228 | + |
| 229 | +For example, skip the build and start the container directly: |
| 230 | +``` |
| 231 | +just run |
| 232 | +``` |
| 233 | + |
| 234 | +**Important**: If you modify files in `swerex_patches/`, you need to rebuild the Docker image with `just build` for the changes to take effect, as patches are applied during the build process. |