Each instance in the HuggingFace dataset has a `dockerhub_tag` column containing the Docker tag for that instance. You can access it directly:
**Important:** Bash runs by default in our images. When running these images, you should not manually invoke bash. See https://github.com/scaleapi/SWE-bench_Pro-os/issues/6
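As a minimal sketch of how the two points above fit together, the snippet below reads an instance's `dockerhub_tag` and builds a `docker run` command without appending a shell. The dataset name is not given here, so the example uses a mocked row; with the real dataset you would get rows from `datasets.load_dataset(...)`.

```python
# Sketch: build a `docker run` invocation from an instance's dockerhub_tag.
# The row below is mocked; real rows come from the HuggingFace dataset.
def docker_run_command(instance: dict) -> str:
    """Return a docker run command for one instance.

    Bash already runs by default in these images, so no shell
    (e.g. a trailing `bash`) is appended to the command.
    """
    tag = instance["dockerhub_tag"]
    return f"docker run -it {tag}"


# Mocked instance row for illustration only:
row = {"instance_id": "example__repo-123", "dockerhub_tag": "org/image:example-tag"}
print(docker_run_command(row))  # docker run -it org/image:example-tag
```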
## Usage
This will create a JSON file in the format expected by the evaluation script.
### 3. Evaluate Patches
Evaluate patch predictions on SWE-Bench Pro with the following command (`swe_bench_pro_full.csv` is the CSV in the HuggingFace dataset). Replace `gold_patches` with your patch JSON, and point `raw_sample_path` to the SWE-Bench Pro CSV.
You can test with the gold patches, which are in the HuggingFace dataset. A helper script in `helper_code` extracts the gold patches into the required JSON format.
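For illustration, a minimal sketch of compiling gold patches into a patch JSON follows. The column names (`instance_id`, `patch`) and the output schema are assumptions; the helper script in `helper_code` is the authoritative source for the required format.

```python
import json

# Sketch: turn dataset rows into a gold-patch JSON file.
# Column names and output schema are assumed, not confirmed.
def compile_gold_patches(rows):
    """Collect (instance_id, patch) pairs from dataset rows."""
    return [{"instance_id": r["instance_id"], "patch": r["patch"]} for r in rows]


# Mocked rows for illustration; real rows come from the HuggingFace dataset.
rows = [{"instance_id": "repo__pkg-1", "patch": "diff --git a/f b/f\n..."}]
with open("gold_patches.json", "w") as f:
    json.dump(compile_gold_patches(rows), f, indent=2)
```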
## Reproducing Leaderboard Results
To reproduce leaderboard results end-to-end, follow these steps:
4. Run the evaluation script `swe_bench_pro_eval.py`.