
Add provider-aware parallel execution to run_benchmarks.py #71

@MHindermann

Description

Plan: Provider-Aware Parallel Execution

  1. Analyze current structure
  • Review how benchmarks currently execute
  • Understand the test_config structure and provider information
  • Check if there are any threading/multiprocessing concerns with current code
  2. Create benchmark grouping function
  • Add function group_tests_by_provider(test_configs) that:
    • Takes a list of test configurations
    • Groups them by provider (openai, genai, anthropic, mistral, etc.)
    • Returns dict: {provider: [test_configs]}
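A minimal sketch of the grouping function, assuming each test config is a dict with a `"provider"` key (the actual schema in run_benchmarks.py may differ):

```python
from collections import defaultdict


def group_tests_by_provider(test_configs):
    """Group test configurations by their provider.

    Assumes each config is a dict with a "provider" key holding
    values like "openai", "genai", "anthropic", or "mistral".
    """
    groups = defaultdict(list)
    for config in test_configs:
        groups[config["provider"]].append(config)
    return dict(groups)
```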
  3. Implement per-provider parallel executor
  • Add function run_provider_tests_parallel(provider, test_configs, max_workers) that:
    • Uses ThreadPoolExecutor or ProcessPoolExecutor
    • Runs multiple tests for one provider in parallel
    • Limits concurrency to max_workers (default: 2)
    • Handles exceptions and logs results
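One way the per-provider executor could look, using `ThreadPoolExecutor` (appropriate if each test is I/O-bound, e.g. an API call). The `run_test` parameter is a hypothetical stand-in for the single-test function in run_benchmarks.py, injected here so the sketch is self-contained:

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def run_provider_tests_parallel(provider, test_configs, run_test, max_workers=2):
    """Run all tests for one provider with bounded concurrency.

    Returns a list of (config, result, error) tuples; error is None
    on success, the raised exception on failure.
    """
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(run_test, cfg): cfg for cfg in test_configs}
        for future in as_completed(futures):
            cfg = futures[future]
            try:
                results.append((cfg, future.result(), None))
            except Exception as exc:
                logger.error("[%s] test %r failed: %s", provider, cfg, exc)
                results.append((cfg, None, exc))
    return results
```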
  4. Create main parallel orchestrator
  • Add function main_parallel(test_ids, max_workers_per_provider=2) that:
    • Reads the CSV and filters by test_ids
    • Groups tests by provider
    • Spawns separate executor for each provider
    • All providers run simultaneously, each with limited concurrency
    • Waits for all to complete and reports summary
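The orchestrator could be sketched as below: an outer pool with one worker per provider, and an inner pool per provider capped at `max_workers_per_provider`. CSV reading and `test_ids` filtering from run_benchmarks.py are omitted; `run_test` is again a hypothetical single-test function:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def main_parallel(test_configs, run_test, max_workers_per_provider=2):
    """Run all providers simultaneously, each with bounded concurrency.

    Returns a summary dict: {provider: {"success": n, "failure": m}}.
    """
    groups = defaultdict(list)
    for cfg in test_configs:
        groups[cfg["provider"]].append(cfg)

    def run_provider(provider, cfgs):
        successes = failures = 0
        with ThreadPoolExecutor(max_workers=max_workers_per_provider) as inner:
            futures = [inner.submit(run_test, c) for c in cfgs]
            for future in as_completed(futures):
                try:
                    future.result()
                    successes += 1
                except Exception:
                    failures += 1
        return provider, successes, failures

    summary = {}
    # Outer pool: one worker per provider, so every provider makes
    # progress at the same time.
    with ThreadPoolExecutor(max_workers=max(len(groups), 1)) as outer:
        for provider, ok, bad in outer.map(run_provider, groups, groups.values()):
            summary[provider] = {"success": ok, "failure": bad}
    return summary
```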
  5. Keep backward compatibility
  • Keep existing main() function unchanged for sequential execution
  • Add the new main_parallel() as an alternative entry point
  • Update the `if __name__ == "__main__":` block to use main_parallel()
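The entry-point selection could look like this; the `--sequential` flag and the `select_entry_point` helper are hypothetical additions, and `main`/`main_parallel` are placeholders for the real functions:

```python
import argparse


def main():
    """Existing sequential entry point (placeholder in this sketch)."""


def main_parallel(max_workers_per_provider=2):
    """New parallel entry point (placeholder in this sketch)."""


def select_entry_point(argv):
    # Hypothetical CLI toggle: parallel by default, --sequential opts out,
    # so the original behavior stays one flag away.
    parser = argparse.ArgumentParser()
    parser.add_argument("--sequential", action="store_true",
                        help="use the original sequential main()")
    args = parser.parse_args(argv)
    return main if args.sequential else main_parallel


if __name__ == "__main__":
    select_entry_point(None)()  # parse_args(None) reads sys.argv
```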
  6. Add configuration and safety
  • Add MAX_WORKERS_PER_PROVIDER config variable at top of file
  • Add proper error handling for parallel execution
  • Ensure logging is thread-safe
  • Add summary report at the end (success/failure counts per provider)
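For the safety items: Python's stdlib `logging` module is already thread-safe at the handler level, so a shared module-level logger is fine, but any mutable summary state updated from worker threads needs its own lock. A sketch, with `ResultCounter` as a hypothetical helper name:

```python
import logging
import threading

MAX_WORKERS_PER_PROVIDER = 2  # module-level config, as the plan proposes

# stdlib logging is thread-safe; including threadName in the format
# makes interleaved per-provider output easy to attribute.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(threadName)s %(levelname)s %(message)s",
)
logger = logging.getLogger("run_benchmarks")


class ResultCounter:
    """Thread-safe per-provider success/failure tally for the summary report."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}  # provider -> (successes, failures)

    def record(self, provider, succeeded):
        with self._lock:
            ok, bad = self._counts.get(provider, (0, 0))
            self._counts[provider] = (ok + 1, bad) if succeeded else (ok, bad + 1)

    def summary(self):
        with self._lock:
            return dict(self._counts)
```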
