Skip to content

Trying to fix the automatic scoring & leaderboard updating in the github action.#4

Merged
Orrell merged 2 commits intoq-variance:mainfrom
sitmo:test-workflow-2
Dec 12, 2025
Merged

Trying to fix the automatic scoring & leaderboard updating in the github action.#4
Orrell merged 2 commits intoq-variance:mainfrom
sitmo:test-workflow-2

Conversation

@sitmo
Copy link

@sitmo sitmo commented Dec 11, 2025

Fix GitHub Actions Scoring Workflow

Problem

The GitHub Actions workflow for scoring submissions was failing because:

  1. Wrong script path: Workflow referenced scoring/score_submission.py but the file is at code/score_submission.py
  2. Wrong script purpose: score_submission.py was designed to score benchmark data (reads dataset_part1.parquet, etc.), not submission data
  3. Missing files: update_leaderboard.py was referenced but didn't exist
  4. No error handling: Workflow would fail if submissions were missing dataset.parquet files

Solution

1. Preserved Original Script

  • code/score_submission.py
  • This script scores the benchmark/reference data and is kept for its original purpose
  • Disabled in workflow with continue-on-error: true to prevent failures

2. Created New Submission Scoring Script

  • code/score_new_submission.py - New script specifically for scoring new submissions
  • Reads from submissions/{folder}/dataset.parquet files
  • Fail-safe design: Gracefully skips submissions without dataset.parquet (with warning)
  • Uses GitHub API to detect which submission folders were modified in the PR
  • Outputs JSON results for leaderboard integration
  • Handles errors gracefully without crashing the workflow

3. Created Leaderboard Update Script

  • code/update_leaderboard.py - Updates leaderboard/leaderboard.json with new scores
  • Reads scoring results from scoring_results.json
  • Adds/updates submission entries and sorts by R² score
  • Creates leaderboard file if it doesn't exist

4. Updated Workflow

  • Fixed script paths (code/score_new_submission.py, code/update_leaderboard.py)
  • Added requests dependency for GitHub API access
  • Added error handling (continue-on-error: true) to prevent workflow failures
  • All code organized in /code folder

5. Added .gitignore

  • Ignores Python artifacts (__pycache__/, *.pyc, etc.)
  • Ignores virtual environments, IDE files, and temporary files

Files Changed

  • .github/workflows/score.yml - Updated paths and added error handling
  • code/score_new_submission.py - NEW - Submission scoring script
  • code/update_leaderboard.py - NEW - Leaderboard update script
  • leaderboard/leaderboard.json - NEW - Initial leaderboard structure
  • .gitignore - NEW - Python/IDE ignores

Testing

This has been tested in a fork and successfully:

  • Scores submissions that have dataset.parquet
  • Skips submissions without dataset.parquet (e.g., grok_rough_vol) without failing
  • Updates the leaderboard with new scores
  • Commits the updated leaderboard back to the PR

@Orrell Orrell merged commit 1864821 into q-variance:main Dec 12, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants