stats-worker: optimize cold start with single-pass bulk loading #1027

Draft
adamdoupe wants to merge 2 commits into master from stats-worker-bulk-optimization

@adamdoupe (Contributor) commented Jan 12, 2026

Summary

  • Replace ~12,000 individual DB queries with 3 bulk queries during cold start
  • Reduce initialization time from 40+ minutes to under 10 minutes for large deployments (5M+ solves)
  • Same code path for cold start AND single dojo/module recalculation (no duplicate logic)
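To illustrate the single-pass idea from the summary, here is a minimal sketch that loads every solve with one JOIN query and groups the rows in memory instead of issuing one query per dojo or module. It uses an in-memory SQLite database; the table names, column names, and the `build_solve_index` helper are assumptions for illustration and are not taken from this PR's `bulk_loader.py`.

```python
import sqlite3
from collections import defaultdict


def build_solve_index(conn):
    """Load all solves in ONE query and index them by (dojo_id, module_index).

    Hypothetical schema: `solves(user_id, challenge_id)` joined to
    `challenges(id, dojo_id, module_index)`.
    """
    rows = conn.execute(
        """
        SELECT c.dojo_id, c.module_index, s.user_id
        FROM solves AS s
        JOIN challenges AS c ON c.id = s.challenge_id
        """
    )
    index = defaultdict(set)
    for dojo_id, module_index, user_id in rows:
        # Every later calculation reads from this dict, not from the database.
        index[(dojo_id, module_index)].add(user_id)
    return index


def demo_connection():
    """Build a tiny in-memory database so the sketch can run on its own."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE challenges (id INTEGER PRIMARY KEY, dojo_id INTEGER, module_index INTEGER);
        CREATE TABLE solves (user_id INTEGER, challenge_id INTEGER);
        INSERT INTO challenges VALUES (1, 10, 0), (2, 10, 1), (3, 20, 0);
        INSERT INTO solves VALUES (100, 1), (101, 1), (100, 2), (102, 3);
        """
    )
    return conn
```

With this shape, the number of queries stays constant as the number of dojos and modules grows; only the in-memory grouping scales with the number of solves.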

Changes

  • New: bulk_loader.py - single bulk query to load all solves with JOINs, build in-memory indexes
  • New: calculators.py - pure calculation functions operating on indexes
  • Modified: __main__.py - use calculate_all_stats() for cold start
  • Modified: handlers - use bulk mode for dojo/module recalculation events
  • Modified: background_stats.py - add bulk_set_cached_stats() with Redis pipeline
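The Redis pipeline change could look roughly like the sketch below. It assumes a client that exposes the redis-py shape (`pipeline(transaction=False)`, then `set(...)` calls, then `execute()`); the function name, the JSON serialization, and the batch size are assumptions, and `FakeRedis` is only a stand-in so the example runs without a server. The actual `bulk_set_cached_stats()` in `background_stats.py` may differ.

```python
import json


def bulk_set_cached_stats_sketch(client, stats, batch_size=500):
    """Write many cache entries using one round trip per batch."""
    items = list(stats.items())
    written = 0
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        pipe = client.pipeline(transaction=False)  # queue commands client-side
        for key, value in batch:
            pipe.set(key, json.dumps(value))
        pipe.execute()  # send the whole batch at once
        written += len(batch)
    return written


class FakePipeline:
    """Stand-in for a redis-py pipeline; counts execute() calls."""

    def __init__(self, store, counter):
        self._store = store
        self._counter = counter
        self._queued = []

    def set(self, key, value):
        self._queued.append((key, value))
        return self

    def execute(self):
        self._store.update(self._queued)
        self._counter["round_trips"] += 1
        queued, self._queued = self._queued, []
        return [True] * len(queued)


class FakeRedis:
    """Stand-in client so the sketch is runnable without Redis."""

    def __init__(self):
        self.store = {}
        self.counter = {"round_trips": 0}

    def pipeline(self, transaction=True):
        return FakePipeline(self.store, self.counter)
```

Passing a real `redis.Redis` instance instead of `FakeRedis` should work the same way, since only `pipeline`, `set`, and `execute` are used.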

Architecture

calculate_all_stats(filter_dojo_id=None, filter_module_index=None)
  |
  ├── No filters: cold start (all dojos/modules)
  ├── filter_dojo_id=X: single dojo recalc
  └── filter_dojo_id=X + filter_module_index=Y: single module recalc

Incremental solve handlers remain unchanged for fast O(1) updates.
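The filter dispatch in the diagram above can be sketched as a single function whose optional filters narrow the same computation. The `Solve` record, the cache key format, and the per-module solve count are assumptions chosen to keep the example small; the real `calculate_all_stats()` presumably computes much more.

```python
from collections import Counter
from typing import NamedTuple


class Solve(NamedTuple):
    dojo_id: int
    module_index: int
    user_id: int


def calculate_all_stats_sketch(solves, filter_dojo_id=None, filter_module_index=None):
    """One code path: no filters = cold start, filters = targeted recalculation."""
    if filter_module_index is not None and filter_dojo_id is None:
        raise ValueError("filter_module_index requires filter_dojo_id")

    counts = Counter()
    for solve in solves:
        if filter_dojo_id is not None and solve.dojo_id != filter_dojo_id:
            continue
        if filter_module_index is not None and solve.module_index != filter_module_index:
            continue
        counts[(solve.dojo_id, solve.module_index)] += 1

    # Hypothetical key format, for illustration only.
    return {
        f"stats:module:{dojo_id}:{module_index}": {"solves": n}
        for (dojo_id, module_index), n in counts.items()
    }


SOLVES = [Solve(1, 0, 100), Solve(1, 0, 101), Solve(1, 1, 100), Solve(2, 0, 102)]
cold_start = calculate_all_stats_sketch(SOLVES)
single_dojo = calculate_all_stats_sketch(SOLVES, filter_dojo_id=1)
single_module = calculate_all_stats_sketch(SOLVES, filter_dojo_id=1, filter_module_index=1)
```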

Test plan

  • Full test suite: 135 passed, 4 skipped
  • Stats worker cold start completes successfully
  • All background stats tests pass (47 tests)

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e47df2fcac

Comment on lines 125 to +128
  try:
-     logger.info(f"Calculating scoreboard for {model_type} {model_id}, duration={duration}...")
-     scoreboard = calculate_scoreboard(model, duration)
-     cache_key = f"{cache_prefix}:{duration}"
-     set_cached_stat(cache_key, scoreboard)
-     logger.info(f"Successfully updated scoreboard cache {cache_key} ({len(scoreboard)} entries)")
+     logger.info(f"Calculating stats for module dojo_id={dojo_id}, module_index={module_index} using bulk mode...")
+     stats_data = calculate_all_stats(filter_dojo_id=dojo_id, filter_module_index=module_index)
+     bulk_set_cached_stats(stats_data)

P1 Badge Avoid overwriting dojo stats with module-only data

In the module branch of handle_scoreboard_update, calculate_all_stats(filter_dojo_id=..., filter_module_index=...) is called and its full result set is written via bulk_set_cached_stats. Because calculate_all_stats builds indexes using the module filter, the dojo-level stats/scoreboards/scores are computed from only that module’s solves and then overwrite the dojo-wide cache entries. This means any module update event (e.g., DojoChallenges/DojoModules updates queued in listeners) will undercount dojo stats/leaderboards until a full dojo recalculation runs. Consider writing only module-scoped keys for module updates or rebuilding indexes without the module filter for dojo-wide entries.
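One way to read the first suggestion (write only module-scoped keys for module updates) is sketched below. The key names and the `stats_for_write` helper are hypothetical; the point is only that dojo-wide entries are skipped whenever a module filter is active, so a partial computation can never overwrite them.

```python
from collections import Counter


def stats_for_write(solves, filter_dojo_id=None, filter_module_index=None):
    """Return the cache entries that are safe to write for this recalculation.

    `solves` is an iterable of (dojo_id, module_index) pairs, one per solve.
    """
    module_only = filter_module_index is not None
    selected = [
        (dojo_id, module_index)
        for dojo_id, module_index in solves
        if (filter_dojo_id is None or dojo_id == filter_dojo_id)
        and (not module_only or module_index == filter_module_index)
    ]

    out = {}
    for (dojo_id, module_index), n in Counter(selected).items():
        out[f"module:{dojo_id}:{module_index}:solves"] = n

    # Dojo-wide totals are only trustworthy when every module was loaded.
    if not module_only:
        for dojo_id, n in Counter(d for d, _ in selected).items():
            out[f"dojo:{dojo_id}:solves"] = n
    return out


solves = [(1, 0), (1, 0), (1, 1)]
cache = stats_for_write(solves, filter_dojo_id=1)          # full dojo recalc
cache.update(stats_for_write(solves, 1, 1))                # module-only recalc
```

After the module-only update, the dojo-wide entry in `cache` still reflects all three solves rather than the single solve in module 1.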

Comment on lines 313 to 317
dojo.challenge_count = sum(
    1 for key in indexes.challenges
    if key[0] == dojo_id
)
dojo.visible_challenge_count = dojo.challenge_count

P2 Badge Respect visibility when counting visible challenges

In build_indexes, visible_challenge_count is set equal to the total challenge count without applying visibility rules. This regresses the dojo stats field visible_challenges, which previously only counted challenges where DojoChallenges.visible() was true; hidden or time-gated challenges will now be reported as visible. Consider computing visible_challenge_count using the visibility rules at the current time instead of defaulting to the total.
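A minimal sketch of counting visible challenges separately from the total, assuming visibility is defined by optional start and stop timestamps. The `ChallengeWindow` record and the boundary choice (inclusive start, exclusive stop) are assumptions; the real `DojoChallenges.visible()` rules may include more conditions.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class ChallengeWindow(NamedTuple):
    dojo_id: int
    start: Optional[datetime]  # None means "no start restriction"
    stop: Optional[datetime]   # None means "no stop restriction"


def is_visible(challenge, now):
    """Assumed rule: visible when start <= now < stop, with None as unbounded."""
    started = challenge.start is None or challenge.start <= now
    not_stopped = challenge.stop is None or now < challenge.stop
    return started and not_stopped


def count_challenges(challenges, dojo_id, now=None):
    """Return (total_count, visible_count) for one dojo."""
    now = now or datetime.now(timezone.utc)
    in_dojo = [c for c in challenges if c.dojo_id == dojo_id]
    visible = sum(1 for c in in_dojo if is_visible(c, now))
    return len(in_dojo), visible


NOW = datetime(2026, 1, 12, tzinfo=timezone.utc)
CHALLENGES = [
    ChallengeWindow(1, None, None),                                        # always visible
    ChallengeWindow(1, datetime(2026, 2, 1, tzinfo=timezone.utc), None),   # not started yet
    ChallengeWindow(1, None, datetime(2026, 1, 1, tzinfo=timezone.utc)),   # already stopped
    ChallengeWindow(2, None, None),                                        # different dojo
]
total, visible = count_challenges(CHALLENGES, dojo_id=1, now=NOW)
```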


codecov bot commented Jan 12, 2026

Replace ~12,000 individual DB queries with 3 bulk queries during cold start.
This dramatically reduces initialization time for large deployments.

Changes:
- Add bulk_loader.py: single bulk query to load all solves with JOINs
- Add calculators.py: pure calculation functions operating on in-memory indexes
- Update __main__.py to use calculate_all_stats() for cold start
- Update handlers to use bulk mode for dojo/module recalculation events
- Add bulk_set_cached_stats() with Redis pipeline for efficient writes

The same calculate_all_stats(filter_dojo_id, filter_module_index) function
handles both cold start (no filters) and targeted recalculation (with filters),
eliminating duplicate code paths.

Incremental solve update handlers remain unchanged for fast O(1) updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@adamdoupe adamdoupe force-pushed the stats-worker-bulk-optimization branch from e47df2f to 29dcd56 on January 12, 2026 21:38
P1: Skip dojo-level calculations when doing module-only recalculation
    - Module updates now only write module-specific cache keys
    - Prevents overwriting dojo stats with partial data

P2: Calculate visible_challenge_count using visibility rules
    - Check start/stop times against current time
    - Previously defaulted to total count ignoring visibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@adamdoupe adamdoupe marked this pull request as draft January 30, 2026 22:51