stats-worker: optimize cold start with single-pass bulk loading by adamdoupe · Pull Request #1027 · pwncollege/dojo

adamdoupe · 2026-01-12T00:03:26Z

Summary

Replace ~12,000 individual DB queries with 3 bulk queries during cold start
Reduces initialization time from 40+ minutes to under 10 minutes for large deployments (5M+ solves)
Same code path for cold start AND single dojo/module recalculation (no duplicate logic)

Changes

New: bulk_loader.py - single bulk query to load all solves with JOINs, build in-memory indexes
New: calculators.py - pure calculation functions operating on indexes
Modified: __main__.py - use calculate_all_stats() for cold start
Modified: handlers - use bulk mode for dojo/module recalculation events
Modified: background_stats.py - add bulk_set_cached_stats() with Redis pipeline
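
To illustrate the pipelined write in the last item, here is a minimal sketch of batching cache writes. It is not the PR's implementation: `bulk_set_cached_stats_sketch`, `chunk_size`, and JSON serialization are assumptions, and `FakeRedis`/`FakePipeline` are hypothetical in-memory stand-ins so the example runs without a server. The only client methods it relies on are `pipeline()`, `set()`, and `execute()`, which redis-py clients also provide.

```python
import json
from typing import Any, Dict, List, Tuple


class FakePipeline:
    """Hypothetical in-memory stand-in for a Redis pipeline."""

    def __init__(self, client: "FakeRedis") -> None:
        self._client = client
        self._queued: List[Tuple[str, str]] = []

    def set(self, key: str, value: str) -> "FakePipeline":
        # Commands are only queued here; nothing is written yet.
        self._queued.append((key, value))
        return self

    def execute(self) -> List[bool]:
        # All queued commands are applied in one simulated round trip.
        for key, value in self._queued:
            self._client.store[key] = value
        results = [True] * len(self._queued)
        self._queued.clear()
        self._client.round_trips += 1
        return results


class FakeRedis:
    """Hypothetical stand-in for a Redis client, counting round trips."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.round_trips = 0

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self)


def bulk_set_cached_stats_sketch(client: Any, stats: Dict[str, Any],
                                 chunk_size: int = 1000) -> int:
    """Queue one SET per key and flush every chunk_size commands.

    Returns the number of keys written. Serializing with JSON is an
    assumption made for this sketch.
    """
    pipe = client.pipeline()
    pending = 0
    written = 0
    for key, value in stats.items():
        pipe.set(key, json.dumps(value))
        pending += 1
        if pending >= chunk_size:
            pipe.execute()
            written += pending
            pending = 0
    if pending:
        # Flush the final partial chunk.
        pipe.execute()
        written += pending
    return written
```

Flushing in chunks keeps the number of round trips proportional to `len(stats) / chunk_size` rather than `len(stats)`, while bounding how many commands are buffered at once.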

Architecture

calculate_all_stats(filter_dojo_id=None, filter_module_index=None)
  |
  ├── No filters: cold start (all dojos/modules)
  ├── filter_dojo_id=X: single dojo recalc
  └── filter_dojo_id=X + filter_module_index=Y: single module recalc

Incremental solve handlers remain unchanged for fast O(1) updates.
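
The three scopes in the diagram above can be sketched as a single function whose filters narrow the input before the indexes are built. This is a simplified illustration, not the code in bulk_loader.py or calculators.py: the row shape, the `stats:module:...` key format, and the `solvers` field are hypothetical, and the real loader presumably applies the filters in SQL rather than in Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical row shape: (dojo_id, module_index, user_id).
SolveRow = Tuple[str, int, int]


def build_index(rows: Iterable[SolveRow]) -> Dict[Tuple[str, int], Set[int]]:
    """Group solver ids by (dojo_id, module_index) in a single pass."""
    index: Dict[Tuple[str, int], Set[int]] = defaultdict(set)
    for dojo_id, module_index, user_id in rows:
        index[(dojo_id, module_index)].add(user_id)
    return index


def calculate_all_stats_sketch(
    rows: Iterable[SolveRow],
    filter_dojo_id: Optional[str] = None,
    filter_module_index: Optional[int] = None,
) -> Dict[str, Dict[str, int]]:
    """Same code path for cold start and targeted recalculation.

    No filters: every dojo and module. filter_dojo_id: one dojo.
    filter_dojo_id + filter_module_index: one module.
    """
    if filter_module_index is not None and filter_dojo_id is None:
        raise ValueError("filter_module_index requires filter_dojo_id")

    selected = [
        row for row in rows
        if (filter_dojo_id is None or row[0] == filter_dojo_id)
        and (filter_module_index is None or row[1] == filter_module_index)
    ]
    index = build_index(selected)

    # Hypothetical cache key format, one entry per module.
    return {
        f"stats:module:{dojo_id}:{module_index}": {"solvers": len(users)}
        for (dojo_id, module_index), users in sorted(index.items())
    }
```

Calling it with no filters corresponds to cold start; passing only `filter_dojo_id`, or both filters, corresponds to the dojo and module recalculation branches.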

Test plan

Full test suite: 135 passed, 4 skipped
Stats worker cold start completes successfully
All background stats tests pass (47 tests)

🤖 Generated with Claude Code

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e47df2fcac


chatgpt-codex-connector · 2026-01-12T00:08:33Z

dojo_plugin/worker/handlers/scoreboard.py

        try:
-            logger.info(f"Calculating scoreboard for {model_type} {model_id}, duration={duration}...")
-            scoreboard = calculate_scoreboard(model, duration)
-            cache_key = f"{cache_prefix}:{duration}"
-            set_cached_stat(cache_key, scoreboard)
-            logger.info(f"Successfully updated scoreboard cache {cache_key} ({len(scoreboard)} entries)")
+            logger.info(f"Calculating stats for module dojo_id={dojo_id}, module_index={module_index} using bulk mode...")
+            stats_data = calculate_all_stats(filter_dojo_id=dojo_id, filter_module_index=module_index)
+            bulk_set_cached_stats(stats_data)


Avoid overwriting dojo stats with module-only data

In the module branch of handle_scoreboard_update, calculate_all_stats(filter_dojo_id=..., filter_module_index=...) is called and its full result set is written via bulk_set_cached_stats. Because calculate_all_stats builds indexes using the module filter, the dojo-level stats/scoreboards/scores are computed from only that module’s solves and then overwrite the dojo-wide cache entries. This means any module update event (e.g., DojoChallenges/DojoModules updates queued in listeners) will undercount dojo stats/leaderboards until a full dojo recalculation runs. Consider writing only module-scoped keys for module updates or rebuilding indexes without the module filter for dojo-wide entries.
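
One way to follow the first suggestion (write only module-scoped keys on a module update) is to filter the result set before it reaches the cache. The sketch below assumes the computed stats are a flat mapping from cache key to value and that keys look like `stats:dojo:<dojo_id>` and `stats:module:<dojo_id>:<module_index>[:suffix]`; both the key layout and the helper name `module_scoped_only` are hypothetical and not taken from the PR.

```python
from typing import Any, Dict


def module_scoped_only(stats_data: Dict[str, Any], dojo_id: str,
                       module_index: int) -> Dict[str, Any]:
    """Keep only cache entries that belong to one module.

    Dojo-wide entries are dropped, so values computed from a
    module-filtered index can never overwrite the dojo-wide cache.
    """
    prefix = f"stats:module:{dojo_id}:{module_index}"
    return {
        key: value
        for key, value in stats_data.items()
        # Match the exact key or a sub-key; the ":" guard stops
        # module 1 from also matching module 10, 11, and so on.
        if key == prefix or key.startswith(prefix + ":")
    }
```

A module update handler could then pass `module_scoped_only(stats_data, dojo_id, module_index)` to the bulk write instead of the full result, leaving dojo-wide entries untouched until a dojo-level recalculation runs.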


chatgpt-codex-connector · 2026-01-12T00:08:33Z

dojo_plugin/worker/bulk_loader.py

+        dojo.challenge_count = sum(
+            1 for key in indexes.challenges
+            if key[0] == dojo_id
+        )
+        dojo.visible_challenge_count = dojo.challenge_count


Respect visibility when counting visible challenges

In build_indexes, visible_challenge_count is set equal to the total challenge count without applying visibility rules. This regresses the dojo stats field visible_challenges, which previously only counted challenges where DojoChallenges.visible() was true; hidden or time-gated challenges will now be reported as visible. Consider computing visible_challenge_count using the visibility rules at the current time instead of defaulting to the total.
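
A minimal sketch of counting visible challenges separately from the total is shown below. It assumes each index entry carries an optional start/stop window and that a challenge with no window is always visible; the `Visibility` type, the `(dojo_id, module_index, challenge_index)` key shape, and the inclusive boundary handling are assumptions for illustration and may differ from what `DojoChallenges.visible()` does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Visibility:
    """Hypothetical visibility window; None means unbounded on that side."""
    start: Optional[datetime] = None
    stop: Optional[datetime] = None


def is_visible(vis: Optional[Visibility], now: datetime) -> bool:
    """Treat a missing window as always visible (an assumption)."""
    if vis is None:
        return True
    if vis.start is not None and now < vis.start:
        return False
    if vis.stop is not None and now > vis.stop:
        return False
    return True


def count_challenges(
    challenges: Dict[Tuple[str, int, int], Optional[Visibility]],
    dojo_id: str,
    now: Optional[datetime] = None,
) -> Tuple[int, int]:
    """Return (challenge_count, visible_challenge_count) for one dojo."""
    now = now or datetime.now(timezone.utc)
    total = 0
    visible = 0
    for key, vis in challenges.items():
        if key[0] != dojo_id:
            continue
        total += 1
        if is_visible(vis, now):
            visible += 1
    return total, visible
```

Passing `now` explicitly keeps the function deterministic in tests; a caller that omits it gets the current UTC time.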


codecov · 2026-01-12T00:39:47Z

Codecov Report

❌ Patch coverage is 0.48662% with 409 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
dojo_plugin/worker/bulk_loader.py	0.00%	198 Missing ⚠️
dojo_plugin/worker/calculators.py	0.00%	148 Missing ⚠️
dojo_plugin/worker/handlers/scoreboard.py	0.00%	20 Missing ⚠️
dojo_plugin/utils/background_stats.py	9.52%	19 Missing ⚠️
dojo_plugin/worker/handlers/scores.py	0.00%	11 Missing ⚠️
dojo_plugin/worker/__main__.py	0.00%	7 Missing ⚠️
dojo_plugin/worker/handlers/dojo_stats.py	0.00%	6 Missing ⚠️


Replace ~12,000 individual DB queries with 3 bulk queries during cold start. This dramatically reduces initialization time for large deployments.

Changes:
- Add bulk_loader.py: single bulk query to load all solves with JOINs
- Add calculators.py: pure calculation functions operating on in-memory indexes
- Update __main__.py to use calculate_all_stats() for cold start
- Update handlers to use bulk mode for dojo/module recalculation events
- Add bulk_set_cached_stats() with Redis pipeline for efficient writes

The same calculate_all_stats(filter_dojo_id, filter_module_index) function handles both cold start (no filters) and targeted recalculation (with filters), eliminating duplicate code paths.

Incremental solve update handlers remain unchanged for fast O(1) updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

P1: Skip dojo-level calculations when doing module-only recalculation
- Module updates now only write module-specific cache keys
- Prevents overwriting dojo stats with partial data

P2: Calculate visible_challenge_count using visibility rules
- Check start/stop times against current time
- Previously defaulted to total count ignoring visibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chatgpt-codex-connector bot reviewed Jan 12, 2026


adamdoupe force-pushed the stats-worker-bulk-optimization branch from e47df2f to 29dcd56 Compare January 12, 2026 21:38

adamdoupe marked this pull request as draft January 30, 2026 22:51

adamdoupe wants to merge 2 commits into master from stats-worker-bulk-optimization
