stats-worker: optimize cold start with single-pass bulk loading #1027
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e47df2fcac
    try:
        logger.info(f"Calculating scoreboard for {model_type} {model_id}, duration={duration}...")
        scoreboard = calculate_scoreboard(model, duration)
        cache_key = f"{cache_prefix}:{duration}"
        set_cached_stat(cache_key, scoreboard)
        logger.info(f"Successfully updated scoreboard cache {cache_key} ({len(scoreboard)} entries)")
        logger.info(f"Calculating stats for module dojo_id={dojo_id}, module_index={module_index} using bulk mode...")
        stats_data = calculate_all_stats(filter_dojo_id=dojo_id, filter_module_index=module_index)
        bulk_set_cached_stats(stats_data)
Avoid overwriting dojo stats with module-only data
In the module branch of `handle_scoreboard_update`, `calculate_all_stats(filter_dojo_id=..., filter_module_index=...)` is called and its full result set is written via `bulk_set_cached_stats`. Because `calculate_all_stats` builds indexes using the module filter, the dojo-level stats/scoreboards/scores are computed from only that module’s solves and then overwrite the dojo-wide cache entries. This means any module update event (e.g., `DojoChallenges`/`DojoModules` updates queued in listeners) will undercount dojo stats/leaderboards until a full dojo recalculation runs. Consider writing only module-scoped keys for module updates or rebuilding indexes without the module filter for dojo-wide entries.
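The first suggestion in this comment (write only module-scoped keys) can be sketched as a filter applied before the bulk write. This is a minimal illustration, not the code in this PR: the helper name `filter_module_keys` and the `module:<dojo_id>:<module_index>:` key layout are assumptions made for the example, and the real cache keys may be shaped differently.

```python
from typing import Any, Dict


def filter_module_keys(
    stats_data: Dict[str, Any], dojo_id: int, module_index: int
) -> Dict[str, Any]:
    """Keep only the entries that belong to a single module.

    Assumes a hypothetical key layout "module:<dojo_id>:<module_index>:...".
    The trailing colon in the prefix stops module 1 from matching module 10.
    """
    module_prefix = f"module:{dojo_id}:{module_index}:"
    return {
        key: value
        for key, value in stats_data.items()
        if key.startswith(module_prefix)
    }
```

With such a filter in place, a module update would pass `filter_module_keys(stats_data, dojo_id, module_index)` to the bulk write, so the dojo-wide entries computed from partial data never reach the cache.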
dojo_plugin/worker/bulk_loader.py
Outdated
    dojo.challenge_count = sum(
        1 for key in indexes.challenges
        if key[0] == dojo_id
    )
    dojo.visible_challenge_count = dojo.challenge_count
Respect visibility when counting visible challenges
In `build_indexes`, `visible_challenge_count` is set equal to the total challenge count without applying visibility rules. This regresses the dojo stats field `visible_challenges`, which previously only counted challenges where `DojoChallenges.visible()` was true; hidden or time-gated challenges will now be reported as visible. Consider computing `visible_challenge_count` using the visibility rules at the current time instead of defaulting to the total.
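A time-based visibility count along the lines suggested here might look like the sketch below. `ChallengeWindow`, `is_visible`, and `count_visible` are hypothetical names for illustration only; the real `DojoChallenges.visible()` logic may apply other rules, and treating the stop time as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class ChallengeWindow:
    """Hypothetical stand-in for a challenge's visibility fields."""
    start: Optional[datetime] = None  # None: no start gate
    stop: Optional[datetime] = None   # None: no stop gate


def is_visible(challenge: ChallengeWindow, now: datetime) -> bool:
    """Visible if `now` falls inside [start, stop); missing bounds are open."""
    if challenge.start is not None and now < challenge.start:
        return False
    if challenge.stop is not None and now >= challenge.stop:
        return False
    return True


def count_visible(
    challenges: Iterable[ChallengeWindow], now: Optional[datetime] = None
) -> int:
    """Count challenges visible at `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return sum(1 for challenge in challenges if is_visible(challenge, now))
```

Passing `now` explicitly keeps the count deterministic for a single recalculation pass and makes the function easy to test.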
Replace ~12,000 individual DB queries with 3 bulk queries during cold start. This dramatically reduces initialization time for large deployments.
Changes:
- Add bulk_loader.py: single bulk query to load all solves with JOINs
- Add calculators.py: pure calculation functions operating on in-memory indexes
- Update __main__.py to use calculate_all_stats() for cold start
- Update handlers to use bulk mode for dojo/module recalculation events
- Add bulk_set_cached_stats() with Redis pipeline for efficient writes
The same calculate_all_stats(filter_dojo_id, filter_module_index) function handles both cold start (no filters) and targeted recalculation (with filters), eliminating duplicate code paths.
Incremental solve update handlers remain unchanged for fast O(1) updates.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
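The single-pass idea described in this commit message can be illustrated roughly as follows: rows from one bulk query are walked once and grouped into in-memory indexes, with optional filters for targeted recalculation. `SolveRow` and `build_solve_indexes` are hypothetical names and shapes chosen for the example, not the contents of bulk_loader.py.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set, Tuple


class SolveRow(NamedTuple):
    """Hypothetical shape of one row returned by the bulk solves query."""
    user_id: int
    dojo_id: int
    module_index: int
    challenge_index: int


def build_solve_indexes(
    rows: Iterable[SolveRow],
    filter_dojo_id: Optional[int] = None,
    filter_module_index: Optional[int] = None,
) -> Tuple[Dict[int, Set[tuple]], Dict[Tuple[int, int], Set[tuple]]]:
    """Group solves by dojo and by (dojo, module) in one pass over the rows."""
    by_dojo: Dict[int, Set[tuple]] = defaultdict(set)
    by_module: Dict[Tuple[int, int], Set[tuple]] = defaultdict(set)
    for row in rows:
        # Filters are optional: no filters means cold start over everything.
        if filter_dojo_id is not None and row.dojo_id != filter_dojo_id:
            continue
        if filter_module_index is not None and row.module_index != filter_module_index:
            continue
        by_dojo[row.dojo_id].add((row.user_id, row.module_index, row.challenge_index))
        by_module[(row.dojo_id, row.module_index)].add((row.user_id, row.challenge_index))
    return dict(by_dojo), dict(by_module)
```

Note that in this sketch a module filter also shrinks the per-dojo index, so anything computed from `by_dojo` in a filtered run reflects only that module's solves.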
e47df2f to 29dcd56
P1: Skip dojo-level calculations when doing module-only recalculation
- Module updates now only write module-specific cache keys
- Prevents overwriting dojo stats with partial data
P2: Calculate visible_challenge_count using visibility rules
- Check start/stop times against current time
- Previously defaulted to total count ignoring visibility
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Changes
- `bulk_loader.py` - single bulk query to load all solves with JOINs, build in-memory indexes
- `calculators.py` - pure calculation functions operating on indexes
- `__main__.py` - use `calculate_all_stats()` for cold start
- `background_stats.py` - add `bulk_set_cached_stats()` with Redis pipeline
Architecture
Incremental solve handlers remain unchanged for fast O(1) updates.
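For the `bulk_set_cached_stats()` change listed above, a pipelined write could be sketched as below. This is an assumption-laden illustration rather than the function in background_stats.py: the `_sketch` name, JSON serialization, and the in-memory `FakeClient` are all hypothetical. The only client calls relied on are `pipeline()`, `set()`, and `execute()`, which redis-py provides.

```python
import json
from typing import Any, Dict, List, Tuple


def bulk_set_cached_stats_sketch(client: Any, stats_data: Dict[str, Any]) -> int:
    """Queue one SET per key on a pipeline, then send them in a single batch.

    `client` is any object with a redis-py style pipeline() method.
    JSON encoding of values is an assumption made for this sketch.
    """
    pipe = client.pipeline()
    for key, value in stats_data.items():
        pipe.set(key, json.dumps(value))
    pipe.execute()  # one round trip instead of one per key
    return len(stats_data)


class FakePipeline:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self, client: "FakeClient") -> None:
        self._client = client
        self._queued: List[Tuple[str, str]] = []

    def set(self, key: str, value: str) -> "FakePipeline":
        self._queued.append((key, value))  # buffered until execute()
        return self

    def execute(self) -> List[bool]:
        for key, value in self._queued:
            self._client.store[key] = value
        self._client.round_trips += 1
        results = [True] * len(self._queued)
        self._queued = []
        return results


class FakeClient:
    """Hypothetical test double that mimics only the pipeline() entry point."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.round_trips = 0

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self)
```

Against a real server, the same function could be called with a `redis.Redis` instance in place of `FakeClient`.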
Test plan
🤖 Generated with Claude Code