Queue-on-conflict retry for skipped scheduled scans + log handler session recovery (v2.6.39)#48

Merged
ttlequals0 merged 1 commit into main from fix/scheduler-retry-log-rollback on Apr 20, 2026

@ttlequals0
Owner

Summary

  • Scheduled scans are no longer silently dropped on overlap: MediaScheduler now queues a one-shot date-trigger retry when a cron fire is skipped because another scan is running. The delay and retry cap are configurable via SCHEDULE_RETRY_DELAY_MINUTES (default 10) and SCHEDULE_RETRY_MAX_COUNT (default 6). Retry state clears as soon as a scan actually starts.
  • DatabaseLogHandler recovers from a failed flush: adds db.session.remove() after rollback so the scoped session isn't stuck in "rollback() fully before proceeding" state for the life of the writer thread; rate-limits the stderr report to once per 60s.
  • Retry state is in-process; if the worker restarts while a retry is pending, the retry is lost and the next cron fire will pick things up.
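
A minimal sketch of the queue-on-conflict idea from the bullets above, not the actual MediaScheduler code. `ConflictRetryQueue`, its method names, and the `_conflict_retry` job-id suffix are hypothetical. The only library call assumed is APScheduler's `add_job(func, trigger="date", run_date=..., id=..., replace_existing=True)`; any object exposing a compatible `add_job` works here.

```python
import threading
from datetime import datetime, timedelta


class ConflictRetryQueue:
    """Hypothetical helper: tracks a capped retry budget per job id."""

    def __init__(self, scheduler, delay_minutes=10, max_count=6):
        self._scheduler = scheduler          # assumed: APScheduler-style add_job()
        self._delay = timedelta(minutes=delay_minutes)
        self._max_count = max_count
        self._counts = {}                    # job_id -> retries queued so far
        self._lock = threading.Lock()

    def queue_retry(self, job_id, func, now=None):
        """Queue a one-shot retry. Return True if queued, False if capped or failed."""
        now = now or datetime.now()
        with self._lock:
            count = self._counts.get(job_id, 0)
            if count >= self._max_count:
                return False                 # cap reached; leave it to the next cron fire
            self._counts[job_id] = count + 1
            try:
                self._scheduler.add_job(
                    func,
                    trigger="date",
                    run_date=now + self._delay,
                    id=f"{job_id}_conflict_retry",
                    replace_existing=True,
                )
            except Exception:
                self._counts[job_id] = count  # roll the counter back if queuing failed
                return False
            return True

    def clear(self, job_id):
        """Call when a scan actually starts so later conflicts get a fresh budget."""
        with self._lock:
            self._counts.pop(job_id, None)
```

Because the counters live in a plain dict, they disappear on a worker restart, which matches the in-process caveat in the last bullet.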

Version

2.6.39 (bumped from 2.6.38).

Test plan

  • Full pytest suite green (319 passed, 8 skipped, 0 failed)
  • Added 3 new tests for _queue_conflict_retry (queue, cap, clear, add_job-failure rollback, malformed env var) and 1 expanded test for DatabaseLogHandler flush recovery (rollback+remove path, throttle window expiry)
  • Docker image 2.6.39 built for linux/amd64 and pushed to Docker Hub (also tagged latest)
  • Deploy via Portainer webhook and confirm /api/version reports 2.6.39
  • After next overlap event, confirm queued retry warning appears and the retry fires
  • Confirm the "Failed to flush" stderr lines stop flooding

…sion recovery (v2.6.39)

Scheduled scans whose cron fire overlapped another in-progress scan were
silently dropped: APScheduler consumed the fire and advanced next_run
to the following cron interval, so a weekly Sunday 02:00 cleanup could
go missing for a full week. MediaScheduler now queues a one-shot
date-trigger retry after SCHEDULE_RETRY_DELAY_MINUTES (default 10), up
to SCHEDULE_RETRY_MAX_COUNT times (default 6), and clears the retry
state as soon as a scan actually starts. Applies to _run_scheduled_scan,
_run_periodic_scan, and the env-var _run_cleanup HTTP 409 path.
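
One way to read the two settings so that a malformed value falls back to the documented default is sketched below. `read_positive_int_env` is a hypothetical helper, and treating zero or negative values as malformed is an assumption made here, not something the commit states.

```python
import os


def read_positive_int_env(name, default, environ=None):
    """Return a positive int from the environment, or `default` if unset or malformed."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw.strip())
    except ValueError:
        return default                       # e.g. "ten" or an empty string
    # Assumption: non-positive values are rejected rather than honored.
    return value if value > 0 else default


# Defaults taken from the PR description.
retry_delay_minutes = read_positive_int_env("SCHEDULE_RETRY_DELAY_MINUTES", 10)
retry_max_count = read_positive_int_env("SCHEDULE_RETRY_MAX_COUNT", 6)
```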

DatabaseLogHandler could get stuck after a flush failure: rollback()
was attempted in a silent try/except, so when the rollback itself
failed (stale connection, broken pipe) the scoped session stayed in
"rollback() fully before proceeding" state for the life of the writer
thread, producing a flood of "Failed to flush N log entries" lines.
Added db.session.remove() after rollback and rate-limited the stderr
report to once per 60s so recurring failures still surface.
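
The recovery path described above might look roughly like the following sketch. It is not the real DatabaseLogHandler: `FlushRecovery` and its parameters are hypothetical, and `session` is assumed to behave like a SQLAlchemy scoped_session (`add_all`, `commit`, `rollback`, `remove`).

```python
import sys
import time


class FlushRecovery:
    """Hypothetical flush wrapper around a scoped_session-like object."""

    def __init__(self, session, report_interval=60.0, clock=time.monotonic, stream=None):
        self._session = session
        self._interval = report_interval     # seconds between stderr reports
        self._clock = clock
        self._stream = stream if stream is not None else sys.stderr
        self._last_report = None

    def flush(self, entries):
        """Try to persist entries. Return True on success, False on failure."""
        try:
            self._session.add_all(entries)
            self._session.commit()
            return True
        except Exception as exc:
            # Attempt rollback, then discard the thread-local session even if
            # rollback itself failed, so the next flush starts with a fresh one.
            for recover in (self._session.rollback, self._session.remove):
                try:
                    recover()
                except Exception:
                    pass
            self._report(len(entries), exc)
            return False

    def _report(self, count, exc):
        now = self._clock()
        if self._last_report is not None and now - self._last_report < self._interval:
            return                           # inside the throttle window: stay quiet
        self._last_report = now
        print(f"Failed to flush {count} log entries: {exc}", file=self._stream)
```

Throttling the report instead of silencing it keeps a recurring failure visible at most once per window.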

ttlequals0 merged commit 035b516 into main on Apr 20, 2026
6 checks passed
ttlequals0 deleted the fix/scheduler-retry-log-rollback branch on April 20, 2026 17:54