
Queueing: canonicalize repo keys before storage and worker updates #41

Open
vultuk wants to merge 1 commit into main from fix/40-canonicalize-repo-keys


vultuk commented Apr 9, 2026

Collaborator

Closes #40

Summary

  • canonicalize backend repo identifiers before queue, live-status, and repo-report writes
  • keep web lookups and worker/report updates case-tolerant for legacy mixed-case data during rollout
  • add a migration plus regression tests for mixed-case queue/report behavior

Why this improves Discofork

GitHub repository paths are case-insensitive, but Discofork was treating mixed-case variants as distinct backend identities. For example, Acme/Widgets and acme/widgets could end up with separate queue jobs, separate cached rows, and separate worker ready updates even though they name the same repository. This change keeps one canonical backend identity per repo and adds rollout compatibility for older mixed-case Redis/DB state.
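To illustrate the idea, here is a minimal sketch of a canonicalization helper. The PR mentions "repo-key helpers" on the web and worker backends but does not show their names or signatures, so `canonicalRepoKey`, its input format ("owner/repo"), and the whitespace trimming are all assumptions for illustration.

```typescript
// Hypothetical helper: the PR's real helper names and signatures are not shown.
function canonicalRepoKey(fullName: string): string {
  // Assumption: surrounding whitespace is not meaningful in a repo identifier.
  const [owner, repo, ...rest] = fullName.trim().split("/");
  if (!owner || !repo || rest.length > 0) {
    throw new Error(`expected "owner/repo", got "${fullName}"`);
  }
  // GitHub treats owner and repo names case-insensitively, so lowercase both halves.
  return `${owner.toLowerCase()}/${repo.toLowerCase()}`;
}

// Both spellings collapse to one backend identity.
console.log(canonicalRepoKey("Acme/Widgets")); // acme/widgets
console.log(canonicalRepoKey("acme/widgets")); // acme/widgets
```

Every write path (dedupe keys, queue entries, live-status keys, repo_reports rows) would call the same helper, so there is exactly one place that defines what "the same repo" means.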

What changed

  • added repo-key helpers on the web and worker backends to canonicalize owner/repo keys to lowercase
  • normalized new Redis dedupe keys, queue entries, live-status keys, and repo_reports writes
  • made queue cleanup/requeue and status/report lookups tolerant of legacy mixed-case entries during rollout
  • added migration migrations/0004_repo_report_canonical_full_names.sql to deduplicate repo_reports rows that differ only by case, lowercase stored identifiers, and enforce uniqueness on lower(full_name)
  • added regression tests for repository page loading, report persistence normalization, and legacy queue dedupe/cleanup/requeue behavior
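The rollout compatibility for queue dedupe can be sketched as "check the canonical key first, then fall back to a case-insensitive match against legacy keys, and only ever write canonical keys." This is an in-memory sketch: a `Set<string>` stands in for the Redis dedupe keys, and the `queued:` prefix, `isAlreadyQueued`, and `enqueue` are hypothetical names, not the PR's actual code.

```typescript
// Hypothetical key prefix; a Set stands in for Redis dedupe keys.
const DEDUPE_PREFIX = "queued:";

// Hypothetical canonical form: the lowercased "owner/repo" string.
function toCanonical(fullName: string): string {
  return fullName.trim().toLowerCase();
}

// True if the repo is already queued under its canonical key, or under a
// legacy mixed-case key that was written before the rollout.
function isAlreadyQueued(dedupeKeys: Set<string>, fullName: string): boolean {
  const key = DEDUPE_PREFIX + toCanonical(fullName);
  if (dedupeKeys.has(key)) return true;
  // Legacy fallback: case-insensitive comparison against every existing key.
  return Array.from(dedupeKeys).some((existing) => existing.toLowerCase() === key);
}

// New writes always use the canonical key, so legacy variants can only drain.
function enqueue(dedupeKeys: Set<string>, queue: string[], fullName: string): boolean {
  if (isAlreadyQueued(dedupeKeys, fullName)) return false;
  const repo = toCanonical(fullName);
  dedupeKeys.add(DEDUPE_PREFIX + repo);
  queue.push(repo);
  return true;
}
```

With a legacy key `queued:Acme/Widgets` present, `enqueue(keys, queue, "acme/widgets")` returns `false` instead of creating a second job, while `enqueue(keys, queue, "Acme/Gadgets")` pushes `acme/gadgets`. The linear scan is acceptable for a sketch; against a real Redis instance a cheaper legacy lookup would be preferable.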

Validation

  • npx bun run typecheck
  • npx bun test
  • review-changes reviewer-only pass: LGTM after fixing rollout gaps around legacy queue dedupe and legacy progress-key reads

Risks / tradeoffs / follow-ups

  • rollout still relies on compatibility code for legacy Redis queue/progress entries until they drain naturally
  • the new migration keeps a single row per case-insensitive repo_reports identity (preferring rows with ready data, then newer rows) and drops the other case-only duplicates
  • a future cleanup PR could remove some legacy fallback logic once old mixed-case queue/progress state is no longer present
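The migration's row-selection rule ("ready data first, then newer rows") can be restated as a comparator. The real migration is SQL and its contents are not shown here, so this is only an in-memory TypeScript sketch of the stated rule; the `RepoReportRow` shape, the `status` values, and `updatedAt` are hypothetical stand-ins for whatever columns repo_reports actually has.

```typescript
// Hypothetical row shape; the actual repo_reports columns are not shown in the PR.
interface RepoReportRow {
  fullName: string;
  status: "ready" | "pending" | "failed";
  updatedAt: number; // epoch milliseconds
}

// True if `a` should be kept over `b`: ready beats not-ready, then newer wins.
function isBetter(a: RepoReportRow, b: RepoReportRow): boolean {
  const aReady = a.status === "ready";
  const bReady = b.status === "ready";
  if (aReady !== bReady) return aReady;
  return a.updatedAt > b.updatedAt;
}

// Keep one row per lower(full_name) and store the lowercased identifier.
// On an exact tie the first row seen is kept.
function dedupeCaseOnly(rows: RepoReportRow[]): RepoReportRow[] {
  const best = new Map<string, RepoReportRow>();
  for (const row of rows) {
    const key = row.fullName.toLowerCase();
    const current = best.get(key);
    if (current === undefined || isBetter(row, current)) {
      best.set(key, row);
    }
  }
  return Array.from(best.values(), (row) => ({ ...row, fullName: row.fullName.toLowerCase() }));
}
```

Under this rule an older ready report wins over a newer pending one, which avoids discarding completed work just because a later mixed-case request created a fresh row.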

Exec plan

  • .hermes/plans/2026-04-09-canonicalize-repo-keys.md

Prevent mixed-case owner/repo requests from splitting queue, live-status, and repo_reports identities.

Add targeted regressions and a migration that deduplicates legacy case-only repo_reports rows.

Development

Successfully merging this pull request may close these issues.

Queueing: canonicalize repo keys before storage and worker updates
