fix: split health endpoint, isolate sequencer, embed lexicons #168

Open
rabble wants to merge 1 commit into blacksky-algorithms:main from divinevideo:pr/health-sequencer-lexicons

rabble commented Mar 29, 2026

Summary

  • Health endpoint split: The existing /xrpc/_health endpoint now serves as a fast liveness probe (no database check). A new /xrpc/_health/ready endpoint performs the deep readiness check with DB connectivity verification. This allows Kubernetes liveness probes to use the fast endpoint without hitting the database on every check interval.
  • Sequencer thread isolation: Move the sequencer polling loop from tokio::spawn (shared worker pool) to a dedicated OS thread with its own single-threaded tokio runtime. This prevents sequencer DB polling from competing with HTTP request handling, avoiding worker thread exhaustion under load.
  • ID resolver configuration: Use ServerConfig values (resolver_timeout, cache_state_ttl, cache_max_ttl, plc_url) for the identity resolver instead of reading raw env vars with default None values. This ensures the resolver respects configured timeouts and cache TTLs.
  • Compile-time lexicons: Use include_str!() to embed lexicons.toml at compile time instead of fs::read_to_string() at runtime. This removes the fragile requirement to have lexicons.toml in the working directory, which was problematic in containerized deployments.
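
For reviewers, the liveness/readiness split in the first bullet can be sketched roughly as below. This is a framework-free illustration, not the code in this PR: the function names `health_live` and `health_ready`, the `(status, body)` return shape, and the JSON bodies are all assumptions made for the example.

```rust
// Hypothetical sketch of the two probes; names and response shapes are
// illustrative and are not taken from rsky-pds.

/// Liveness probe: answers immediately and never touches the database.
fn health_live(version: &str) -> (u16, String) {
    (200, format!("{{\"version\":\"{}\"}}", version))
}

/// Readiness probe: runs the supplied DB check and maps a failure to 503.
/// `db_check` stands in for whatever connectivity query the server performs.
fn health_ready<F>(version: &str, db_check: F) -> (u16, String)
where
    F: FnOnce() -> Result<(), String>,
{
    match db_check() {
        Ok(()) => (200, format!("{{\"version\":\"{}\"}}", version)),
        Err(e) => (503, format!("{{\"error\":\"{}\"}}", e)),
    }
}
```

The point of the split is that only `health_ready` pays the cost of the database round trip, so an orchestrator can call the liveness path on a short interval without adding DB load.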

Test plan

  • Verify GET /xrpc/_health returns version without database dependency
  • Verify GET /xrpc/_health/ready returns version when DB is connected, 503 when not
  • Verify sequencer continues to poll and emit firehose events after the thread isolation change
  • Verify subscribe_repos WebSocket consumers still receive events
  • Run cargo test -p rsky-pds for the new lexicon loading test
  • Deploy in container without lexicons.toml in working directory to verify compile-time embedding works

🤖 Generated with Claude Code

…at compile time

Health endpoint:
- Split /xrpc/_health into a fast liveness probe (no DB check) and
  /xrpc/_health/ready for deep readiness (DB connectivity check).
  Kubernetes and other orchestrators can use the fast endpoint for
  liveness probes without hitting the database on every check.

Sequencer isolation:
- Run the sequencer polling loop on a dedicated OS thread with its own
  single-threaded tokio runtime. This prevents sequencer DB polling from
  competing with request-handling tasks on the main tokio runtime, which
  could cause worker thread exhaustion under load.
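
A minimal, std-only sketch of the isolation pattern described above, for illustration. It runs a polling loop on its own named OS thread and hands events back over a channel. The PR additionally builds a single-threaded tokio runtime inside that thread (tokio exposes this via `tokio::runtime::Builder::new_current_thread()`); that part is omitted here to keep the example dependency-free. `spawn_sequencer`, the `poll` closure, and `max_polls` are hypothetical names, and `max_polls` exists only so the example terminates.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Run a polling loop on a dedicated, named OS thread.
/// `poll(cursor)` is a stand-in for the sequencer's DB query: it returns
/// the sequence numbers of events newer than `cursor`.
fn spawn_sequencer<F>(
    mut poll: F,
    interval: Duration,
    max_polls: usize,
) -> (JoinHandle<()>, Receiver<u64>)
where
    F: FnMut(u64) -> Vec<u64> + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let handle = thread::Builder::new()
        .name("sequencer".into())
        .spawn(move || {
            let mut cursor = 0u64;
            for _ in 0..max_polls {
                for seq in poll(cursor) {
                    cursor = seq;
                    // Stop if every consumer has gone away.
                    if tx.send(seq).is_err() {
                        return;
                    }
                }
                // Blocking sleep is fine here: this thread does nothing else.
                thread::sleep(interval);
            }
        })
        .expect("failed to spawn sequencer thread");
    (handle, rx)
}
```

Because the loop owns its thread, a slow poll delays only the sequencer and cannot occupy a worker that would otherwise serve HTTP requests.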

ID resolver:
- Use ServerConfig values (resolver_timeout, cache_state_ttl,
  cache_max_ttl, plc_url) instead of reading env vars inline. This
  ensures the resolver respects the configured timeouts and cache TTLs
  rather than using defaults.
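
A rough sketch of the idea, under stated assumptions: the four field names come from this PR, but the field types, the `ResolverOpts` struct, and `resolver_opts_from` are hypothetical stand-ins and do not reflect the actual rsky types.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the relevant `ServerConfig` fields.
struct ServerConfig {
    resolver_timeout: Duration,
    cache_state_ttl: Duration,
    cache_max_ttl: Duration,
    plc_url: String,
}

/// Hypothetical resolver options; `None` means "fall back to the
/// resolver's built-in default".
#[derive(Debug, PartialEq)]
struct ResolverOpts {
    timeout: Option<Duration>,
    plc_url: Option<String>,
    stale_ttl: Option<Duration>,
    max_ttl: Option<Duration>,
}

/// Populate every option from config so no field silently stays `None`.
fn resolver_opts_from(cfg: &ServerConfig) -> ResolverOpts {
    ResolverOpts {
        timeout: Some(cfg.resolver_timeout),
        plc_url: Some(cfg.plc_url.clone()),
        stale_ttl: Some(cfg.cache_state_ttl),
        max_ttl: Some(cfg.cache_max_ttl),
    }
}
```

Deriving the options in one place makes it easy to check that a configured timeout or TTL actually reaches the resolver.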

Lexicons:
- Use include_str!() to embed lexicons.toml at compile time instead of
  reading from the filesystem at runtime. This removes the requirement
  to have lexicons.toml present in the working directory, which was
  fragile in containerized deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>