Add Case 6: Circuit Breaker pattern for CNPG failover #6

Merged
shahabhameed merged 1 commit into infobloxopen:main from shahabhameed:cnpg-circuit-breaker
Apr 9, 2026

@shahabhameed
Collaborator

Summary

Adds a new workshop case teaching the circuit breaker resilience pattern. When the CNPG primary is killed, the application without a breaker blocks every request on 30-second TCP timeouts, collapsing throughput. The fix enables a breaker that detects the outage after 5 failures and rejects subsequent requests in <1ms, preserving throughput and latency.

Key design decisions

  • Reuses CNPG infrastructure from Case 5 — same cluster, same fault injection function. Lab instructions include reverting Case 5 fixes (back to 1 instance) so the outage window is visible.

  • Higher errors = better service — The breaker intentionally raises error rate (60% vs 15%) but drops p95 by 200x and doubles throughput. The [MaxErrRate: 0.75] threshold accommodates this.

  • Breaker disabled by default — [Threshold: 0] ships as the broken state. Students search for LAB: STEP6 TODO and set [Threshold: 5, Timeout: 5s].

Broken state (Threshold=0)

Score: 88/100
Error rate: ~15%
p95: ~2,145ms
Each kill = full outage until the pod restarts; requests block on the TCP timeout

Fixed state (Threshold=5, Timeout=5s)

Score: 100/100
Error rate: ~60%
p95: ~10ms
Fails fast, throughput maintained

Implements a circuit breaker that detects database failures and fails
fast instead of blocking on 30s connection timeouts. Three-state machine
(Closed→Open→HalfOpen) with configurable threshold and probe timeout.

Broken baseline: 88/100 (p95=2145ms, 14.7% errors, 320 reqs)
Fixed baseline:  100/100 (p95=10ms, 60% errors, 599 reqs)

New files:
- pkg/cases/breaker.go — Circuit breaker state machine
- pkg/cases/circuitbreaker_case.go — Handler with LAB: STEP6 TODOs
- docs/circuitbreaker_case.md — Deep-dive documentation

Modified:
- pkg/api/handler.go — Register /cases/circuitbreaker route
- pkg/driver/scenarios.go — Add circuitbreaker scenario (60s, fault at
t=15s)
- All docs updated (FACILITATOR, LAB, AGENT_PROMPTS, LEADERBOARD,
README)
@shahabhameed changed the title from "Add Case 6 — Circuit Breaker pattern for CNPG failover" to "Add Case 6: Circuit Breaker pattern for CNPG failover" on Apr 8, 2026
@shahabhameed merged commit 99cbf15 into infobloxopen:main on Apr 9, 2026