feat: chaos testing infrastructure for workflow resilience#1333
TooTallNate wants to merge 5 commits into main
…tion

Add a world-agnostic RequestContext ALS that propagates chaos testing configuration through the entire workflow execution chain, from start() through queue messages to workflow/step handlers. Each World implementation interprets the context independently (world-vercel routes to a dedicated chaos server and adds X-Chaos headers).

Changes:
- New RequestContext ALS in @workflow/world for cross-cutting per-request concerns
- Chaos mode propagation through WorkflowInvokePayload and StepInvokePayload
- world-vercel routes to chaos.workflow-server.com and sets X-Chaos headers
- CI job running E2E suite under chaos modes (random-500, random-429)
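As a rough illustration of the propagation described in this commit message, the sketch below uses Node's `AsyncLocalStorage` to carry a chaos mode across an async boundary. The `RequestContext` shape, `describeRequest`, and `demo` are assumptions made for illustration; the actual definitions in `@workflow/world` may differ.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Assumed shape for illustration; the real RequestContext may carry more fields.
interface RequestContext {
  chaos?: string; // e.g. "random-500" or "random-429"
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Returns the active context, or undefined when called outside of run().
function getRequestContext(): RequestContext | undefined {
  return requestContext.getStore();
}

// Hypothetical stand-in for a World implementation that reads the context
// deep in the call chain without receiving it as an argument.
async function describeRequest(): Promise<string> {
  await Promise.resolve(); // cross an async boundary; the store is preserved
  const ctx = getRequestContext();
  return ctx?.chaos ? `chaos:${ctx.chaos}` : 'normal';
}

// Runs the same function once inside a chaos context and once outside it.
async function demo(): Promise<string[]> {
  const inside = await requestContext.run({ chaos: 'random-500' }, describeRequest);
  const outside = await describeRequest();
  return [inside, outside];
}
```

The point of the ALS approach is that intermediate functions do not need a chaos parameter threaded through them; only the entry point and the World implementation touch the context.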
🧪 E2E Test Results
❌ Some tests failed

Failed Tests
- ▲ Vercel Production (1 failed): sveltekit (1 failed)
- 🐘 Local Postgres (2 failed): fastify-stable (1 failed), nextjs-turbopack-canary (1 failed)
- 🌍 Community Worlds (56 failed): mongodb (3 failed), redis (2 failed), turso (51 failed)

Details by Category
- ❌ ▲ Vercel Production
- ✅ 💻 Local Development
- ✅ 📦 Local Production
- ❌ 🐘 Local Postgres
- ✅ 🪟 Windows
- ❌ 🌍 Community Worlds
- ✅ 📋 Other

❌ Some E2E test jobs failed. Check the workflow run for details.
📊 Benchmark Results
Each benchmark is reported for 💻 Local Development and ▲ Production (Vercel):
- workflow with no steps
- workflow with 1 step
- workflow with 10, 25, and 50 sequential steps
- Promise.all with 10, 25, and 50 concurrent steps
- Promise.race with 10, 25, and 50 concurrent steps
- Stream Benchmarks (includes TTFB metrics): workflow with stream

Summary: Fastest Framework by World and Fastest World by Framework, with the winner determined by most benchmark wins.
Pull request overview
Adds SDK-side chaos testing plumbing so workflow/step execution can be exercised under injected server failures, with chaos configuration propagated through runtime context and queue payloads and interpreted by world-vercel.
Changes:
- Introduces a World-agnostic `AsyncLocalStorage<RequestContext>` for propagating chaos config.
- Extends workflow/step queue payload schemas to carry a `chaos` mode and wires it through core runtime enqueue/re-enqueue paths.
- Updates `world-vercel` HTTP client utils to route to a chaos server and attach chaos headers; adds a new GitHub Actions chaos E2E job.
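To make the payload wiring concrete, here is a minimal sketch of carrying an optional `chaos` field from a workflow message to a step message and back. Only the optional `chaos` field comes from this PR; the `runId` and `stepId` fields and both helper functions are hypothetical, and the real schemas contain other fields.

```typescript
// Hypothetical payload shapes; only the optional `chaos` field is taken from the PR.
interface WorkflowInvokePayload {
  runId: string;
  chaos?: string;
}

interface StepInvokePayload {
  runId: string;
  stepId: string;
  chaos?: string;
}

// Build a step message from a workflow message, carrying chaos forward only when set.
function toStepPayload(workflow: WorkflowInvokePayload, stepId: string): StepInvokePayload {
  return {
    runId: workflow.runId,
    stepId,
    ...(workflow.chaos ? { chaos: workflow.chaos } : {}),
  };
}

// Re-enqueue the workflow after a step, again preserving the chaos mode.
function toWorkflowPayload(step: StepInvokePayload): WorkflowInvokePayload {
  return {
    runId: step.runId,
    ...(step.chaos ? { chaos: step.chaos } : {}),
  };
}
```

Omitting the key entirely when chaos is unset keeps non-chaos messages identical to what they were before the field existed.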
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| packages/world/src/request-context.ts | Adds RequestContext + ALS helpers for per-request cross-cutting settings. |
| packages/world/src/queue.ts | Adds optional chaos to workflow/step invoke payload schemas. |
| packages/world/src/index.ts | Exports request context types/helpers from @workflow/world. |
| packages/world-vercel/src/utils.ts | Uses request context to route to chaos server and add X-Chaos* headers. |
| packages/core/src/runtime/start.ts | Reads chaos env vars, enters ALS, propagates chaos into run creation/queueing. |
| packages/core/src/runtime.ts | Workflow handler enters ALS from payload and propagates chaos on suspension/re-enqueue. |
| packages/core/src/runtime/suspension-handler.ts | Propagates chaos into step queue messages created during suspension handling. |
| packages/core/src/runtime/step-handler.ts | Step handler enters ALS from payload and propagates chaos on workflow re-enqueue paths. |
| packages/core/src/runtime/resume-hook.ts | Propagates chaos from run executionContext on hook resume enqueue. |
| .github/workflows/tests.yml | Adds chaos-e2e-vercel CI job running E2E under chaos modes. |
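The `world-vercel` row above can be pictured with a sketch like the following, which picks a base URL and headers from the request context. The chaos URL and the `X-Chaos` / `X-Chaos-Seed` header names come from the PR description; `resolveHttpConfig` and `DEFAULT_BASE_URL` are hypothetical names, and the default URL is a placeholder, not the real endpoint.

```typescript
interface RequestContext {
  chaos?: string;
  chaosSeed?: string;
}

// Chaos server URL as stated in the PR description.
const CHAOS_BASE_URL = 'https://chaos.workflow-server.com';
// Placeholder only; the real default endpoint is not stated in this PR.
const DEFAULT_BASE_URL = 'https://workflow-server.example';

interface HttpConfig {
  baseUrl: string;
  headers: Record<string, string>;
}

// Hypothetical helper: decide where to send a request and which headers to attach.
function resolveHttpConfig(ctx: RequestContext | undefined): HttpConfig {
  const headers: Record<string, string> = {};
  if (!ctx?.chaos) {
    // No chaos configured: use the normal server and add no chaos headers.
    return { baseUrl: DEFAULT_BASE_URL, headers };
  }
  headers['X-Chaos'] = ctx.chaos;
  if (ctx.chaosSeed !== undefined) {
    headers['X-Chaos-Seed'] = ctx.chaosSeed;
  }
  return { baseUrl: CHAOS_BASE_URL, headers };
}
```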
…ntry, rename CI matrix key

- Add chaosSeed field to WorkflowInvokePayloadSchema and StepInvokePayloadSchema
- Propagate chaosSeed through entire execution chain: start() -> executionContext, queue payloads, workflow handler, step handler, suspension handler, resume hook
- Only enter requestContext.run() when chaos config is present, so getRequestContext() returns undefined in non-chaos cases
- Fix getRequestContext() docs to match actual behavior
- Rename CI matrix key from chaos-mode to chaos_mode to avoid GitHub Actions expression syntax treating hyphen as subtraction
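The conditional entry described in this commit could look roughly like the sketch below. `runWithOptionalChaos` is a hypothetical helper name, and the payload shape is reduced to the two chaos fields; the actual handlers in `packages/core` are not reproduced here.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface RequestContext {
  chaos?: string;
  chaosSeed?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

function getRequestContext(): RequestContext | undefined {
  return requestContext.getStore();
}

// Hypothetical helper: enter the ALS only when the payload carries a chaos mode,
// so that getRequestContext() stays undefined for ordinary runs.
function runWithOptionalChaos<T>(
  payload: { chaos?: string; chaosSeed?: string },
  fn: () => T
): T {
  if (!payload.chaos) {
    return fn();
  }
  const ctx: RequestContext = {
    chaos: payload.chaos,
    ...(payload.chaosSeed !== undefined ? { chaosSeed: payload.chaosSeed } : {}),
  };
  return requestContext.run(ctx, fn);
}
```

Skipping `run()` in the non-chaos case means callers can treat an undefined context as "no chaos configured" rather than checking for an empty object.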
Summary
Adds chaos testing infrastructure to validate that the Workflow DevKit's retry stack and durability guarantees hold up under server-side failures. This is the SDK side — the server-side counterpart is in vercel/workflow-server.
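For readers unfamiliar with the idea, the sketch below shows the kind of behavior chaos testing is meant to exercise: a retry wrapper absorbing injected 5xx failures. `retryOnServerError`, `isServerError`, and `flakyServer` are hypothetical names written for this illustration and are not the SDK's retry stack; backoff delays are omitted for brevity.

```typescript
// Treat any thrown object with a numeric `status` of 500 or above as retryable.
function isServerError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null || !('status' in err)) return false;
  const status = (err as { status: unknown }).status;
  return typeof status === 'number' && status >= 500;
}

// Hypothetical retry helper: retries only server errors, up to maxAttempts calls.
async function retryOnServerError<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isServerError(err) || attempt >= maxAttempts) throw err;
      // A real implementation would wait (with backoff) before the next attempt.
    }
  }
}

// Simulated chaos: fails the first `failures` calls with the given status.
function flakyServer(failures: number, status = 500): () => Promise<string> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) throw { status };
    return `ok after ${calls} calls`;
  };
}
```

With two injected failures, `retryOnServerError(flakyServer(2))` resolves on the third call; a non-5xx status is rethrown immediately because this sketch only covers the server-error case.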
How it works
1. The `WORKFLOW_CHAOS=random-500` (or `random-429`) env var is set
2. `start()` reads the env var and enters a `RequestContext` ALS with chaos config
3. The chaos mode propagates through the queue payloads (`WorkflowInvokePayload.chaos`, `StepInvokePayload.chaos`) so deployed workbench apps also participate
4. `world-vercel` interprets the context: routes HTTP requests to `https://chaos.workflow-server.com` and adds `X-Chaos` / `X-Chaos-Seed` headers
5. The retry stack (`withServerErrorRetry`, `withThrottleRetry`) handles the failures and the workflow completes correctly

Changes
New files
- `packages/world/src/request-context.ts`: World-agnostic `AsyncLocalStorage<RequestContext>` with `chaos` and `chaosSeed` fields

Modified files
- `packages/world/src/queue.ts`: Added optional `chaos` field to both payload schemas
- `packages/world/src/index.ts`: Exports request context
- `packages/world-vercel/src/utils.ts`: Routes to chaos server and adds headers when chaos is active
- `packages/core/src/runtime/start.ts`: Reads env var, enters ALS, propagates to `executionContext` + queue payload
- `packages/core/src/runtime.ts`: Workflow handler: parse chaos, enter ALS, propagate on re-enqueue
- `packages/core/src/runtime/suspension-handler.ts`: Propagates chaos to step queue payloads
- `packages/core/src/runtime/step-handler.ts`: Step handler: parse chaos, enter ALS, propagate at all 4 re-enqueue sites
- `packages/core/src/runtime/resume-hook.ts`: Reads chaos from `executionContext` for hook resume
- `.github/workflows/tests.yml`: New `chaos-e2e-vercel` CI job (matrix: 2 chaos modes × 2 apps)

Related PRs