test(editor): Add comprehensive instance AI e2e tests by mutdmour · Pull Request #28326 · n8n-io/n8n

mutdmour · 2026-04-10T12:27:34Z

Summary

- Add 16 e2e tests covering instance AI chat, sidebar, artifacts, confirmations, timeline, and workflow preview
- Add trace replay infrastructure for deterministic LLM response replay in CI (no API key needed)
- Add proxy retry with exponential backoff to handle MockServer ECONNRESET under parallel load
- Fix cross-test thread contamination by using identity-based (title) lookups instead of positional selectors
- Fix execution event relay ordering bug (executionFinished before pending events)
- Polish instance AI artifact preview tabs

Test plan

- All 16 instance AI e2e tests pass locally (verified with multiple consecutive runs)
- Tests are parallel-safe, with no cross-test contamination
- Proxy ECONNRESET handled with retry + exponential backoff
- Rebased on latest master and verified green

🤖 Generated with Claude Code

Remove type icons from tab labels, make tabs fill full header height, and replace loading spinners with larger loader-circle icon (80px). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…g (no-changelog) Adds a Playwright e2e test that captures the bug where the last node in the instance AI workflow preview stays in "running" state (spinning border) after execution completes. The test sends a specific prompt to build and execute a 3-node workflow, then asserts that no canvas nodes remain with the .running CSS class. Includes InstanceAiPage page object, navigation helper, and test fixtures with N8N_ENABLED_MODULES=instance-ai. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ionFinished (no-changelog) The event relay watcher only forwarded the last event in the log, so when Vue coalesced multiple ref updates into one callback, intermediate events (e.g. nodeExecuteAfter for the last node) were silently dropped. This left the iframe's executing-node queue with a stale entry, keeping the last node in spinning/running state after the workflow finished.
- Track relayed event count so every new event is forwarded, even when the watcher fires once for multiple log additions.
- Keep the eventLog intact when executionFinished arrives (instead of clearing it immediately) so the relay can forward pending events before sending the synthetic executionFinished.
- Add clearEventLog() to useExecutionPushEvents, called by the relay after all pending events have been forwarded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
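The count-tracking fix described in this commit can be sketched as follows. This is a minimal illustration, not the code from the PR: `PushEvent`, `EventRelay`, and `relay()` are hypothetical names, and the real relay lives inside a Vue watcher.

```typescript
// Hypothetical event shape, used only for this sketch.
interface PushEvent {
  type: string;
  nodeName?: string;
}

// Forwards every event appended since the previous call, not just the newest one.
class EventRelay {
  private relayedCount = 0;
  private readonly forward: (event: PushEvent) => void;

  constructor(forward: (event: PushEvent) => void) {
    this.forward = forward;
  }

  // Call from the watcher callback with the full, append-only event log.
  relay(eventLog: readonly PushEvent[]): void {
    // A shorter log means it was cleared, so start again from the beginning.
    if (eventLog.length < this.relayedCount) {
      this.relayedCount = 0;
    }
    // Forward everything not yet relayed, even if several events arrived at once.
    for (const event of eventLog.slice(this.relayedCount)) {
      this.forward(event);
    }
    this.relayedCount = eventLog.length;
  }
}
```

Because the cursor is a count rather than "the last element", a single watcher callback that covers several log additions still forwards each of them in order.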

Add 16 Playwright e2e tests across 6 spec files covering instance AI workflow preview, artifacts, timeline, sidebar, confirmations, and chat basics. Wire up proxy-aware fetch in the AI SDK model creation so MockServer can intercept Anthropic API calls for recording/replay.
- Expand InstanceAiPage page object with 30+ locators
- Add InstanceAiSidebar component page object
- Add data-test-id to preview close button
- Add getProxyFetch() to model-factory.ts and instance-ai.service.ts so @ai-sdk/anthropic respects HTTP_PROXY in e2e containers
- Rewrite fixtures with proxy service recording support
- Replace single execution-state test with comprehensive suite
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…hangelog) Add two-tier trace replay system that records tool I/O during e2e test recording and replays with bidirectional ID remapping in CI. This enables deterministic replay of complex multi-step agent tests where tool execution produces dynamic IDs.
- New trace-replay.ts: IdRemapper (ID-field-aware), TraceIndex (per-role cursors), TraceWriter, JSONL I/O helpers, PURE_REPLAY_TOOLS set
- Modified langsmith-tracing.ts: replayWrapTool (Tier 1: real execution + ID remap), pureReplayWrapTool (Tier 2: pure replay for external deps), recordWrapTool, createTraceReplayOnlyContext stub for non-LangSmith envs
- New test-only controller endpoints: POST/GET/DELETE /test/tool-trace with slug-scoped storage for parallel test isolation
- Updated fixture: records trace.jsonl during recording, loads for replay, slug-scoped activate/retrieve/clear lifecycle
- 23 unit tests for IdRemapper and TraceIndex
- Recorded trace.jsonl files for all 15 instance AI test expectations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
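To make the two core ideas concrete, here is a rough sketch of a bidirectional ID map and a per-role cursor over recorded events. The class names mirror the commit message, but the method names (`learn`, `toLive`, `toRecorded`, `next`) and the `TraceEvent` shape are assumptions made for illustration; the ID-field-aware traversal of tool payloads mentioned above is left out.

```typescript
// Maps IDs seen at recording time to IDs produced during replay, and back.
class IdRemapper {
  private readonly recordedToLive = new Map<string, string>();
  private readonly liveToRecorded = new Map<string, string>();

  learn(recordedId: string, liveId: string): void {
    this.recordedToLive.set(recordedId, liveId);
    this.liveToRecorded.set(liveId, recordedId);
  }

  // Unknown IDs pass through unchanged.
  toLive(recordedId: string): string {
    return this.recordedToLive.get(recordedId) ?? recordedId;
  }

  toRecorded(liveId: string): string {
    return this.liveToRecorded.get(liveId) ?? liveId;
  }
}

// Hypothetical trace event shape for this sketch.
interface TraceEvent {
  role: string;
  tool: string;
  output: unknown;
}

// Hands out recorded events in order, with an independent cursor per role.
class TraceIndex {
  private readonly cursors = new Map<string, number>();
  private readonly events: readonly TraceEvent[];

  constructor(events: readonly TraceEvent[]) {
    this.events = events;
  }

  next(role: string, tool: string): TraceEvent {
    const forRole = this.events.filter((e) => e.role === role);
    const cursor = this.cursors.get(role) ?? 0;
    const event = forRole[cursor];
    if (event === undefined || event.tool !== tool) {
      throw new Error(`Trace mismatch for role "${role}" at position ${cursor}: expected tool "${tool}"`);
    }
    this.cursors.set(role, cursor + 1);
    return event;
  }
}
```

Keeping one cursor per role means an orchestrator and a sub-agent can each consume their own recorded tool calls in order without interleaving affecting the match.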

…log) Add subString body matching on the system prompt to disambiguate LLM call types (title generation vs orchestrator vs sub-agent) during proxy replay. Without this, sequential expectations could be served to the wrong call when the call order differs between recording and replay. Re-record all expectations with the body matcher and remove debug logging from trace replay wrappers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
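The effect of matching on a system prompt substring can be illustrated with a small selector. This is only a sketch of the idea, not MockServer's expectation format: `RecordedExpectation`, `matchExpectation`, and the prompt markers in the usage below are all hypothetical.

```typescript
// Hypothetical stand-in for a recorded expectation with a body matcher.
interface RecordedExpectation {
  name: string;
  systemPromptSubstring: string;
  response: string;
}

// Picks the first expectation whose marker appears in the request body,
// so the result depends on the call type rather than on call order.
function matchExpectation(
  requestBody: string,
  expectations: readonly RecordedExpectation[],
): RecordedExpectation | undefined {
  return expectations.find((e) => requestBody.includes(e.systemPromptSubstring));
}
```

For example, if one expectation is keyed on a marker that only appears in the title-generation prompt and another on a marker from the orchestrator prompt, a title-generation request resolves to the right recording even when it arrives after the orchestrator call during replay.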

When re-recording with a real API key, always use record mode (never load old trace events into the backend). Previously, existing trace files would cause the backend to enter replay mode during re-recording, resulting in trace.jsonl files with only a header and no tool calls. Re-record all trace.jsonl files with proper tool call events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Re-record all proxy expectations after fixing the recording mode logic. Expectations now have subString body matchers on the system prompt and trace.jsonl files have proper tool call events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The proxy's sequential mode sets the last expectation as unlimited (fallback for extra agent turns). Previously this applied to the last file alphabetically which could be a community_nodes GET. Now it finds the last /v1/messages POST expectation specifically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
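A minimal sketch of the selection logic, assuming expectations are loaded in alphabetical file order. The `Expectation` shape and `findFallbackIndex` are hypothetical names, not the fixture's actual code.

```typescript
// Hypothetical, simplified view of a loaded expectation file.
interface Expectation {
  file: string;
  method: string;
  path: string;
}

// Returns the index of the last POST /v1/messages expectation, or -1 if none exists.
function findFallbackIndex(expectations: readonly Expectation[]): number {
  for (let i = expectations.length - 1; i >= 0; i--) {
    const candidate = expectations[i];
    if (candidate !== undefined && candidate.method === 'POST' && candidate.path === '/v1/messages') {
      return i;
    }
  }
  return -1;
}
```

Only the expectation at the returned index would be marked unlimited; a trailing GET for community nodes is skipped instead of becoming the fallback.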

…elog) Background task completion triggers `startInternalFollowUpRun`, which creates a new trace context. Previously each context got a fresh TraceIndex with cursor at 0, so the follow-up run's first tool call (e.g. list-workflows) would mismatch the first trace event (build-workflow-with-agent) and throw. Fix: store a shared TraceIndex/IdRemapper per test slug on the service. All runs within the same slug reuse the same instances, preserving cursor state across the initial run and any follow-up runs. This fixes the two confirmation e2e tests that rely on suspend/resume. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
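The fix amounts to a get-or-create lookup keyed by test slug. The sketch below uses a hypothetical `ReplayState` placeholder for the shared TraceIndex/IdRemapper pair and a hypothetical `ReplayStateRegistry` class; the real state is stored on the service.

```typescript
// Placeholder for the shared TraceIndex/IdRemapper pair; only a cursor is modelled here.
class ReplayState {
  cursor = 0;
}

// Hands out one ReplayState per test slug so follow-up runs continue where the first run stopped.
class ReplayStateRegistry {
  private readonly bySlug = new Map<string, ReplayState>();

  getOrCreate(slug: string): ReplayState {
    let state = this.bySlug.get(slug);
    if (state === undefined) {
      state = new ReplayState();
      this.bySlug.set(slug, state);
    }
    return state;
  }

  // Drops the state when the test for this slug is finished.
  clear(slug: string): void {
    this.bySlug.delete(slug);
  }
}
```

With this shape, the initial run and a follow-up run that share a slug see the same cursor, while different slugs stay isolated from each other.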

…(no-changelog) waitForAssistantResponse only waited for the first message element to appear (streaming start), not for the agent to finish. Sidebar operations then raced against the still-running agent. New waitForResponseComplete waits for the send button to reappear, which only renders when isStreaming becomes false. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s between tests (no-changelog) Two preview tests failed because their recorded proxy expectations contained stale LLM responses from previous tests' background task follow-ups. The fixture now cancels leftover background tasks before each test via a new test-only endpoint, preventing future cross-test contamination. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ET (no-changelog) MockServer proxy connections intermittently reset when 4 parallel workers load expectations simultaneously. Add withRetry helper with exponential backoff (3 retries, 500ms base) and re-throw on failure instead of silently swallowing the error. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
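A retry helper along these lines might look like the sketch below. The defaults (3 retries, 500ms base) come from the commit message; the doubling factor, the `RetryOptions` shape, and the injectable `sleep` are assumptions made so the example is self-contained and testable, and the helper in the PR may differ.

```typescript
interface RetryOptions {
  retries?: number; // retries after the first attempt
  baseDelayMs?: number; // delay before the first retry
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

// Runs `operation`, retrying with exponentially growing delays, and re-throws the last error.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const retries = options.retries ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 500;
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // Assumed doubling: 500ms, 1000ms, 2000ms with the defaults.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // Surface the failure instead of swallowing it.
  throw lastError;
}
```

A caller would wrap the flaky request, for example `await withRetry(() => loadExpectations())`, where `loadExpectations` stands for whatever call talks to the proxy.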

…ation (no-changelog) Positional selectors (.last()) break when parallel tests create threads in shared containers. Switch to getThreadByTitle() with LLM-generated titles from recordings. Also handle missing expectations directories gracefully. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-10T12:28:05Z

⚠️ Ownership acknowledgement required

Please add or check the following item in your PR description before this can be merged:

- [x] I have seen this code, I have run this code, and I take responsibility for this code.

codecov · 2026-04-10T12:31:21Z