test(editor): Add comprehensive instance AI e2e tests#28326

Draft
mutdmour wants to merge 19 commits into master from feature/instance-ai-tabs

Conversation

@mutdmour
Contributor

Summary

  • Add 16 e2e tests covering instance AI chat, sidebar, artifacts, confirmations, timeline, and workflow preview
  • Add trace replay infrastructure for deterministic LLM response replay in CI (no API key needed)
  • Add proxy retry with exponential backoff to handle MockServer ECONNRESET under parallel load
  • Fix cross-test thread contamination by using identity-based (title) lookups instead of positional selectors
  • Fix execution event relay ordering bug (executionFinished before pending events)
  • Polish instance AI artifact preview tabs

Test plan

  • All 16 instance AI e2e tests pass locally (verified with multiple consecutive runs)
  • Tests are parallel-safe — no cross-test contamination
  • Proxy ECONNRESET handled with retry + exponential backoff
  • Rebased on latest master and verified green

🤖 Generated with Claude Code

mutdmour and others added 15 commits April 10, 2026 14:09
Remove type icons from tab labels, make tabs fill full header height,
and replace loading spinners with larger loader-circle icon (80px).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…g (no-changelog)

Adds a Playwright e2e test that captures the bug where the last node in
the instance AI workflow preview stays in "running" state (spinning
border) after execution completes. The test sends a specific prompt to
build and execute a 3-node workflow, then asserts that no canvas nodes
remain with the .running CSS class.

Includes InstanceAiPage page object, navigation helper, and test
fixtures with N8N_ENABLED_MODULES=instance-ai.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ionFinished (no-changelog)

The event relay watcher only forwarded the last event in the log, so when
Vue coalesced multiple ref updates into one callback, intermediate events
(e.g. nodeExecuteAfter for the last node) were silently dropped. This left
the iframe's executing-node queue with a stale entry, keeping the last node
in spinning/running state after the workflow finished.

- Track relayed event count so every new event is forwarded, even when the
  watcher fires once for multiple log additions.
- Keep the eventLog intact when executionFinished arrives (instead of
  clearing it immediately) so the relay can forward pending events before
  sending the synthetic executionFinished.
- Add clearEventLog() to useExecutionPushEvents, called by the relay after
  all pending events have been forwarded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
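
The counting approach in the commit above can be illustrated without Vue. This is a minimal sketch, not the actual `useEventRelay` code: `createRelay`, `onLogChanged`, and the `PushEvent` shape are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: forward every log entry added since the last callback,
// rather than only the final entry.
type PushEvent = { type: string; nodeName?: string };

function createRelay(forward: (event: PushEvent) => void) {
  let relayedCount = 0; // number of log entries already forwarded

  return {
    // Called whenever the log changes; may fire once for several additions.
    onLogChanged(eventLog: readonly PushEvent[]): void {
      // A shorter log means it was cleared, so start counting again.
      if (eventLog.length < relayedCount) relayedCount = 0;
      for (const event of eventLog.slice(relayedCount)) forward(event);
      relayedCount = eventLog.length;
    },
  };
}

// Two events are appended before a single coalesced callback: both are forwarded.
const forwarded: string[] = [];
const relay = createRelay((event) => forwarded.push(event.type));
relay.onLogChanged([{ type: 'nodeExecuteBefore' }]);
relay.onLogChanged([
  { type: 'nodeExecuteBefore' },
  { type: 'nodeExecuteAfter' },
  { type: 'executionFinished' },
]);
```

Because the relay slices from the last forwarded position, a callback that sees two new entries at once still delivers `nodeExecuteAfter` before `executionFinished`.
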
Add 16 Playwright e2e tests across 6 spec files covering instance AI
workflow preview, artifacts, timeline, sidebar, confirmations, and
chat basics. Wire up proxy-aware fetch in the AI SDK model creation
so MockServer can intercept Anthropic API calls for recording/replay.

- Expand InstanceAiPage page object with 30+ locators
- Add InstanceAiSidebar component page object
- Add data-test-id to preview close button
- Add getProxyFetch() to model-factory.ts and instance-ai.service.ts
  so @ai-sdk/anthropic respects HTTP_PROXY in e2e containers
- Rewrite fixtures with proxy service recording support
- Replace single execution-state test with comprehensive suite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…hangelog)

Add two-tier trace replay system that records tool I/O during e2e test
recording and replays with bidirectional ID remapping in CI. This enables
deterministic replay of complex multi-step agent tests where tool execution
produces dynamic IDs.

- New trace-replay.ts: IdRemapper (ID-field-aware), TraceIndex (per-role
  cursors), TraceWriter, JSONL I/O helpers, PURE_REPLAY_TOOLS set
- Modified langsmith-tracing.ts: replayWrapTool (Tier 1: real execution +
  ID remap), pureReplayWrapTool (Tier 2: pure replay for external deps),
  recordWrapTool, createTraceReplayOnlyContext stub for non-LangSmith envs
- New test-only controller endpoints: POST/GET/DELETE /test/tool-trace
  with slug-scoped storage for parallel test isolation
- Updated fixture: records trace.jsonl during recording, loads for replay,
  slug-scoped activate/retrieve/clear lifecycle
- 23 unit tests for IdRemapper and TraceIndex
- Recorded trace.jsonl files for all 15 instance AI test expectations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
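
To make the bidirectional remapping idea concrete, here is a rough sketch under stated assumptions. It is not the `IdRemapper` from `trace-replay.ts`: the class name `SketchIdRemapper`, its methods, and the `ID_FIELDS` list are all hypothetical.

```typescript
// Hypothetical sketch of bidirectional, ID-field-aware remapping.
// The field names below are assumptions for illustration.
const ID_FIELDS = new Set(['id', 'workflowId', 'executionId']);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

class SketchIdRemapper {
  private recordedToLive = new Map<string, string>();
  private liveToRecorded = new Map<string, string>();

  // Pair an ID from the recording with the ID produced in this run.
  learn(recordedId: string, liveId: string): void {
    this.recordedToLive.set(recordedId, liveId);
    this.liveToRecorded.set(liveId, recordedId);
  }

  toLive(value: Json): Json {
    return this.remap(value, this.recordedToLive);
  }

  toRecorded(value: Json): Json {
    return this.remap(value, this.liveToRecorded);
  }

  // Only string values stored under a known ID field are rewritten.
  private remap(value: Json, table: Map<string, string>, key?: string): Json {
    if (typeof value === 'string') {
      return key !== undefined && ID_FIELDS.has(key) ? (table.get(value) ?? value) : value;
    }
    if (Array.isArray(value)) return value.map((item) => this.remap(item, table, key));
    if (value !== null && typeof value === 'object') {
      const out: { [key: string]: Json } = {};
      for (const [k, v] of Object.entries(value)) out[k] = this.remap(v, table, k);
      return out;
    }
    return value;
  }
}

const remapper = new SketchIdRemapper();
remapper.learn('wf-recorded-1', 'wf-live-9');
const liveArgs = remapper.toLive({ workflowId: 'wf-recorded-1', name: 'wf-recorded-1' });
const roundTrip = remapper.toRecorded(liveArgs);
```

Restricting the rewrite to known ID fields keeps ordinary strings (here, `name`) untouched even when they happen to equal a recorded ID.
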
…log)

Add subString body matching on the system prompt to disambiguate LLM call
types (title generation vs orchestrator vs sub-agent) during proxy replay.
Without this, sequential expectations could be served to the wrong call
when the call order differs between recording and replay.

Re-record all expectations with the body matcher and remove debug logging
from trace replay wrappers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
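
For readers unfamiliar with MockServer body matchers, the shape of such an expectation might look like the sketch below. The `buildExpectation` helper, the marker text, and the response body are hypothetical; only the `STRING` body matcher with `subString: true` and the `times` object are MockServer's own fields.

```typescript
// Sketch of a MockServer expectation that only matches requests whose body
// contains a given marker. The marker text and response are hypothetical.
interface SketchExpectation {
  httpRequest: {
    method: string;
    path: string;
    body: { type: 'STRING'; string: string; subString: boolean };
  };
  httpResponse: { statusCode: number; body: string };
  times: { remainingTimes: number; unlimited: boolean };
}

function buildExpectation(systemPromptMarker: string, responseBody: string): SketchExpectation {
  return {
    httpRequest: {
      method: 'POST',
      path: '/v1/messages',
      // subString: true matches when the marker appears anywhere in the body.
      body: { type: 'STRING', string: systemPromptMarker, subString: true },
    },
    httpResponse: { statusCode: 200, body: responseBody },
    // Serve this recorded response once.
    times: { remainingTimes: 1, unlimited: false },
  };
}

const titleExpectation = buildExpectation('Generate a short title', '{"content":[]}');
```

With a distinct marker per call type, a title-generation request cannot consume an expectation recorded for the orchestrator, regardless of arrival order.
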
When re-recording with a real API key, always use record mode (never
load old trace events into the backend). Previously, existing trace
files would cause the backend to enter replay mode during re-recording,
resulting in trace.jsonl files with only a header and no tool calls.

Re-record all trace.jsonl files with proper tool call events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-record all proxy expectations after fixing the recording mode logic.
Expectations now have subString body matchers on the system prompt and
trace.jsonl files have proper tool call events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The proxy's sequential mode sets the last expectation as unlimited (fallback
for extra agent turns). Previously this applied to the last file alphabetically
which could be a community_nodes GET. Now it finds the last /v1/messages
POST expectation specifically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
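
The selection rule described above could be sketched as follows. `markFallbackExpectation`, the `RecordedExpectation` type, and the `/community-nodes` path are hypothetical stand-ins, not the fixture's real code.

```typescript
// Hypothetical sketch: mark the last /v1/messages POST expectation as the
// unlimited fallback, instead of whichever file sorts last.
type RecordedExpectation = {
  httpRequest: { method: string; path: string };
  times?: { remainingTimes?: number; unlimited: boolean };
};

function markFallbackExpectation(expectations: RecordedExpectation[]): number {
  // Walk backwards so the first match is the last LLM call in load order.
  for (let i = expectations.length - 1; i >= 0; i--) {
    const { method, path } = expectations[i].httpRequest;
    if (method === 'POST' && path === '/v1/messages') {
      expectations[i].times = { unlimited: true };
      return i;
    }
  }
  return -1; // no LLM expectation found; nothing is marked
}

const loaded: RecordedExpectation[] = [
  { httpRequest: { method: 'POST', path: '/v1/messages' } },
  { httpRequest: { method: 'POST', path: '/v1/messages' } },
  { httpRequest: { method: 'GET', path: '/community-nodes' } },
];
const fallbackIndex = markFallbackExpectation(loaded);
```
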
…elog)

Background task completion triggers `startInternalFollowUpRun`, which
creates a new trace context. Previously each context got a fresh
TraceIndex with cursor at 0, so the follow-up run's first tool call
(e.g. list-workflows) would mismatch the first trace event
(build-workflow-with-agent) and throw.

Fix: store a shared TraceIndex/IdRemapper per test slug on the service.
All runs within the same slug reuse the same instances, preserving
cursor state across the initial run and any follow-up runs.

This fixes the two confirmation e2e tests that rely on suspend/resume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
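
A small sketch of why sharing the cursor per slug matters. `SketchTraceIndex` and `getSharedTraceIndex` are hypothetical and much simpler than the real `TraceIndex`; the tool names are the ones mentioned in the commit message.

```typescript
// Hypothetical sketch: one cursor per test slug, reused by every run.
class SketchTraceIndex {
  private cursor = 0;
  private readonly events: string[];

  constructor(events: string[]) {
    this.events = events;
  }

  // Return the next recorded tool call, or throw if the live call differs.
  next(toolName: string): string {
    const expected = this.events[this.cursor];
    if (expected !== toolName) {
      throw new Error(`trace mismatch: expected ${expected}, got ${toolName}`);
    }
    this.cursor++;
    return expected;
  }
}

const indexBySlug = new Map<string, SketchTraceIndex>();

function getSharedTraceIndex(slug: string, events: string[]): SketchTraceIndex {
  let index = indexBySlug.get(slug);
  if (!index) {
    index = new SketchTraceIndex(events);
    indexBySlug.set(slug, index);
  }
  return index;
}

const recorded = ['build-workflow-with-agent', 'list-workflows'];
const initialRun = getSharedTraceIndex('confirmation-test', recorded);
initialRun.next('build-workflow-with-agent');
// The follow-up run receives the same instance, so its cursor is already at 1.
const followUpRun = getSharedTraceIndex('confirmation-test', recorded);
const followUpTool = followUpRun.next('list-workflows');
```

Had the follow-up run created a fresh index, `next('list-workflows')` would have been compared against `build-workflow-with-agent` and thrown.
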
…(no-changelog)

waitForAssistantResponse only waited for the first message element to appear
(streaming start), not for the agent to finish. Sidebar operations then raced
against the still-running agent. New waitForResponseComplete waits for the send
button to reappear, which only renders when isStreaming becomes false.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
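
The completion wait might be as small as the sketch below. It is an assumption about the shape of the helper, not the page object's real code: `WaitableLocator` is a structural subset of Playwright's `Locator` so the example type-checks without the dependency, and the default timeout is arbitrary.

```typescript
// Structural subset of Playwright's Locator (only waitFor is needed here).
interface WaitableLocator {
  waitFor(options?: {
    state?: 'attached' | 'detached' | 'visible' | 'hidden';
    timeout?: number;
  }): Promise<void>;
}

// Hypothetical sketch: resolve only once the send button is visible again,
// which per the commit message happens when isStreaming becomes false.
async function waitForResponseComplete(
  sendButton: WaitableLocator,
  timeoutMs = 120000,
): Promise<void> {
  await sendButton.waitFor({ state: 'visible', timeout: timeoutMs });
}
```

In a spec this would be awaited before any sidebar interaction, so the test no longer races the still-streaming agent.
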
…s between tests (no-changelog)

Two preview tests failed because their recorded proxy expectations contained
stale LLM responses from previous tests' background task follow-ups. The
fixture now cancels leftover background tasks before each test via a new
test-only endpoint, preventing future cross-test contamination.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ET (no-changelog)

MockServer proxy connections intermittently reset when 4 parallel workers
load expectations simultaneously. Add withRetry helper with exponential
backoff (3 retries, 500ms base) and re-throw on failure instead of
silently swallowing the error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
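
A retry helper with the parameters named above (3 retries, 500ms base, re-throw on final failure) could look roughly like this. It is a sketch, not the fixture's `withRetry`: the name `withRetrySketch`, the doubling schedule, and the injectable `sleep` parameter are assumptions.

```typescript
// Hypothetical sketch of retry with exponential backoff.
// The sleep function is injectable so the schedule can be checked without waiting.
async function withRetrySketch<T>(
  operation: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of retries: fall through and re-throw
      await sleep(baseDelayMs * 2 ** attempt); // 500ms, 1000ms, 2000ms
    }
  }
  throw lastError;
}
```

Re-throwing the last error keeps a persistent proxy failure visible in the test output instead of surfacing later as a confusing missing-expectation error.
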
…ation (no-changelog)

Positional selectors (.last()) break when parallel tests create threads in
shared containers. Switch to getThreadByTitle() with LLM-generated titles
from recordings. Also handle missing expectations directories gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

⚠️ Ownership acknowledgement required

Please add or check the following item in your PR description before this can be merged:

- [x] I have seen this code, I have run this code, and I take responsibility for this code.

@codecov

codecov bot commented Apr 10, 2026

❌ 5 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 45956 | 5 | 45951 | 2 |

View the full list of 5 ❄️ flaky test(s)
src/features/ai/instanceAi/__tests__/composableIntegration.test.ts > composable integration > build → execute → edit workflow > new execution after edit is tracked independently

Flake rate in main: 100.00% (Passed 0 times, Failed 1 times)

Stack Traces | 0.0013s run time
AssertionError: expected 0 to be greater than 0
 ❯ .../instanceAi/__tests__/composableIntegration.test.ts:735:35
src/features/ai/instanceAi/__tests__/composableIntegration.test.ts > composable integration > multi-step scenarios > execution on inactive workflow — no relay until tab switch + iframe ready

Flake rate in main: 100.00% (Passed 0 times, Failed 1 times)

Stack Traces | 0.00288s run time
AssertionError: expected 2 to be +0 // Object.is equality

- Expected
+ Received

- 0
+ 2

 ❯ .../instanceAi/__tests__/composableIntegration.test.ts:513:35
src/features/ai/instanceAi/__tests__/composableIntegration.test.ts > composable integration > multi-step scenarios > rapid re-execution: relay tracks latest execution only

Flake rate in main: 100.00% (Passed 0 times, Failed 1 times)

Stack Traces | 0.0012s run time
AssertionError: expected 1 to be greater than 1
 ❯ .../instanceAi/__tests__/composableIntegration.test.ts:580:35
src/features/ai/instanceAi/__tests__/composableIntegration.test.ts > composable integration > multi-step scenarios > workflow + data table interleaved — no relay for data table tab

Flake rate in main: 100.00% (Passed 0 times, Failed 1 times)

Stack Traces | 0.00129s run time
AssertionError: expected 2 to be +0 // Object.is equality

- Expected
+ Received

- 0
+ 2

 ❯ .../instanceAi/__tests__/composableIntegration.test.ts:563:35
src/features/ai/instanceAi/__tests__/composableIntegration.test.ts > composable integration > re-execution clears stale state > re-execution relays events to iframe when execution results are already showing

Flake rate in main: 100.00% (Passed 0 times, Failed 1 times)

Stack Traces | 0.00908s run time
AssertionError: expected 0 to be greater than 0
 ❯ .../instanceAi/__tests__/composableIntegration.test.ts:150:35

@codecov

codecov bot commented Apr 10, 2026

Bundle Report

Changes will increase total bundle size by 1.03kB (0.0%) ⬆️. This is within the configured threshold ✅

Detailed changes
| Bundle name | Size | Change |
|---|---|---|
| editor-ui-esm | 45.57MB | 1.03kB (0.0%) ⬆️ |

Affected Assets, Files, and Routes:

view changes for bundle: editor-ui-esm

Assets Changed:

| Asset Name | Size Change | Total Size | Change (%) |
|---|---|---|---|
| assets/worker-*.js | 3.14MB | 3.16MB | 17560.39% ⚠️ |
| assets/worker-*.js | -3.14MB | 17.9kB | -99.43% |
| assets/src-*.js | 604 bytes | 2.43MB | 0.02% |
| assets/InstanceAiView-*.js | 468 bytes | 318.79kB | 0.15% |
| assets/InstanceAiView-*.css | -108 bytes | 154.71kB | -0.07% |
| assets/usePushConnection-*.js | 65 bytes | 31.09kB | 0.21% |

Files in assets/InstanceAiView-*.js:

  • ./src/features/ai/instanceAi/components/InstanceAiWorkflowPreview.vue → Total Size: 394 bytes

  • ./src/features/ai/instanceAi/InstanceAiView.vue → Total Size: 335 bytes

  • ./src/features/ai/instanceAi/components/InstanceAiDataTablePreview.vue → Total Size: 398 bytes

  • ./src/features/ai/instanceAi/components/InstanceAiPreviewTabBar.vue → Total Size: 386 bytes

  • ./src/features/ai/instanceAi/useEventRelay.ts → Total Size: 1.29kB

  • ./src/features/ai/instanceAi/useExecutionPushEvents.ts → Total Size: 2.61kB

Files in assets/usePushConnection-*.js:

  • ./src/app/composables/usePushConnection/handlers/executionFinished.ts → Total Size: 12.83kB

mutdmour and others added 2 commits April 10, 2026 15:41
…ure (no-changelog)

Covers the record/replay architecture, ID remapping problem and solution,
two-tier tool wrapping strategy, trace format, and troubleshooting guide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ant proxy expectations (no-changelog)

- Add unit tests for TraceWriter, parseTraceJsonl, model-factory proxy fetch,
  clearEventLog, and useEventRelay coalesced event handling
- Extract test-only trace replay endpoints into InstanceAiTestController,
  conditionally registered when N8N_INSTANCE_AI_TRACE_REPLAY is set
- Extract trace replay state from InstanceAiService into TraceReplayState class
- Remove 83 irrelevant api-staging community nodes expectation files
- Fix stale test that expected eventLog cleared on executionFinished

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

! PR exceeds size limit (1,332 lines added)

This PR adds 1,332 lines, exceeding the 1,000-line limit (test files excluded).

Large PRs are harder to review and increase the risk of bugs going unnoticed. Please consider:

  • Breaking this into smaller, logically separate PRs
  • Moving unrelated changes to a follow-up PR

If the size is genuinely justified (e.g. generated code, large migrations, test fixtures), a maintainer can override by commenting /size-limit-override and then pushing a new commit or re-running this check.

…tance AI e2e tests (no-changelog)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 10, 2026

Performance Comparison

Comparing current, latest master, and 14-day baseline

Idle baseline with Instance AI module loaded

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
|---|---|---|---|---|---|---|
| instance-ai-heap-used-baseline | 186.20 MB | 186.51 MB | 186.46 MB (< 3 samples) | -0.2% | -0.1% | |
| instance-ai-rss-baseline | 343.26 MB | 394.55 MB | 369.15 MB (< 3 samples) | -13.0% | -7.0% | |

docker-stats

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
|---|---|---|---|---|---|---|
| docker-image-size-runners | 386.00 MB | 386.00 MB | 387.50 MB (σ 3.00) | +0.0% | -0.4% | |
| docker-image-size-n8n | 1269.76 MB | 1269.76 MB | 1269.76 MB (σ 0.00) | +0.0% | +0.0% | |

Memory consumption baseline with starter plan resources

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
|---|---|---|---|---|---|---|
| memory-heap-used-baseline | 113.75 MB | 114.53 MB | 113.09 MB (σ 1.15) | -0.7% | +0.6% | |
| memory-rss-baseline | 333.04 MB | 287.07 MB | 281.78 MB (σ 34.50) | +16.0% | +18.2% | ⚠️ |

How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression
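
As a worked check of the status legend, the sketch below classifies a value by how many standard deviations it sits above the baseline average. The function name, the status labels, and the choice to count only increases as drift are assumptions; the report does not say how its status column is computed.

```typescript
// Hypothetical sketch of the 1σ / 2σ status thresholds from the legend.
type DriftStatus = 'within-1-sigma' | 'between-1-and-2-sigma' | 'over-2-sigma';

function classifyDrift(current: number, baselineAvg: number, sigma: number): DriftStatus {
  // Assumption: only increases over the baseline count as drift.
  const increase = current - baselineAvg;
  if (increase <= sigma) return 'within-1-sigma';
  if (increase <= 2 * sigma) return 'between-1-and-2-sigma';
  return 'over-2-sigma';
}

// memory-rss-baseline: 333.04 MB against 281.78 MB (σ 34.50).
// The increase is 51.26 MB, about 1.49σ, which falls in the 1-2σ band.
const rssStatus = classifyDrift(333.04, 281.78, 34.5);

// memory-heap-used-baseline: 113.75 MB against 113.09 MB (σ 1.15), about 0.57σ.
const heapStatus = classifyDrift(113.75, 113.09, 1.15);
```

Under these assumptions the memory-rss-baseline row lands in the ⚠️ band, consistent with the warning shown in the table.
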

…laims (no-changelog)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@n8n-assistant n8n-assistant bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Apr 10, 2026
@mutdmour
Contributor Author

/size-limit-override

Labels

core: Enhancement outside /nodes-base and /editor-ui
n8n team: Authored by the n8n team

1 participant