
ITK: v0.3/v1.0 SSE streaming wire format incompatibility #476

@zeroasterisk

Description


Problem

The ITK test runner and v1.0 agents use incompatible SSE wire formats. As a result, the streaming interop tests do not actually validate SSE event structure.

Test runner (pydantic SDK, a2a-sdk==0.3.25)

Expects flat events with a kind discriminator and lowercase state enums:

data: {"kind": "status-update", "taskId": "tsk-1", "status": {"state": "working"}}

data: {"kind": "artifact-update", "taskId": "tsk-1", "artifact": {"parts": [{"kind": "text", "text": "hello"}]}}

Python v1.0 agent (protobuf SDK, a2a-sdk==1.0.0a0)

Emits a StreamResponse wrapper with a oneof payload and SCREAMING_SNAKE_CASE enums:

data: {"result": {"statusUpdate": {"taskId": "tsk-1", "status": {"state": "TASK_STATE_WORKING"}}}}

data: {"result": {"artifactUpdate": {"taskId": "tsk-1", "artifact": {"parts": [{"text": "hello"}]}}}}
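To make the mismatch concrete, here is a minimal sketch of where a consumer finds the event type in each format. The helper classify_event and the constant V10_PAYLOAD_KEYS are hypothetical (not part of the ITK or a2a-sdk); the sketch assumes the payload shapes exactly as shown above, and it assumes the v1.0 oneof can also carry task and message variants, which this issue does not show. A v0.3-style parser that simply reads payload["kind"] would raise KeyError on every v1.0 event.

```python
import json

# Hypothetical: fields assumed to be possible inside a v1.0 StreamResponse
# "result" oneof. Only statusUpdate and artifactUpdate appear in this issue.
V10_PAYLOAD_KEYS = {"task", "message", "statusUpdate", "artifactUpdate"}


def classify_event(data: str) -> tuple[str, str]:
    """Return (wire_format, event_type) for one SSE data payload."""
    payload = json.loads(data)
    if not isinstance(payload, dict):
        raise ValueError(f"SSE payload is not a JSON object: {data!r}")
    if "kind" in payload:
        # v0.3 (pydantic SDK): flat object with a "kind" discriminator.
        return "v0.3", payload["kind"]
    result = payload.get("result")
    if isinstance(result, dict):
        present = V10_PAYLOAD_KEYS.intersection(result)
        if len(present) == 1:
            # v1.0 (protobuf SDK): exactly one oneof field is set.
            return "v1.0", present.pop()
    raise ValueError(f"unrecognized SSE event: {data!r}")
```

The two formats put the discriminator at different depths (a "kind" value versus the name of the single key under "result"), which is why a parser written for one cannot dispatch events of the other.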

Differences

| Aspect | pydantic SDK (v0.3) | protobuf SDK (v1.0) |
|---|---|---|
| Event wrapper | flat, kind discriminator | StreamResponse.result oneof |
| State enum | working, completed | TASK_STATE_WORKING, TASK_STATE_COMPLETED |
| Role enum | agent, user | ROLE_AGENT, ROLE_USER |
| Part format | {"kind": "text", "text": "..."} | {"text": "..."} (no kind) |

Impact

  1. No single SSE format satisfies both consumers. An agent emitting the pydantic format works with the test runner but not with v1.0 agents that receive its streaming callbacks in a multi-hop chain. An agent emitting the protobuf format interoperates with v1.0 agents, but the test runner may fail to parse its events.

  2. Streaming tests pass for the wrong reason. The v10-core-streaming tests pass because token verification checks only the final concatenated result, not the individual SSE events, so the ITK never validates that SSE events conform to the v1.0 wire format.

  3. Agents must hide streaming capability. When adding the Elixir agent (PR #475, "feat: add Elixir A2A SDK (actioncard/a2a-elixir) to ITK"), we deliberately did not advertise capabilities.streaming=true, to avoid parse failures during multi-hop callbacks in which a different SDK handles the downstream response.

Suggested fixes

  1. Upgrade the ITK test runner to a2a-sdk>=1.0.0a0 (protobuf SDK) so that it parses v1.0 agents' events natively
  2. Add format detection to the test runner's SSE parser so it handles both pydantic and protobuf event formats
  3. Separate the v0.3 and v1.0 streaming tests so each uses the matching SDK for parsing
  4. Add SSE event structure assertions to the streaming tests; currently only the final result is checked
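For suggested fix 4, per-event assertions could look roughly like the following minimal sketch. check_v10_event and ONEOF_FIELDS are hypothetical names, and ONEOF_FIELDS assumes the v1.0 oneof also has task and message variants. The check covers only what the issue's comparison makes checkable per event: a single-field StreamResponse wrapper, prefixed state enums, and the absence of v0.3-style kind discriminators.

```python
import json

# Hypothetical set of v1.0 StreamResponse oneof fields (camelCase JSON names).
ONEOF_FIELDS = {"task", "message", "statusUpdate", "artifactUpdate"}


def check_v10_event(data: str) -> list[str]:
    """Return the structural problems found in one v1.0 SSE data payload."""
    try:
        payload = json.loads(data)
    except json.JSONDecodeError as exc:
        return [f"not JSON: {exc}"]
    result = payload.get("result") if isinstance(payload, dict) else None
    if not isinstance(result, dict):
        return ["missing 'result' object"]

    problems = []
    present = ONEOF_FIELDS.intersection(result)
    if len(present) != 1:
        problems.append(f"expected exactly one oneof field, got {sorted(present)}")
    if "kind" in payload or "kind" in result:
        problems.append("v0.3-style 'kind' discriminator present")

    state = result.get("statusUpdate", {}).get("status", {}).get("state")
    if state is not None and not state.startswith("TASK_STATE_"):
        problems.append(f"non-v1.0 state enum: {state!r}")

    artifact = result.get("artifactUpdate", {}).get("artifact", {})
    for part in artifact.get("parts", []):
        if "kind" in part:
            problems.append("v0.3-style 'kind' on part")
    return problems
```

In a streaming test, one would assert that check_v10_event returns an empty list for every captured event, in addition to the existing check on the final concatenated result.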

Reproduction

  1. Run the v10-core-streaming test (Python v1.0 + Go v1.0)
  2. Capture the actual SSE events emitted by each agent
  3. Compare them against what testlib.py can parse
  4. Observe that the tests pass even though individual events may not parse correctly
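For steps 2 and 3, a captured raw SSE body can be split into its data payloads with a small parser. The following is a minimal sketch that follows the text/event-stream rules only partially: it handles data fields, comments, and blank-line event boundaries, and ignores event, id, retry, and a leading BOM. The helper events_without_kind is a hypothetical, rough proxy for "events a kind-based (v0.3-style) parser cannot dispatch"; it is not testlib.py's actual parsing logic.

```python
import json
import re


def sse_data_payloads(body: str) -> list[str]:
    """Split a raw text/event-stream body into the data payload of each event."""
    events, data_lines = [], []
    for line in re.split(r"\r\n|\r|\n", body):
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith(":"):
            continue  # comment line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip one optional space after the colon
            if field == "data":
                data_lines.append(value)
    # Per the SSE rules, an event not terminated by a blank line is discarded.
    return events


def events_without_kind(body: str) -> list[str]:
    """Payloads that lack a top-level 'kind' (rough v0.3 parseability proxy)."""
    missing = []
    for data in sse_data_payloads(body):
        obj = json.loads(data)
        if not (isinstance(obj, dict) and "kind" in obj):
            missing.append(data)
    return missing
```

Running events_without_kind over a capture from a v1.0 agent would be expected to return every event, which is the per-event failure that the final-result check hides.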

Context

Discovered while adding the Elixir v1.0 agent to the ITK (PR #475). The Elixir agent implements SSE streaming but cannot advertise it due to this format mismatch.

cc @kdziedzic70
