
GPT-5.1 / GPT-5.4 /responses endpoint silently hangs 10-28% of concurrent requests #2838

@KshitizHelix

Description

Summary

The /v1/responses endpoint silently drops 10–28% of requests under moderate concurrent load (5 simultaneous calls). Affected requests receive no HTTP error, no timeout, and no retry from the SDK — the TCP connection simply hangs indefinitely until the caller kills it. This has been reproducible across multiple hours of testing on April 3, 2026.

This is NOT an SDK issue. We tested raw AsyncOpenAI.responses.create() (zero Agents SDK involvement) side-by-side with Agents SDK Runner.run() — both hang at the same rate.

Environment

  • openai Python SDK: 2.29.0
  • openai-agents: 0.12.5
  • Python 3.12.3
  • Region: US (Oracle Cloud, SJC — confirmed via cf-ray headers)
  • No custom middleware, proxies, or wrappers

Reproducible Test Script

import asyncio, time
from openai import AsyncOpenAI
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel

API_KEY = "your-key-here"
MODEL = "gpt-5.1"  # also reproduces with gpt-5.4-2026-03-05
TIMEOUT = 60

client = AsyncOpenAI(api_key=API_KEY)

async def call(i):
    agent = Agent(
        name=f"T{i}",
        model=OpenAIResponsesModel(model=MODEL, openai_client=client),
        instructions="Be brief.",
    )
    start = time.time()
    try:
        r = await asyncio.wait_for(
            Runner.run(agent, input=f"Reply with only the number {i}."),
            timeout=TIMEOUT,
        )
        e = time.time() - start
        print(f"  {i}: OK {e:.2f}s - {r.final_output[:30]}")
        return ("ok", e)
    except asyncio.TimeoutError:
        print(f"  {i}: HUNG after {time.time()-start:.2f}s")
        return ("hung", TIMEOUT)
    except Exception as ex:
        print(f"  {i}: ERR {time.time()-start:.2f}s - {type(ex).__name__}")
        return ("error", time.time() - start)

async def main():
    print(f"10 rounds x 5 concurrent = 50 calls to {MODEL}\n")
    all_r = []
    for r in range(1, 11):
        print(f"--- Round {r}/10 ---")
        results = await asyncio.gather(*[call(r * 10 + i) for i in range(5)])
        all_r.extend(results)
        await asyncio.sleep(0.3)
    ok = sum(1 for s, _ in all_r if s == "ok")
    hung = sum(1 for s, _ in all_r if s == "hung")
    times = [t for s, t in all_r if s == "ok"]
    print(f"\nRESULTS: {ok} ok / {hung} hung")
    if times:  # guard: a run where every call hung would divide by zero here
        print(f"Latency: avg={sum(times)/len(times):.2f}s max={max(times):.2f}s")

asyncio.run(main())

Results Across 4 Runs

| Run | Model | OK | Hung | Hung % | Avg Latency | Max Latency |
|-----|-------|-----|------|--------|-------------|-------------|
| 1 | gpt-5.1 | 46/50 | 4 | 8% | 2.42s | 39.4s |
| 2 | gpt-5.1 | 45/50 | 5 | 10% | 2.85s | 48.9s |
| 3 | gpt-5.4-2026-03-05 | 42/50 | 8 | 16% | 2.63s | 38.7s |
| 4 | gpt-5.1 | 36/50 | 14 | 28% | 3.74s | 32.7s |

Raw Client vs Agents SDK Comparison

To rule out the Agents SDK, we ran 50 calls each through raw client.responses.create() and Agents SDK Runner.run():

| Method | OK | Hung | Hung % | Avg Latency |
|--------|-----|------|--------|-------------|
| Raw AsyncOpenAI (no SDK) | 40/50 | 10 | 20% | 4.62s |
| Agents SDK Runner.run() | 38/50 | 12 | 24% | 9.94s |

Both paths hang at comparable rates (20% vs 24%), so the issue is in the /v1/responses endpoint itself, not the SDK.
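For reference, the raw-client leg of that comparison can be sketched roughly as below. The `timed` and `raw_batch` helper names are ours, not part of either SDK; the timeout logic mirrors the repro script above.

```python
import asyncio
import time

async def timed(coro, timeout=60):
    """Await one request coroutine under a hard timeout and classify the outcome."""
    start = time.time()
    try:
        result = await asyncio.wait_for(coro, timeout=timeout)
        return ("ok", time.time() - start, result)
    except asyncio.TimeoutError:
        # No HTTP error ever arrives -- only the client-side timeout fires.
        return ("hung", time.time() - start, None)
    except Exception as ex:
        return ("error", time.time() - start, type(ex).__name__)

async def raw_batch(client, model, n=5, timeout=60):
    """Fire n concurrent raw /v1/responses calls -- no Agents SDK involved."""
    return await asyncio.gather(*[
        timed(
            client.responses.create(model=model, input=f"Reply with only {i}"),
            timeout=timeout,
        )
        for i in range(n)
    ])
```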

Behavior of Hung Requests

  • The HTTP connection is accepted by OpenAI (TCP handshake completes)
  • No HTTP response is ever returned — no status code, no error, no body
  • The OpenAI Python SDK does not retry because no error is received
  • The request hangs until the caller's timeout kills it
  • Successful requests from the same batch return in <1s
  • cf-ray headers confirm requests hit Cloudflare/SJC — this is server-side
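Until this is fixed server-side, callers apparently have to treat the hang itself as retryable. A minimal workaround sketch (hypothetical helper; the timeout, attempt count, and backoff values are arbitrary) that re-issues a request whose connection never answers:

```python
import asyncio

async def call_with_hang_retry(make_request, timeout=60, attempts=3):
    """Treat a hang as retryable, since the SDK never sees an error.

    make_request is a zero-arg callable returning a fresh request coroutine,
    because a timed-out coroutine cannot be re-awaited.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise
            # Brief linear backoff before re-issuing the hung request.
            await asyncio.sleep(0.5 * attempt)
```

Usage would look like `await call_with_hang_retry(lambda: client.responses.create(model=MODEL, input="..."))`.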
