GPT-5.1 / GPT-5.4 /responses endpoint silently hangs 10-28% of concurrent requests #2838
Open
Labels
question (Question about using the SDK)
Description
Summary
The /v1/responses endpoint silently drops 10–28% of requests under moderate concurrent load (5 simultaneous calls). Affected requests receive no HTTP error, no timeout, and no retry from the SDK — the TCP connection simply hangs indefinitely until the caller kills it. This has been reproducible across multiple hours of testing on April 3, 2026.
This is NOT an SDK issue. We tested raw AsyncOpenAI responses.create() calls (zero Agents SDK involvement) side by side with Agents SDK Runner.run(); both hang at comparable rates.
Environment
- openai Python SDK: 2.29.0
- openai-agents: 0.12.5
- Python 3.12.3
- Region: US (Oracle Cloud, SJC; confirmed via cf-ray headers)
- No custom middleware, proxies, or wrappers
Reproducible Test Script
```python
import asyncio, time

from openai import AsyncOpenAI
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel

API_KEY = "your-key-here"
MODEL = "gpt-5.1"  # also reproduces with gpt-5.4-2026-03-05
TIMEOUT = 60

client = AsyncOpenAI(api_key=API_KEY)


async def call(i):
    agent = Agent(
        name=f"T{i}",
        model=OpenAIResponsesModel(model=MODEL, openai_client=client),
        instructions="Be brief.",
    )
    start = time.time()
    try:
        r = await asyncio.wait_for(
            Runner.run(agent, input=f"Reply with only the number {i}."),
            timeout=TIMEOUT,
        )
        e = time.time() - start
        print(f"  {i}: OK {e:.2f}s - {r.final_output[:30]}")
        return ("ok", e)
    except asyncio.TimeoutError:
        print(f"  {i}: HUNG after {time.time() - start:.2f}s")
        return ("hung", TIMEOUT)
    except Exception as ex:
        print(f"  {i}: ERR {time.time() - start:.2f}s - {type(ex).__name__}")
        return ("error", time.time() - start)


async def main():
    print(f"10 rounds x 5 concurrent = 50 calls to {MODEL}\n")
    all_r = []
    for r in range(1, 11):
        print(f"--- Round {r}/10 ---")
        results = await asyncio.gather(*[call(r * 10 + i) for i in range(5)])
        all_r.extend(results)
        await asyncio.sleep(0.3)
    ok = sum(1 for s, _ in all_r if s == "ok")
    hung = sum(1 for s, _ in all_r if s == "hung")
    times = [t for s, t in all_r if s == "ok"]
    print(f"\nRESULTS: {ok} ok / {hung} hung")
    print(f"Latency: avg={sum(times)/len(times):.2f}s max={max(times):.2f}s")


asyncio.run(main())
```
Results Across 4 Runs
| Run | Model | OK | Hung | Hung % | Avg Latency | Max Latency |
|---|---|---|---|---|---|---|
| 1 | gpt-5.1 | 46/50 | 4 | 8% | 2.42s | 39.4s |
| 2 | gpt-5.1 | 45/50 | 5 | 10% | 2.85s | 48.9s |
| 3 | gpt-5.4-2026-03-05 | 42/50 | 8 | 16% | 2.63s | 38.7s |
| 4 | gpt-5.1 | 36/50 | 14 | 28% | 3.74s | 32.7s |
Raw Client vs Agents SDK Comparison
To rule out the Agents SDK, we ran 50 calls each through raw client.responses.create() and Agents SDK Runner.run():
| Method | OK | Hung | Hung % | Avg Latency |
|---|---|---|---|---|
| Raw AsyncOpenAI (no SDK) | 40/50 | 10 | 20% | 4.62s |
| Agents SDK Runner.run() | 38/50 | 12 | 24% | 9.94s |
Both hang at comparable rates (20% vs 24%). The issue is in the /v1/responses endpoint, not the SDK.
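For anyone wanting to reproduce the side-by-side comparison, the fairest setup is a single harness where the request coroutine is injected, so the raw client and the Agents SDK path share identical timeout and classification logic. A minimal sketch (the `measure` name and defaults are ours, not from the script above):

```python
import asyncio
import time


async def measure(request_fn, n=50, concurrency=5, timeout=60):
    """Fire n calls in batches of `concurrency` and classify each one.

    `request_fn(i)` is any coroutine, e.g. a wrapper around
    client.responses.create(...) or Runner.run(...).
    """
    async def one(i):
        start = time.monotonic()
        try:
            await asyncio.wait_for(request_fn(i), timeout=timeout)
            return ("ok", time.monotonic() - start)
        except asyncio.TimeoutError:
            return ("hung", timeout)
        except Exception:
            return ("error", time.monotonic() - start)

    results = []
    for batch in range(0, n, concurrency):
        results.extend(
            await asyncio.gather(*(one(batch + j) for j in range(concurrency)))
        )
    return results
```

Passing a stub that sleeps past the timeout exercises the "hung" path without any network traffic, which is useful for validating the harness itself before pointing it at the API.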
Behavior of Hung Requests
- The HTTP connection is accepted by OpenAI (TCP handshake completes)
- No HTTP response is ever returned — no status code, no error, no body
- The OpenAI Python SDK does not retry because no error is received
- The request hangs until the caller's timeout kills it
- Successful requests from the same batch return in <1s
- cf-ray headers confirm requests hit Cloudflare/SJC; this is server-side
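Because no error ever arrives, the SDK's built-in retry never fires, so the only defense until this is fixed server-side is to cancel the hung request yourself and retry. A sketch of such a client-side guard (the wrapper name and defaults are ours, not part of any SDK API):

```python
import asyncio


async def call_with_retry(make_request, timeout=60, max_attempts=3):
    """Cancel a silently hung request after `timeout` seconds and retry.

    `make_request` is a zero-argument coroutine factory, e.g.
    lambda: client.responses.create(model=..., input=...), so that
    each attempt builds a fresh request.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == max_attempts:
                raise  # every attempt hung; surface the timeout to the caller
```

Each retry is a separate request (and presumably billed separately), so keep `max_attempts` small; with successful calls returning in under a second, a timeout far below 60s is likely safe.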