
GPT-5.1 / GPT-5.4 /responses endpoint silently hangs 10-28% of concurrent requests #2838

@KshitizHelix

Description

Summary

The /v1/responses endpoint silently drops 10–28% of requests under moderate concurrent load (5 simultaneous calls). Affected requests receive no HTTP error, no timeout, and no retry from the SDK — the TCP connection simply hangs indefinitely until the caller kills it. This has been reproducible across multiple hours of testing on April 3, 2026.

This is NOT an SDK issue. We tested raw AsyncOpenAI.responses.create() (zero Agents SDK involvement) side-by-side with Agents SDK Runner.run() — both hang at the same rate.

Environment

  • openai Python SDK: 2.29.0
  • openai-agents: 0.12.5
  • Python 3.12.3
  • Region: US (Oracle Cloud, SJC — confirmed via cf-ray headers)
  • No custom middleware, proxies, or wrappers

Reproducible Test Script

import asyncio, time
from openai import AsyncOpenAI
from agents import Agent, Runner
from agents.models.openai_responses import OpenAIResponsesModel

API_KEY = "your-key-here"
MODEL = "gpt-5.1"  # also reproduces with gpt-5.4-2026-03-05
TIMEOUT = 60

client = AsyncOpenAI(api_key=API_KEY)

async def call(i):
    agent = Agent(
        name=f"T{i}",
        model=OpenAIResponsesModel(model=MODEL, openai_client=client),
        instructions="Be brief.",
    )
    start = time.time()
    try:
        r = await asyncio.wait_for(
            Runner.run(agent, input=f"Reply with only the number {i}."),
            timeout=TIMEOUT,
        )
        e = time.time() - start
        print(f"  {i}: OK {e:.2f}s - {r.final_output[:30]}")
        return ("ok", e)
    except asyncio.TimeoutError:
        print(f"  {i}: HUNG after {time.time()-start:.2f}s")
        return ("hung", TIMEOUT)
    except Exception as ex:
        print(f"  {i}: ERR {time.time()-start:.2f}s - {type(ex).__name__}")
        return ("error", time.time() - start)

async def main():
    print(f"10 rounds x 5 concurrent = 50 calls to {MODEL}\n")
    all_r = []
    for r in range(1, 11):
        print(f"--- Round {r}/10 ---")
        results = await asyncio.gather(*[call(r * 10 + i) for i in range(5)])
        all_r.extend(results)
        await asyncio.sleep(0.3)
    ok = sum(1 for s, _ in all_r if s == "ok")
    hung = sum(1 for s, _ in all_r if s == "hung")
    times = [t for s, t in all_r if s == "ok"]
    print(f"\nRESULTS: {ok} ok / {hung} hung")
    if times:  # guard: a run where every call hung would divide by zero here
        print(f"Latency: avg={sum(times)/len(times):.2f}s max={max(times):.2f}s")

asyncio.run(main())

Results Across 4 Runs

| Run | Model | OK | Hung | Hung % | Avg Latency | Max Latency |
|-----|-------|-----|------|--------|-------------|-------------|
| 1 | gpt-5.1 | 46/50 | 4 | 8% | 2.42s | 39.4s |
| 2 | gpt-5.1 | 45/50 | 5 | 10% | 2.85s | 48.9s |
| 3 | gpt-5.4-2026-03-05 | 42/50 | 8 | 16% | 2.63s | 38.7s |
| 4 | gpt-5.1 | 36/50 | 14 | 28% | 3.74s | 32.7s |

Raw Client vs Agents SDK Comparison

To rule out the Agents SDK, we ran 50 calls each through raw client.responses.create() and Agents SDK Runner.run():

| Method | OK | Hung | Hung % | Avg Latency |
|--------|-----|------|--------|-------------|
| Raw AsyncOpenAI (no SDK) | 40/50 | 10 | 20% | 4.62s |
| Agents SDK Runner.run() | 38/50 | 12 | 24% | 9.94s |

Both paths hang at comparable rates (20% vs 24%), so the issue is in the /v1/responses endpoint itself, not the SDK.
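For reference, the raw-client leg of that comparison can be sketched roughly as below. The `timed` and `raw_batch` helper names are ours, not part of either SDK; the timeout logic mirrors the repro script above.

```python
import asyncio
import time

async def timed(coro, timeout=60):
    """Await one request coroutine under a hard timeout and classify the outcome."""
    start = time.time()
    try:
        result = await asyncio.wait_for(coro, timeout=timeout)
        return ("ok", time.time() - start, result)
    except asyncio.TimeoutError:
        # No HTTP error ever arrives -- only the client-side timeout fires.
        return ("hung", time.time() - start, None)
    except Exception as ex:
        return ("error", time.time() - start, type(ex).__name__)

async def raw_batch(client, model, n=5, timeout=60):
    """Fire n concurrent raw /v1/responses calls -- no Agents SDK involved."""
    return await asyncio.gather(*[
        timed(
            client.responses.create(model=model, input=f"Reply with only {i}"),
            timeout=timeout,
        )
        for i in range(n)
    ])
```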

Behavior of Hung Requests

  • The HTTP connection is accepted by OpenAI (TCP handshake completes)
  • No HTTP response is ever returned — no status code, no error, no body
  • The OpenAI Python SDK does not retry because no error is received
  • The request hangs until the caller's timeout kills it
  • Successful requests from the same batch return in <1s
  • cf-ray headers confirm requests hit Cloudflare/SJC — this is server-side
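Until this is fixed server-side, callers apparently have to treat the hang itself as retryable. A minimal workaround sketch (hypothetical helper; the timeout, attempt count, and backoff values are arbitrary) that re-issues a request whose connection never answers:

```python
import asyncio

async def call_with_hang_retry(make_request, timeout=60, attempts=3):
    """Treat a hang as retryable, since the SDK never sees an error.

    make_request is a zero-arg callable returning a fresh request coroutine,
    because a timed-out coroutine cannot be re-awaited.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(make_request(), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == attempts:
                raise
            # Brief linear backoff before re-issuing the hung request.
            await asyncio.sleep(0.5 * attempt)
```

Usage would look like `await call_with_hang_retry(lambda: client.responses.create(model=MODEL, input="..."))`.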
