Fix AttributeError in streaming response cleanup #4236
base: main
Conversation
mattf
left a comment
@r-bit-rry a chain of hasattr like this suggests we've done something wrong in the design. have we or can we just call close?
It really comes down to what we want to support. Since this was never strictly typed, I'm assuming there are other objects that can be generated by the sse_generator. On a more serious note, @mattf, it's our decision whether we want to enforce certain typings and act on them, or let this pattern "catch all".
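For context, the hasattr chain under discussion can be sketched roughly as below. `FakeAsyncStream` and `cleanup` are hypothetical stand-ins, not the actual project code: the point is that the cleanup path probes for whichever method the object happens to expose instead of relying on a declared type.

```python
import asyncio


class FakeAsyncStream:
    """Hypothetical stand-in for openai.AsyncStream: it has close(), not aclose()."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def cleanup(event_gen):
    # The hasattr chain: probe for whichever cleanup method exists
    # instead of enforcing a single type at the boundary.
    if hasattr(event_gen, "aclose"):  # async generators / AsyncIterators
        await event_gen.aclose()
        return "aclose"
    if hasattr(event_gen, "close"):  # AsyncStream-like objects
        await event_gen.close()
        return "close"
    return "none"


stream = FakeAsyncStream()
print(asyncio.run(cleanup(stream)), stream.closed)  # → close True
```

This works for any object, which is exactly the concern: it silently accepts objects that should never have reached this code path.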
mattf
left a comment
as proposed, the hasattr chain will cover up an api contract bug somewhere in the system.
an AsyncStream is making it to a place where only AsyncIterators should be.
i did a little sleuthing and i think there is a bug in at least _maybe_overwrite_id. there are multiple provider impls, so there may be others.
will you find the places where the api contract is being violated and patch them?
also, will you create a regression test that at least tests the openai mixin provider?
@mattf sure thing, I'll start working on those
@mattf As I see it, we're facing two options to avoid the hasattr chain:

option 1: re-yield each chunk explicitly; this is explicit and simple but carries a small overhead per chunk
option 2: adapter pattern; direct delegation with no re-yielding and a more explicit intent

Regarding locations of the violations where we will need patching, these are the places I was able to spot:
- returned AsyncStream (has close()) instead of AsyncIterator (has aclose())
- returned raw client.chat.completions.create() response
- returned raw client.completions.create() response
- returned raw litellm.acompletion() result
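Option 2 can be sketched as below. `StreamAdapter` and `FakeAsyncStream` are hypothetical names for illustration; the adapter satisfies the AsyncIterator cleanup contract (`aclose()`) by delegating directly to the wrapped stream, without re-yielding each chunk.

```python
import asyncio


class FakeAsyncStream:
    """Hypothetical stand-in for openai.AsyncStream (close(), no aclose())."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.closed = False

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for c in self._chunks:
            yield c

    async def close(self):
        self.closed = True


class StreamAdapter:
    """Adapter (option 2): expose aclose() by delegating to the stream's close()."""

    def __init__(self, stream):
        self._stream = stream

    def __aiter__(self):
        # Direct delegation: hand back the underlying iterator, no per-chunk overhead.
        return self._stream.__aiter__()

    async def aclose(self):
        await self._stream.close()


async def main():
    stream = FakeAsyncStream(["a", "b"])
    adapted = StreamAdapter(stream)
    out = [c async for c in adapted]
    await adapted.aclose()
    print(out, stream.closed)  # → ['a', 'b'] True


asyncio.run(main())
```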
@r-bit-rry great finds! it looks like we're violating the api contract and using
@mattf I suggest using the first option, with direct declaration of the types, and changing the API contract for openai_completion, removing the `# type: ignore` comments.
@r-bit-rry sounds good!
@mattf as I'm digging, I'm finding new stuff: apparently we have a discrepancy between the official and internal documentation of openai_completion and the implementation in place. The documentation, both on our end and on OpenAI's official end, does not support streaming to begin with (docs/docs/contributing/new_api_provider.mdx:40-43):
OpenAI documentation:
mattf
left a comment
can we get rid of the type ignores now?
  Generate an OpenAI-compatible completion for the given prompt using the specified model.
- :returns: An OpenAICompletion.
+ :returns: An OpenAICompletion or an async iterator of OpenAICompletion chunks when streaming.
since the async iterator business is an impl detail, what about mentioning SSE events instead?
- I'm not sure I can dismiss the async iterator as an impl detail; since we decided that the contract has changed, it's an expected return type. For documentation/comment purposes I don't mind mentioning SSE events, even though those are not typed.
- Remaining `# type: ignore` comments are only for external library issues:
  - `# type: ignore[arg-type]` - LiteLLM streaming types don't match OpenAI
  - `# type: ignore[return-value]` - External libs lack type stubs
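As an aside on the remaining ignores: mypy's error-code form of `# type: ignore` silences only the named error, so other problems on the same line still surface. A minimal illustration (hypothetical function, not project code):

```python
# "# type: ignore[arg-type]" suppresses only the argument-type error mypy
# would report on this line; a different error code would still be flagged.
def takes_int(x: int) -> int:
    return x


# At runtime this simply passes the value through; only the static check
# is being silenced, the way the LiteLLM mismatches are handled.
result = takes_int("not an int")  # type: ignore[arg-type]
print(result)  # → not an int
```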
@r-bit-rry this text gets published in the OpenAPI spec, which is language / implementation agnostic.
what about passthrough?
This PR fixes issue #3185.

When clients cancel streaming requests, the server tries to clean up with `await event_gen.aclose()`, but OpenAI's `AsyncStream` doesn't have an `aclose()` method; it has `close()` (which is async). `AsyncStream` has never had a public `aclose()` method, and the error message literally tells us as much.

Verification

`reproduce_issue_3185.sh` can be used to verify the fix.
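The direction chosen in the discussion (option 1, re-yielding with explicit types) can be sketched as below. `FakeAsyncStream` and `stream_chunks` are hypothetical names, not the actual patch: because the provider returns a real async generator, the server's `await event_gen.aclose()` cleanup always works, and the generator's `finally` block releases the underlying stream.

```python
import asyncio
from collections.abc import AsyncIterator


class FakeAsyncStream:
    """Hypothetical stand-in for openai.AsyncStream (close(), no aclose())."""

    def __init__(self, chunks):
        self._chunks = chunks
        self.closed = False

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for c in self._chunks:
            yield c

    async def close(self):
        self.closed = True


async def stream_chunks(stream) -> AsyncIterator[str]:
    # Option 1: re-yield each chunk so the provider returns a genuine
    # async generator, which always exposes aclose().
    try:
        async for chunk in stream:
            yield chunk
    finally:
        # Runs on normal exhaustion and on aclose()/GeneratorExit alike,
        # so the underlying stream is released even on client cancellation.
        await stream.close()


async def main():
    stream = FakeAsyncStream(["hello", "world"])
    gen = stream_chunks(stream)
    first = await gen.__anext__()
    await gen.aclose()  # simulate the server's cleanup after a client cancel
    print(first, stream.closed)  # → hello True


asyncio.run(main())
```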