
fix(vertexai): prevent RuntimeError from stale client after startup event loop #6072

Open
goingforstudying-ctrl wants to merge 1 commit into ogx-ai:main from goingforstudying-ctrl:fix/vertexai-client-event-loop-reset

@goingforstudying-ctrl

Ran into #6057 while setting up a VertexAI provider — the server would crash with RuntimeError: Event loop is closed on the first inference request after startup.

The cause: during StackApp.__init__, stack initialization runs in a temporary event loop, and refresh_registry_once() triggers model listing, which calls _get_client() on the VertexAI adapter. The Google genai Client eagerly creates its own internal httpx.AsyncClient, which binds to that temporary loop. Once the temporary loop is closed and uvicorn starts on a fresh loop, the cached client still holds connections tied to the dead loop.
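
The failure mode can be reproduced in isolation. This is only a minimal sketch: `LoopBoundResource`, `get_resource`, `startup`, and `serve_request` are hypothetical stand-ins, not the adapter or the genai client, and the resource captures its loop explicitly to mimic what a loop-bound HTTP client does internally.

```python
import asyncio
from typing import Optional


class LoopBoundResource:
    """Hypothetical stand-in for a client that captures its creation loop."""

    def __init__(self) -> None:
        # Constructed inside a running loop, like a lazily built client.
        self._loop = asyncio.get_running_loop()

    async def request(self) -> str:
        fut = self._loop.create_future()
        # Scheduling work on a closed loop raises
        # RuntimeError("Event loop is closed").
        self._loop.call_soon(fut.set_result, "ok")
        return await fut


_cached: Optional[LoopBoundResource] = None


async def get_resource() -> LoopBoundResource:
    """Lazily create and cache the resource (assumed caching behaviour)."""
    global _cached
    if _cached is None:
        _cached = LoopBoundResource()
    return _cached


def startup() -> None:
    """Initialise on a temporary loop, then close it."""
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(get_resource())
    finally:
        loop.close()


async def serve_request() -> str:
    """First request on the server loop, reusing the cached resource."""
    resource = await get_resource()
    return await resource.request()


def demo() -> str:
    startup()
    try:
        return asyncio.run(serve_request())
    except RuntimeError as exc:
        return str(exc)
```

Calling `demo()` returns the error message rather than `"ok"`, because the cached resource still points at the loop that `startup()` closed.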

Two things in this PR:

  1. Added _reset_client() on VertexAIInferenceAdapter — clears the cached default client and HTTP options. This is called from StackApp.__init__ right after reset_sqlstore_engines(), following the exact same pattern that already exists for SQL engines.

  2. Added a safety check in _get_client() itself — before returning the cached default client, it checks whether the underlying httpx transport has been closed. If it has (which happens when the event loop it was created on is terminated), it logs and recreates the client. This is defense-in-depth in case the reset isn't called.
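
Roughly, the two changes fit together as in the sketch below. It is not the code in this PR: `AdapterSketch`, `FakeClient`, and the attribute names `_default_client` and `_default_http_options` are assumptions for illustration, and the `is_closed` flag on the fake stands in for whatever the real transport exposes.

```python
import logging

logger = logging.getLogger(__name__)


class FakeClient:
    """Hypothetical client exposing an is_closed flag."""

    def __init__(self) -> None:
        self.is_closed = False


class AdapterSketch:
    """Illustrative adapter; names are assumptions, not the real class."""

    def __init__(self, client_factory=FakeClient) -> None:
        self._client_factory = client_factory
        self._default_client = None
        self._default_http_options = None

    def _reset_client(self) -> None:
        # Change 1: drop cached state so the next call rebuilds it
        # on whichever event loop is running at that point.
        self._default_client = None
        self._default_http_options = None

    def _get_client(self):
        client = self._default_client
        # Change 2: treat a closed transport as a cache miss.
        if client is not None and getattr(client, "is_closed", False):
            logger.warning("Cached client is closed; recreating it")
            client = None
        if client is None:
            client = self._client_factory()
            self._default_client = client
        return client
```

Using `getattr` with a default keeps the guard harmless if the attribute is ever missing, at the cost of silently skipping the check in that case.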

Not entirely sure about the is_closed check — it relies on httpx's internal state tracking which seems stable across recent versions but could change. Happy to remove that part if you'd prefer to keep it simpler.

Test Plan

Ran python3.12 -m py_compile on both modified files — they compile cleanly. The existing test suite should cover the normal code paths since these changes only affect the initialization/recreation path. The event loop simulation is tricky to unit test without bringing up a full server, but the pattern mirrors the tested reset_sqlstore_engines() flow exactly.
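
The loop handoff might be testable without a full server. The sketch below is one possible shape for such a test, under stated assumptions: `Adapter` and `LoopBoundClient` are hypothetical stand-ins, and the temporary loop is created with `asyncio.run` inside a `ThreadPoolExecutor` worker to approximate the startup path described in the commit message.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class LoopBoundClient:
    """Hypothetical client that remembers the loop it was created on."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()


class Adapter:
    """Hypothetical adapter with a lazily cached client and a reset hook."""

    def __init__(self) -> None:
        self._client = None

    async def get_client(self) -> LoopBoundClient:
        if self._client is None:
            self._client = LoopBoundClient()
        return self._client

    def reset_client(self) -> None:
        self._client = None


def initialize_in_temporary_loop(adapter: Adapter) -> LoopBoundClient:
    # asyncio.run creates a fresh loop in the worker thread and closes it
    # when the coroutine finishes.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, adapter.get_client()).result()


def simulate_handoff(reset: bool) -> tuple:
    """Return (startup loop closed?, request client bound to serving loop?)."""
    adapter = Adapter()
    startup_client = initialize_in_temporary_loop(adapter)
    if reset:
        adapter.reset_client()

    async def first_request() -> bool:
        client = await adapter.get_client()
        return client.loop is asyncio.get_running_loop()

    return startup_client.loop.is_closed(), asyncio.run(first_request())
```

Without the reset, the first request sees a client bound to the closed startup loop; with it, the client is rebuilt on the serving loop.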

@goingforstudying-ctrl force-pushed the fix/vertexai-client-event-loop-reset branch 8 times, most recently from 87be3ee to 6c519de on June 11, 2026 10:02
…r startup

During StackApp.__init__, stack.initialize() runs inside a temporary event
loop via ThreadPoolExecutor.  Model listing (refresh_registry_once) triggers
lazy VertexAI client creation which binds an internal httpx.AsyncClient to
the temporary loop.  When uvicorn later starts on a new loop, the cached
client causes 'RuntimeError: Event loop is closed' on the first inference
request.

Two fixes that work together:

1. Add _reset_client() to VertexAIInferenceAdapter — clears the cached
   default client and HTTP options after the temporary event loop exits.
   This follows the same pattern as reset_sqlstore_engines() for SQL
   engines.

2. Add defense-in-depth in _get_client() — before returning the cached
   default client, verify it is still usable by checking whether the
   underlying httpx transport has been closed.  If it has, recreate
   the client.

Fixes ogx-ai#6057

Signed-off-by: goingforstudying-ctrl <goingforstudying-ctrl@users.noreply.github.com>
@goingforstudying-ctrl force-pushed the fix/vertexai-client-event-loop-reset branch from 6c519de to 79a086d on June 11, 2026 14:50