fix(vertexai): prevent RuntimeError from stale client after startup event loop #6072
Open
goingforstudying-ctrl wants to merge 1 commit into
…r startup

During StackApp.__init__, stack.initialize() runs inside a temporary event loop via ThreadPoolExecutor. Model listing (refresh_registry_once) triggers lazy VertexAI client creation, which binds an internal httpx.AsyncClient to the temporary loop. When uvicorn later starts on a new loop, the cached client causes 'RuntimeError: Event loop is closed' on the first inference request.

Two fixes that work together:

1. Add _reset_client() to VertexAIInferenceAdapter: clears the cached default client and HTTP options after the temporary event loop exits. This follows the same pattern as reset_sqlstore_engines() for SQL engines.
2. Add defense-in-depth in _get_client(): before returning the cached default client, verify it is still usable by checking whether the underlying httpx transport has been closed. If it has, recreate the client.

Fixes ogx-ai#6057

Signed-off-by: goingforstudying-ctrl <goingforstudying-ctrl@users.noreply.github.com>
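A minimal sketch of how the two fixes could fit together. It uses stand-in classes rather than the real adapter: `FakeClient`, `AdapterSketch`, `_create_client()` and the `_default_client` attribute name are all hypothetical, and the only thing borrowed from httpx is the idea of an `is_closed` flag on the client. Resetting the HTTP options is left out for brevity.

```python
import logging

logger = logging.getLogger(__name__)


class FakeClient:
    """Hypothetical stand-in for a client exposing an httpx-style `is_closed` flag."""

    def __init__(self) -> None:
        self.is_closed = False

    def close(self) -> None:
        self.is_closed = True


class AdapterSketch:
    """Hypothetical stand-in for the adapter; not the real VertexAIInferenceAdapter."""

    def __init__(self) -> None:
        self._default_client = None  # assumed name for the cached default client

    def _create_client(self) -> FakeClient:
        # The real adapter would build the provider client here.
        return FakeClient()

    def _get_client(self) -> FakeClient:
        client = self._default_client
        # Fix 2 (defense-in-depth): drop a cached client that reports itself closed.
        if client is not None and getattr(client, "is_closed", False):
            logger.warning("Cached client is closed; recreating it")
            client = None
        if client is None:
            client = self._create_client()
            self._default_client = client
        return client

    def _reset_client(self) -> None:
        # Fix 1: forget the cached client so the next call builds a fresh one.
        self._default_client = None
```

The `getattr(..., False)` fallback means a client without an `is_closed` attribute is treated as usable, so the check degrades to the old behaviour instead of raising if that attribute ever disappears.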
Ran into #6057 while setting up a VertexAI provider: the server would crash with `RuntimeError: Event loop is closed` on the first inference request after startup.

Turns out the issue is that during `StackApp.__init__`, the stack initialization runs in a temporary event loop, and `refresh_registry_once()` triggers model listing, which calls `_get_client()` on the VertexAI adapter. The Google genai `Client` eagerly creates an internal `httpx.AsyncClient`, binding it to that temporary loop. After the temp loop goes away and uvicorn starts on a fresh loop, the cached client is still holding connections tied to the dead loop.

Two things in this PR:
1. Added `_reset_client()` on `VertexAIInferenceAdapter`, which clears the cached default client and HTTP options. This is called from `StackApp.__init__` right after `reset_sqlstore_engines()`, following the exact same pattern that already exists for SQL engines.
2. Added a safety check in `_get_client()` itself: before returning the cached default client, it checks whether the underlying httpx transport has been closed. If it has (which happens when the event loop it was created on is terminated), it logs and recreates the client. This is defense-in-depth in case the reset isn't called.

Not entirely sure about the `is_closed` check: it relies on httpx's internal state tracking, which seems stable across recent versions but could change. Happy to remove that part if you'd prefer to keep it simpler.

Test Plan
Ran `python3.12 -m py_compile` on both modified files; they compile cleanly. The existing test suite should cover the normal code paths since these changes only affect the initialization/recreation path. The event loop simulation is tricky to unit test without bringing up a full server, but the pattern mirrors the tested `reset_sqlstore_engines()` flow exactly.
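One possible way to exercise the two-loop scenario without a server is sketched below. It is not part of this PR and does not touch the real adapter: `LoopBoundClient`, `StartupAdapterSketch`, `run_in_temporary_loop()` and `simulate()` are hypothetical names, and the stand-in client simply remembers the loop it was created on, which is an assumption about how a loop-bound client misbehaves, not a model of httpx internals.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class LoopBoundClient:
    """Hypothetical client whose internals stay tied to the loop it was created on."""

    def __init__(self) -> None:
        # Must be constructed inside a running loop, like a lazily created client.
        self._loop = asyncio.get_running_loop()

    async def request(self) -> str:
        fut = self._loop.create_future()
        # call_soon raises RuntimeError("Event loop is closed") on a closed loop.
        self._loop.call_soon(fut.set_result, "ok")
        return await fut


class StartupAdapterSketch:
    """Hypothetical adapter with a lazily created, cached client."""

    def __init__(self) -> None:
        self._client = None

    def _get_client(self) -> LoopBoundClient:
        if self._client is None:
            self._client = LoopBoundClient()
        return self._client

    def _reset_client(self) -> None:
        self._client = None

    async def list_models(self) -> str:
        return await self._get_client().request()


def run_in_temporary_loop(adapter: StartupAdapterSketch) -> str:
    # Mirrors startup: run async work on a worker thread with its own loop,
    # which asyncio.run closes before returning.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, adapter.list_models()).result()


def simulate(reset_after_startup: bool) -> str:
    adapter = StartupAdapterSketch()
    run_in_temporary_loop(adapter)  # caches a client bound to the temporary loop
    if reset_after_startup:
        adapter._reset_client()
    try:
        # Stands in for the first inference request on the server's new loop.
        return asyncio.run(adapter.list_models())
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
```

Under these assumptions, `simulate(False)` reproduces the stale-client error and `simulate(True)` returns normally, which is the behaviour a regression test for the reset path would want to pin down.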