Stream LLM logs and stats updates via WebSocket for RemoteConversation #1159
base: main
Conversation
Implements Solution 1 from issue #1158 to make LLM completion logs accessible when using RemoteConversation with DockerWorkspace.

Changes:
- Add LLMCompletionLogEvent to stream log data from server to client
- Add log_callback mechanism to Telemetry class
- Configure EventService to emit LLMCompletionLogEvent when logging is enabled
- Handle LLMCompletionLogEvent in RemoteConversation to write logs client-side
- Add tests for LLMCompletionLogEvent serialization

When log_completions=True in a remote execution context, logs are now sent as events through the WebSocket connection and written to the client filesystem instead of being trapped in the Docker container.

Co-authored-by: openhands <[email protected]>
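A minimal sketch of what this new event might look like, assuming the SDK's pydantic-based event serialization; the base class and the kind discriminator are illustrative, and only the filename/log-data/model-name payload is taken from the commit message:

```python
from pydantic import BaseModel


class LLMCompletionLogEvent(BaseModel):
    """Streams one LLM completion log record from server to client."""

    kind: str = "llm_completion_log"  # illustrative discriminator for deserialization
    filename: str    # name the log file would have had on the server
    log_data: str    # serialized completion log (JSON string)
    model_name: str  # model that produced the completion
```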
This commit extends the streaming solution to fix issue #1087, where stats updates were not visible during RemoteConversation execution.

Changes:
- Added stats_update_callback mechanism to Telemetry class
- Telemetry now triggers the callback after metrics updates in on_response()
- EventService sets up the callback to emit ConversationStateUpdateEvent
- Updated _publish_state_update to support selective field updates
- Added comprehensive tests for callback functionality

This follows the same pattern as LLM log streaming (issue #1158), providing a unified solution for streaming server-side data to RemoteConversation clients.

Co-authored-by: openhands <[email protected]>
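A rough sketch of that callback mechanism (method names follow the commit message; `_update_metrics` is a placeholder for the existing metrics bookkeeping, which is elided here):

```python
from collections.abc import Callable


class Telemetry:
    _stats_update_callback: Callable[[], None] | None = None

    def set_stats_update_callback(self, callback: Callable[[], None]) -> None:
        self._stats_update_callback = callback

    def on_response(self, resp) -> None:
        self._update_metrics(resp)  # placeholder for the existing metrics update
        # New: fire the callback after metrics change so EventService can
        # emit a ConversationStateUpdateEvent to RemoteConversation clients.
        if self._stats_update_callback is not None:
            self._stats_update_callback()
```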
```python
# First, try to match by usage_id if we can determine it
# Since we don't have usage_id in the event, we'll use the first
# matching log folder, or fall back to a default
if self._log_completion_folders:
```
@OpenHands So we don't have usage_id here? This code doesn't look right: why did we store folders per usage_id if we can't use it to identify them?
Look at the whole diff of this PR, and maybe the code in event_service.py under the comment:
# Set callback for all LLMs in the agent that have logging enabled
We can access the usage_id field of each LLM there. Could you analyze and tell me whether we can fix the code so we can send usage_id in the event, or find some other way to recreate the folders correctly on the client machine? Think deeply and only answer; do not make modifications. Understand the question first, and feel free to tell me if it's valid or not, too.
I'm on it! enyst can track my progress at all-hands.dev
Summary of analysis and answer

- Your concern is valid: the current streaming path doesn't include usage_id, so the client can't reliably select the correct per-usage_id folder. RemoteConversation builds a usage_id -> folder map, but since LLMCompletionLogEvent lacks usage_id, it falls back to the "first configured folder," which defeats the design.
- We can fix this cleanly:
  - Add usage_id to LLMCompletionLogEvent.
  - In event_service.py under "Set callback for all LLMs…", when iterating agent.get_all_llms(), capture llm.usage_id in a closure and include it when emitting the event. The telemetry log_callback already provides filename and log_data; usage_id can be supplied from the enclosing scope.
  - On the client, RemoteConversation should use event.usage_id to pick the correct directory from self._log_completion_folders[event.usage_id], with a small fallback if needed.
- Rationale:
  - We do have access to llm.usage_id at the point where the server registers callbacks, so adding it to the event is straightforward and avoids guessing from model_name.
  - This preserves the intended per-usage_id directory structure and works with multiple LLMs or identical model names.

No code changes were made; this is a recommendation for how to adjust the PR to correctly propagate usage_id and restore proper folder routing on the client.
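A sketch of the registration loop this suggests (names other than usage_id and get_all_llms are illustrative; the surrounding EventService class is elided). The factory function binds each LLM's usage_id at registration time, so every callback emits an event tagged with the right identifier:

```python
def _setup_llm_log_streaming(self) -> None:
    # Set callback for all LLMs in the agent that have logging enabled
    for llm in self.agent.get_all_llms():
        if not llm.log_completions:
            continue

        def make_callback(usage_id: str, model_name: str):
            # Bind per-LLM values now; a plain closure over `llm` would see
            # only the last loop value by the time the callback fires.
            def on_log(filename: str, log_data: str) -> None:
                self._emit_event(  # illustrative emit helper
                    LLMCompletionLogEvent(
                        filename=filename,
                        log_data=log_data,
                        model_name=model_name,
                        usage_id=usage_id,  # the field this analysis proposes adding
                    )
                )
            return on_log

        llm.telemetry.set_log_callback(make_callback(llm.usage_id, llm.model))
```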
@xingyaoww I think this fixes the folders on the client:
…1161) Co-authored-by: openhands <[email protected]>
@OpenHands Do a /codereview-roasted on this PR, WDYT? Think deeply and investigate, don't just guess
I'm on it! enyst can track my progress at all-hands.dev
Summary of my /codereview-roasted on PR #1159

Overall assessment: Acceptable direction with a clean, unified callback pattern, but it has a few correctness bugs that currently prevent the feature from actually working. I recommend fixing these before merge.

Critical issues (must fix)
Improvement opportunities (nice to have)
Testing notes
Verdict
@OpenHands pls address comments in #1159 (comment)
I'm on it! xingyaoww can track my progress at all-hands.dev
Address code review feedback on PR #1159:

1. Expose public LLM.telemetry property
   - Add telemetry property to LLM class for public access
   - Remove getattr() calls in EventService, use llm.telemetry directly
   - This fixes the issue where callbacks were never registered

2. Fix model name parsing
   - Pass model_name directly from LLM to event callback
   - Removes brittle filename parsing that breaks hyphenated model names
   - Model names like 'claude-3-5-sonnet-latest' now work correctly

3. Fix logging short-circuit when log_dir is None
   - Change guard in log_llm_call to check log_dir OR _log_callback
   - Allows callback-only streaming without file logging
   - Essential for remote execution scenarios

4. Move inline import to top-level
   - Move 'import os' from callback to module top-level imports
   - Follows code style guidelines

All pre-commit checks and tests pass.

Co-authored-by: openhands <[email protected]>
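For fix 3, the guard change amounts to something like this hedged sketch of Telemetry.log_llm_call (attribute names follow the commit message; the file-writing details are illustrative):

```python
import os
from collections.abc import Callable


class Telemetry:  # trimmed to the parts relevant to the guard fix
    log_dir: str | None = None
    _log_callback: Callable[[str, str], None] | None = None

    def log_llm_call(self, filename: str, log_data: str) -> None:
        # Old guard returned early whenever log_dir was None, which also
        # skipped the callback; now either destination keeps the call alive.
        if self.log_dir is None and self._log_callback is None:
            return

        if self._log_callback is not None:
            # Remote case: stream the log to the client instead of writing
            # it inside the container.
            self._log_callback(filename, log_data)

        if self.log_dir is not None:
            # Local case: write the file as before ('import os' now lives at
            # module top level, per item 4).
            path = os.path.join(self.log_dir, filename)
            with open(path, "w", encoding="utf-8") as f:
                f.write(log_data)
```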
Summary

I've successfully addressed all critical issues from the code review comment on PR #1159. The changes have been committed and pushed to the openhands/stream-llm-completion-logs branch.

Issues Fixed ✅

1. Telemetry Access Broken (Critical)
2. Model Name Parsing Incorrect (Critical)
3. Logging Short-Circuit When log_dir is None (Critical)
4. Inline Import (Code Quality)

Verification ✅

The PR is now ready for the feature to work as intended. The streaming of LLM completion logs and stats updates should now function correctly in remote execution scenarios.
Looks like there are a few issues preventing this PR from being merged!

If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
@OpenHands please fix the failing actions on PR #1159 at branch openhands/stream-llm-completion-logs
I'm on it! xingyaoww can track my progress at all-hands.dev
- Resolved merge conflict in openhands/sdk/llm/utils/telemetry.py
- Combined callback-based streaming functionality with encoding improvements from main
- Kept callback mechanism for remote log streaming
- Applied encoding='utf-8' and ensure_ascii=False from main

Co-authored-by: openhands <[email protected]>
Summary: Successfully Fixed PR #1159

I've successfully completed all the requested tasks for PR #1159 on branch openhands/stream-llm-completion-logs.

✅ Tasks Completed

📊 Current PR Status

Branch: openhands/stream-llm-completion-logs
All GitHub Actions Passing (20/20): ✅
PR Status:

🔧 Conflict Resolution Details

The merge conflict in openhands/sdk/llm/utils/telemetry.py was between the callback-based streaming functionality on this branch and the encoding improvements from main. I successfully combined both improvements by applying the encoding parameters (encoding='utf-8' and ensure_ascii=False) from main while keeping the callback mechanism for remote log streaming.

This ensures proper UTF-8 handling in both local and remote scenarios while maintaining the streaming functionality that addresses issues #1158 and #1087.

PR Link: #1159

The PR is now ready for review! 🎉
```python
# Build map of log directories for all LLMs in the agent
self._log_completion_folders = {}
for llm in agent.get_all_llms():
    if llm.log_completions:
        # Map usage_id to log folder
        self._log_completion_folders[llm.usage_id] = llm.log_completions_folder
```
I don't think we necessarily need this since we can always get this when needed?
@OpenHands /codereview-roasted I'm not super happy with the design of this PR; it feels very hacky. Can you review the requirement once again, review the code carefully, and then help me refactor all this into clean, modular code? I want to make as few edits as possible (compared to the main branch) to get this done.
I'm on it! xingyaoww can track my progress at all-hands.dev
This refactoring improves code clarity while maintaining the same functionality:
1. EventService refactoring:
- Extract _emit_event_from_thread() helper to eliminate duplication
- Simplify _setup_llm_log_streaming() by flattening nested closures
- Inline stats event creation in _setup_stats_streaming()
- Restore _publish_state_update() to its original simple form
2. RemoteConversation simplification:
- Simplify _create_llm_completion_log_callback() by removing unnecessary
fallback logic and redundant type checks
- Remove verbose comments, keep code self-documenting
Key improvements:
- Reduced nesting: Eliminated nested closure factory pattern
- Better separation: Stats updates handled directly in callback, not via
dual-mode _publish_state_update() method
- Less code: Removed ~30 lines while preserving functionality
- Better readability: Clearer flow with helper method
All tests pass with no functional changes.
Co-authored-by: openhands <[email protected]>
Summary: Refactored PR #1159 to Clean, Modular Code ✅

I've successfully reviewed and refactored the PR to address your concerns about the "hacky" design. The changes have been committed and pushed to the openhands/stream-llm-completion-logs branch.

What Was Done

Analyzed the original implementation, identified the key issues making it feel hacky, and applied targeted refactoring following software engineering principles.

Changes Made

1. EventService (openhands-agent-server/openhands/agent_server/event_service.py)
```python
    from callbacks that may run in different threads.
    """
    if self._main_loop and self._main_loop.is_running():
        asyncio.run_coroutine_threadsafe(self._pub_sub(event), self._main_loop)
```
If I understand correctly, dispatching the event via self._pub_sub will not persist the event. In LocalConversation, there is a default callback that handles event persistence. For more details, please refer to the following link:
software-agent-sdk/openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py
Lines 118 to 119 in c196a24

```python
def _default_callback(e):
    self._state.events.append(e)
```
On the UI side, all events are loaded through WebSocket. If these events are not persisted, they will be lost, and we will be unable to display the corresponding statistics.
sorry @hieptl, i should have marked this PR as draft since i haven't had time to iterate on this more 😭 thanks for the feedback!
@OpenHands please look at @hieptl 's feedback and address them
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I successfully addressed @hieptl's feedback on PR #1159. The issue was that LLMCompletionLogEvent and stats update events were being dispatched via self._pub_sub directly, which sent them to WebSocket subscribers but did NOT persist them to the event log. This meant the UI couldn't load these events later.
The Fix
Changed _emit_event_from_thread() in event_service.py to emit events through self._conversation._on_event instead of self._pub_sub:
Before:

```python
asyncio.run_coroutine_threadsafe(self._pub_sub(event), self._main_loop)
```

After:

```python
self._main_loop.run_in_executor(None, self._conversation._on_event, event)
```

Why This Works

In LocalConversation, there's a default callback that persists all events by appending them to self._state.events. The _on_event callback is a composed function that includes:

- The AsyncCallbackWrapper wrapping self._pub_sub (sends events to WebSocket subscribers)
- The default persistence callback (saves events to the event log)

By routing events through _on_event, we ensure they are both streamed to clients AND persisted for later retrieval.
Verification
✅ Pre-commit hooks pass (formatting, linting, type checking)
✅ All telemetry tests pass (39/39)
✅ Event serialization tests pass
✅ Changes committed and pushed to openhands/stream-llm-completion-logs
The fix is minimal, focused, and directly addresses the persistence issue raised in the feedback.
The LLMCompletionLogEvent and stats update events were being dispatched via self._pub_sub directly, which sent them to WebSocket subscribers but did NOT persist them to the event log. This meant they would not be available when the UI loads events through WebSocket.

In LocalConversation, there's a default callback that persists all events by appending them to self._state.events. By emitting events through self._conversation._on_event instead of self._pub_sub, we ensure that:

1. Events go through all callbacks, including the default persistence callback
2. Events are sent to WebSocket subscribers via the AsyncCallbackWrapper
3. Events are persisted to self._state.events for later retrieval

Co-authored-by: openhands <[email protected]>
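Putting the pieces together, the fixed helper looks roughly like this (a sketch reconstructed from the before/after lines above; the surrounding EventService class is elided):

```python
def _emit_event_from_thread(self, event) -> None:
    """Hand an event produced on a telemetry thread back to the service.

    Routing through self._conversation._on_event (rather than
    self._pub_sub) runs the full composed callback chain: the default
    callback persists the event to state.events, and the
    AsyncCallbackWrapper forwards it to WebSocket subscribers.
    """
    if self._main_loop and self._main_loop.is_running():
        self._main_loop.run_in_executor(None, self._conversation._on_event, event)
```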
```python
            return

        # Get the log directory for this LLM's usage_id
        log_dir = self._log_completion_folders.get(event.usage_id)
```
@OpenHands i don't think we need _log_completion_folders?
Instead, all the remote conversations should do is "send back messages to the client for every LLM message/callback." The server should not be aware of the "log completion folder".
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary of Changes
I've successfully addressed the comment from @xingyaoww by removing the _log_completion_folders dictionary from RemoteConversation. This change improves the separation of concerns between client and server.
Changes Made
File Modified: openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py
What was changed:

- ✅ Removed _log_completion_folders field - The client no longer pre-caches log folder paths in a separate dictionary
- ✅ Removed initialization code - No longer iterates through LLMs to build the folder map during __init__
- ✅ Updated callback registration logic - Now directly checks if any LLM has log_completions enabled
- ✅ Refactored _create_llm_completion_log_callback() - The callback now:
  - Looks up the LLM by usage_id directly from self.agent.get_all_llms()
  - Accesses log_completions_folder from the LLM at write time
  - Uses the source of truth (agent's LLM config) instead of cached state
Why This Is Better

As @xingyaoww correctly pointed out:

- Server is unaware of log folders - The server only needs to stream back LLMCompletionLogEvent with the raw log data
- Client handles its own filesystem concerns - The log_completions_folder is purely a client-side configuration
- No duplicate state - We don't maintain a separate mapping when the information is already in the agent's LLM config
- Cleaner architecture - The callback accesses configuration directly when needed rather than relying on pre-extracted state
Testing
✅ All 241 conversation tests pass
✅ Pre-commit hooks pass (formatting, linting, type checking)
✅ Changes pushed to branch openhands/stream-llm-completion-logs
The changes are minimal and focused on the specific issue raised in the comment. The PR remains functional with the same behavior, just with cleaner separation of concerns.
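A sketch of what the reworked client-side callback might look like after this change (the enclosing RemoteConversation class is elided, and helper names are illustrative; the lookup-by-usage_id and write-at-event-time behavior are what the summary above describes):

```python
import os


def _create_llm_completion_log_callback(self):
    def on_event(event) -> None:
        if not isinstance(event, LLMCompletionLogEvent):
            return
        # Resolve the folder from the agent's LLM config at write time;
        # no pre-cached _log_completion_folders mapping is needed.
        for llm in self.agent.get_all_llms():
            if llm.usage_id == event.usage_id and llm.log_completions:
                folder = str(llm.log_completions_folder)
                os.makedirs(folder, exist_ok=True)
                path = os.path.join(folder, event.filename)
                with open(path, "w", encoding="utf-8") as f:
                    f.write(event.log_data)
                return

    return on_event
```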
The server should not be aware of log_completions_folder - it only needs to stream LLM completion log events back to the client. The client can then decide where to write these logs based on its own LLM configuration.

This change:
- Removes the _log_completion_folders dict that pre-cached folder paths
- Updates the callback to look up the LLM directly from agent config
- Accesses log_completions_folder from the LLM at write time
- Makes the log folder path purely a client-side concern

Co-authored-by: openhands <[email protected]>
Summary
This PR implements a unified streaming solution that addresses both issue #1158 (LLM completion logs) and issue #1087 (stats updates) by using a callback-based event streaming pattern.
Fixes #1158
Fixes #1087
Problems
Issue #1158: LLM Completion Logs Trapped in Docker Container
When setting log_completions=True for an LLM instance in a RemoteConversation (e.g., when using DockerWorkspace), the completion logs were written inside the Docker container and were not accessible from the client side. This made it difficult to debug and analyze LLM behavior in remote execution scenarios.

Issue #1087: Stats Updates Not Visible During Execution

When running a RemoteConversation, stats updates (cost, token usage) show $0.00 until conversation.run() completes. The root cause is that stats are updated by mutating the shared Metrics object in-place, which doesn't trigger ConversationState.__setattr__ and therefore doesn't emit state update events.
This PR implements a unified callback-based streaming approach that sends both LLM completion logs and stats updates from the agent server to the client in real-time via WebSocket events:
For LLM Completion Logs (#1158)

- LLMCompletionLogEvent: A new event type that carries log data (filename, log content, model name)
- Telemetry: The Telemetry class now supports an optional callback that can be used instead of writing to files
- EventService in the agent server configures all LLMs with a callback that emits LLMCompletionLogEvent through the WebSocket connection
- RemoteConversation registers a callback that receives LLMCompletionLogEvent and writes the logs to the client filesystem at the configured log_completions_folder

For Stats Updates (#1087)

- Telemetry: The Telemetry class now triggers a callback after updating metrics in on_response()
- EventService configures all LLMs with a callback that emits ConversationStateUpdateEvent with the updated stats field
- RemoteState already handles incremental field updates via ConversationStateUpdateEvent, so stats updates are automatically reflected in the client state

Changes
New Files
- openhands-sdk/openhands/sdk/event/llm_completion_log.py: New event type for streaming logs
- tests/sdk/event/test_llm_completion_log_event.py: Tests for event serialization

Modified Files

- openhands-sdk/openhands/sdk/llm/utils/telemetry.py: Added set_log_callback() for LLM log streaming and set_stats_update_callback() for stats streaming; triggers the callback in on_response() after the metrics update
- openhands-agent-server/openhands/agent_server/event_service.py: Added _setup_llm_log_streaming() and _setup_stats_streaming(); updated _publish_state_update() to support selective field updates
- openhands-sdk/openhands/sdk/conversation/impl/remote_conversation.py: Handle incoming log events on client
- tests/sdk/llm/test_llm_telemetry.py: Added comprehensive tests for callback functionality

Testing

- LLMCompletionLogEvent serialization and deserialization
- set_log_callback() and set_stats_update_callback() methods

Benefits
Example Usage
LLM Completion Logs
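A hedged usage sketch (import paths, the model name, and constructor signatures are illustrative assumptions, not verified against the SDK):

```python
from openhands.sdk import LLM, Agent, Conversation   # illustrative imports
from openhands.workspace import DockerWorkspace      # illustrative import

llm = LLM(
    model="anthropic/claude-sonnet-4",     # illustrative model name
    log_completions=True,
    log_completions_folder="./llm_logs",   # a folder on the *client* machine
)
agent = Agent(llm=llm)

with DockerWorkspace() as workspace:
    conversation = Conversation(agent=agent, workspace=workspace)
    conversation.send_message("Summarize the repository README.")
    conversation.run()

# With this PR, completion logs land in ./llm_logs on the client,
# instead of being trapped inside the Docker container.
```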
Stats Updates
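A hedged sketch for stats, reusing agent and workspace from the sketch above: with this PR, ConversationStateUpdateEvent arrives during run(), so a client callback can observe non-zero cost before completion (the callbacks parameter, import path, and stats attribute names are illustrative):

```python
from openhands.sdk import Conversation                        # illustrative import
from openhands.sdk.event import ConversationStateUpdateEvent  # illustrative import path


def on_event(event) -> None:
    if isinstance(event, ConversationStateUpdateEvent):
        # Stats now refresh mid-run rather than showing $0.00 until the end.
        stats = conversation.state.stats
        print(f"accumulated cost so far: ${stats.accumulated_cost:.4f}")


conversation = Conversation(
    agent=agent, workspace=workspace, callbacks=[on_event]  # illustrative parameter
)
conversation.run()
```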
Architecture
Backward Compatibility
- Logs are only streamed when log_completions=True is set
This PR follows the principle of eliminating special cases by using a unified callback pattern:
Future Enhancements
Potential follow-up improvements (not in this PR):
Note: This implementation provides a clean, systematic solution that addresses both issues with minimal code by following the same architectural pattern. It integrates naturally with the existing event-driven architecture and is easily extensible for future needs.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
```sh
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:5037a24-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The versioned tag (e.g. 5037a24-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 5037a24-python-amd64) are also available if needed