roomote[bot]
@roomote roomote bot commented Jul 18, 2025

This PR implements a multi-layer fix for issue #5892, the gray state problem: Roo gets stuck when an Orchestrator task creates a subtask that completes successfully but the provider disconnects during the process.

Problem

When an Orchestrator creates a subtask using new_task and the subtask completes successfully, but a provider disconnection occurs during finishSubTask() or resumePausedTask(), the parent task becomes stuck in a "gray state" where:

  • isStreaming = false (task appears inactive)
  • enableButtons = false (user cannot interact)
  • The task engine is not running, yet the buttons remain disabled

This leaves users unable to cancel, finish, or interact with the task.

Solution

Implemented multi-layer error recovery mechanisms:

1. Task-Level Recovery (src/core/task/Task.ts)

  • Enhanced resumePausedTask() with graceful error handling instead of throwing exceptions
  • Added recovery mechanisms in recursivelyMakeClineRequests() for API failures
  • Reset task state and re-enabled user interaction when recovery is needed
  • Added recovery messages to inform users of the situation

2. Provider-Level Recovery (src/core/webview/ClineProvider.ts)

  • Added recoverFromGrayState() method with multiple recovery strategies
  • Enhanced finishSubTask() with comprehensive error handling
  • Implemented fallback mechanisms including task clearing as last resort
  • Forced a task state reset and UI refresh when needed

3. UI-Level Validation (webview-ui/src/components/chat/ChatView.tsx)

  • Added gray state detection logic to identify problematic conditions
  • Implemented automatic recovery with 2-second delay to avoid false positives
  • Cleared task state when a detected gray state persists
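
The delayed detection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in ChatView.tsx: the names `UiFlags`, `Schedule`, and `scheduleGrayStateCheck` are hypothetical, and the timer is injected only so the logic can be exercised without waiting two real seconds.

```typescript
// Hypothetical types for this sketch; they do not come from the PR.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

interface UiFlags {
  isStreaming: boolean;
  enableButtons: boolean;
}

// Default scheduler backed by setTimeout.
const realSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

// Gray state as described in the PR: nothing is streaming, yet buttons are disabled.
function looksGray(flags: UiFlags): boolean {
  return !flags.isStreaming && !flags.enableButtons;
}

// If the flags look gray now, check again after `delayMs` and only then
// report the problem. The second check filters out transient states.
function scheduleGrayStateCheck(
  readFlags: () => UiFlags,
  onPersistentGrayState: () => void,
  delayMs: number = 2000,
  schedule: Schedule = realSchedule,
): Cancel {
  if (!looksGray(readFlags())) {
    return () => {}; // nothing to watch
  }
  return schedule(() => {
    if (looksGray(readFlags())) {
      onPersistentGrayState();
    }
  }, delayMs);
}
```

In a React component the returned `Cancel` would typically be called from an effect cleanup, so that a state change or unmount discards the pending check.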

4. Comprehensive Testing (src/tests/grayStateRecovery.test.ts)

  • Test suite covering all recovery scenarios
  • Validation of gray state detection logic
  • Verification of multiple recovery strategies

Recovery Strategies

  1. Force task resume: Reset task state flags and enable buttons
  2. Add recovery messages: Inform users about the recovery process
  3. Refresh UI state: Update webview with corrected state
  4. Clear task (last resort): Remove stuck task entirely if other methods fail
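
One way to picture the ordering of these strategies is a simple fallback chain. The sketch below is illustrative only: `RecoveryStrategy` and `recoverWithFallbacks` are hypothetical names, the strategies are assumed to be synchronous for brevity, and nothing here is taken from the actual recoverFromGrayState() implementation.

```typescript
// Hypothetical shape of a single recovery strategy.
interface RecoveryStrategy {
  name: string;
  // Returns true when the strategy restored a usable state.
  attempt: () => boolean;
}

interface RecoveryOutcome {
  recoveredBy: string | undefined; // name of the strategy that worked, if any
  attempted: string[];             // strategies tried, in order
}

// Try each strategy in order and stop at the first success.
// A strategy that throws is treated as a failure so the next one still runs.
function recoverWithFallbacks(strategies: RecoveryStrategy[]): RecoveryOutcome {
  const attempted: string[] = [];
  for (const strategy of strategies) {
    attempted.push(strategy.name);
    try {
      if (strategy.attempt()) {
        return { recoveredBy: strategy.name, attempted };
      }
    } catch {
      // Swallow the error and fall through to the next strategy.
    }
  }
  return { recoveredBy: undefined, attempted };
}
```

Placing the destructive option (clearing the task) last in the array means it only runs once every gentler strategy has returned false or thrown.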

Testing

  • Added comprehensive test suite with mocked scenarios
  • Tests cover provider disconnections, API failures, and gray state detection
  • Validates that recovery mechanisms work correctly

Fixes #5892


Important

Fixes gray state issue by implementing multi-layer recovery mechanisms in Task.ts, ClineProvider.ts, and ChatView.tsx, with comprehensive testing in grayStateRecovery.test.ts.

  • Behavior:
    • Implements recovery for gray state in Task.ts and ClineProvider.ts by enhancing error handling in resumePausedTask() and finishSubTask().
    • Adds recoverFromGrayState() in ClineProvider.ts to handle task state recovery.
    • Detects gray state in ChatView.tsx and attempts automatic recovery with a 2-second delay.
  • Testing:
    • Adds grayStateRecovery.test.ts to test recovery scenarios, including provider disconnections and API failures.
    • Validates gray state detection and recovery strategies.
  • Misc:
    • Updates UI logic in ChatView.tsx to handle gray state detection and recovery.

This description was created by Ellipsis for 37cac2e.

…on failures

- Add error recovery in Task.resumePausedTask() to handle provider disconnections gracefully
- Enhance Task.recursivelyMakeClineRequests() with recovery mechanisms for API failures
- Add ClineProvider.recoverFromGrayState() method with multiple recovery strategies
- Enhance ClineProvider.finishSubTask() with comprehensive error handling
- Add UI state validation in ChatView.tsx to detect and recover from gray state
- Include comprehensive test suite for gray state recovery mechanisms

Fixes #5892: Gray state issue where Orchestrator subtask completion with provider disconnection leaves parent task unusable
@roomote roomote bot requested review from mrubens, cte and jr as code owners July 18, 2025 15:21
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug Something isn't working labels Jul 18, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jul 18, 2025
@daniel-lxs daniel-lxs closed this Jul 18, 2025
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Jul 18, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 18, 2025
@daniel-lxs daniel-lxs reopened this Jul 18, 2025
@github-project-automation github-project-automation bot moved this from Done to New in Roo Code Roadmap Jul 18, 2025
@github-project-automation github-project-automation bot moved this from Done to Triage in Roo Code Roadmap Jul 18, 2025
@daniel-lxs
Collaborator

@roomote-agent use the switch_task tool with the pr-fixer mode to fix the unit tests of this PR. Do not create a new PR.

Author

roomote bot commented Jul 18, 2025

👋 I've received your request to fix the unit tests for this PR. I can see there are TypeScript compilation errors in the test file that need to be resolved. I'll check out the branch and fix these issues now.

@daniel-lxs daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Jul 18, 2025
@hannesrudolph hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Jul 18, 2025
- Remove invalid clineStack property from ClineProvider mock (private property)
- Remove invalid webview property from ClineProvider mock (non-existent property)
- Tests now compile and pass successfully (8/8 tests passing)
- Fixes compilation errors that were blocking CI checks
Author

roomote bot commented Jul 18, 2025

Unit test compilation errors fixed!

I've successfully resolved the TypeScript compilation errors in the gray state recovery tests. Here's what was accomplished:

🔧 Issues Fixed:

  • Removed invalid property from ClineProvider mock (private property that shouldn't be accessed in tests)
  • Removed invalid property from ClineProvider mock (non-existent property)
  • Fixed all TypeScript compilation errors in

Results:

  • All 8 unit tests now pass (8/8 passing)
  • TypeScript compilation successful - the check is now ✅ passing
  • All other checks passing: CodeQL, translations, knip, platform unit tests (Ubuntu & Windows)
  • Tests run locally: Confirmed working with
    RUN v3.2.4 /roo/repos/Roo-Code

    ✓ src/tests/grayStateRecovery.test.ts (8 tests) 25ms

    Test Files 1 passed (1)
    Tests 8 passed (8)
    Start at 20:48:24
    Duration 400ms (transform 85ms, setup 0ms, collect 63ms, tests 25ms, environment 0ms, prepare 95ms)

📋 Changes Made:

The fix involved updating the test mocks to only use public methods and properties that actually exist on the class, ensuring the tests properly simulate the gray state recovery scenarios without accessing private implementation details.
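
To illustrate why a mock that touches private or non-existent members fails to compile, here is a small sketch. The `Provider` class and its members are hypothetical stand-ins, not the real ClineProvider. It relies on the fact that `keyof` on a class type lists only public members.

```typescript
// Hypothetical class standing in for a provider with private internals.
class Provider {
  private stack: string[] = [];

  finishSubTask(result: string): void {
    this.stack.push(result);
  }

  getLastResult(): string | undefined {
    return this.stack[this.stack.length - 1];
  }
}

// `keyof Provider` excludes private members, so this type is the public surface only.
type PublicSurface<T> = Pick<T, keyof T>;

// Hand-written mock typed against the public surface.
// Adding `stack: []` or `webview: {}` to this object literal would be a compile
// error, because neither key exists on PublicSurface<Provider>.
const recorded: string[] = [];
const providerMock: PublicSurface<Provider> = {
  finishSubTask: (result: string) => {
    recorded.push(result);
  },
  getLastResult: () => recorded[recorded.length - 1],
};
```

Code under test that accepts `PublicSurface<Provider>` instead of the concrete class can then be given `providerMock` directly, without casts.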

The gray state recovery functionality itself remains intact and properly tested - this was purely a test infrastructure fix to resolve compilation issues that were blocking CI.

Author

roomote bot commented Jul 18, 2025

Unit test compilation errors fixed!

I have successfully resolved the TypeScript compilation errors in the gray state recovery tests.

Issues Fixed:

  • Removed invalid clineStack property from ClineProvider mock (private property)
  • Removed invalid webview property from ClineProvider mock (non-existent property)
  • Fixed all TypeScript compilation errors in src/tests/grayStateRecovery.test.ts

Results:

  • All 8 unit tests now pass (8/8 passing)
  • TypeScript compilation successful - the compile check is now passing
  • All other checks passing: CodeQL, translations, knip, platform unit tests
  • Tests run locally and confirmed working

The fix involved updating the test mocks to only use public methods and properties that actually exist on the ClineProvider class. The gray state recovery functionality itself remains intact and properly tested - this was purely a test infrastructure fix to resolve compilation issues that were blocking CI.

@daniel-lxs daniel-lxs moved this from PR [Draft / In Progress] to PR [Needs Prelim Review] in Roo Code Roadmap Jul 19, 2025
@daniel-lxs
Collaborator

I'm closing this due to a fundamental architectural issue with the approach.

This PR implements recovery mechanisms for a "gray state" that shouldn't exist. The root issue is in the state management logic - when a provider disconnection occurs during finishSubTask() or resumePausedTask(), the UI state becomes inconsistent.

Instead of adding recovery logic across multiple components, the solution should:

  1. Fix error handling in finishSubTask() and resumePausedTask() to always leave the UI in a valid state
  2. Ensure provider disconnections immediately trigger proper UI state updates
  3. Make state transitions atomic where possible
  4. Add state validation to prevent invalid states from being rendered
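
As a rough illustration of the first point, an operation can be wrapped so that the UI flags are restored on every exit path. This is only a sketch under assumptions: `UiFlags` and `runWithUiGuard` are hypothetical names, and the real finishSubTask() and resumePausedTask() are not shown in this thread.

```typescript
// Hypothetical UI flags, named after the ones in the PR description.
interface UiFlags {
  isStreaming: boolean;
  enableButtons: boolean;
}

// Run an async operation while marking the UI as busy, and restore an
// interactive state afterwards whether the operation succeeds or throws.
// Returns the captured error (if any) so the caller can surface it.
async function runWithUiGuard(
  ui: UiFlags,
  operation: () => Promise<void>,
): Promise<Error | undefined> {
  ui.isStreaming = true;
  ui.enableButtons = false;
  try {
    await operation();
    return undefined;
  } catch (err) {
    return err instanceof Error ? err : new Error(String(err));
  } finally {
    // Runs on both paths, so the gray combination
    // (isStreaming = false, enableButtons = false) is never left behind.
    ui.isStreaming = false;
    ui.enableButtons = true;
  }
}
```

With a guard like this, a provider disconnection inside `operation` surfaces as a returned error while the buttons come back, which removes the need for a later recovery pass.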

The current approach adds complexity and race conditions (like the 2-second timeout) without addressing the underlying issue. A new PR should focus on preventing the gray state rather than recovering from it.
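
Preventing the state, rather than recovering from it, could look something like the sketch below. It is an assumption-laden illustration, not a proposal from this thread: `TaskView`, `TaskEvent`, `transition`, and `buttonsEnabled` are hypothetical names. The idea is to keep a single tagged state and derive the button state from it, so the gray combination cannot be represented at all.

```typescript
// A single tagged state instead of two independent booleans.
type TaskView =
  | { kind: "streaming" }
  | { kind: "awaitingUser"; notice?: string };

// Hypothetical events that drive the UI.
type TaskEvent =
  | { type: "requestStarted" }
  | { type: "requestFinished" }
  | { type: "providerDisconnected"; reason: string };

// Buttons are derived from the state, never stored separately.
function buttonsEnabled(view: TaskView): boolean {
  return view.kind === "awaitingUser";
}

// Each event produces the next state in one step, so there is no
// intermediate moment where the flags disagree with each other.
function transition(view: TaskView, event: TaskEvent): TaskView {
  switch (event.type) {
    case "requestStarted":
      return { kind: "streaming" };
    case "requestFinished":
      return view.kind === "streaming" ? { kind: "awaitingUser" } : view;
    case "providerDisconnected":
      return { kind: "awaitingUser", notice: `Provider disconnected: ${event.reason}` };
  }
}
```

Because every value of `TaskView` is either streaming or awaiting the user, a disconnection always lands in a state with enabled buttons, and no timer is needed to detect an inconsistent one.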

@daniel-lxs daniel-lxs closed this Jul 21, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Jul 21, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Jul 21, 2025
Labels: bug, PR - Needs Preliminary Review, size:L
Project status: Done
Linked issue: Roo Cancel / Start Task button gets stuck in a "gray state"
Participants: 3