
fix: async jobs stuck in processing on marshal failure #1907

Merged
akshaydeo merged 1 commit into main from 03-04-fix_async_jobs_stuck_in_processing_on_marshal_failure_now_correctly_transition_to_failed_
Mar 5, 2026

Conversation

@Pratham-Mishra04
Collaborator

Summary

Fixes async jobs that become stuck in the "processing" state when JSON marshaling fails during job execution. Previously, a marshal failure caused the job to exit without updating its status, leaving it permanently in "processing".

Changes

  • Added a panic recovery mechanism so that async jobs always reach a terminal state, even if an unexpected panic occurs
  • Introduced a markFailed helper function that consistently updates the job status to "failed" with appropriate error details
  • Added explicit error handling for JSON marshal failures, which now transition jobs to "failed" instead of leaving them stuck
  • Jobs that fail during error serialization or result serialization are now marked as failed with descriptive error messages
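
The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in framework/logstore/asyncjob.go: the Job and Store types, the in-memory map, and the Execute and markCompleted names are all hypothetical; only the markFailed name and the "processing"/"failed" states come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

// Job is a hypothetical, simplified job record (assumption for illustration).
type Job struct {
	Status      string
	Result      string
	Error       string
	CompletedAt time.Time
}

// Store is a hypothetical in-memory stand-in for the real job store.
type Store struct {
	mu   sync.Mutex
	jobs map[string]*Job
}

func NewStore() *Store { return &Store{jobs: map[string]*Job{}} }

// Create registers a job in the non-terminal "processing" state.
func (s *Store) Create(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.jobs[id] = &Job{Status: "processing"}
}

// Get returns a copy of the job record.
func (s *Store) Get(id string) Job {
	s.mu.Lock()
	defer s.mu.Unlock()
	return *s.jobs[id]
}

// finish moves a job to a terminal state and records the completion time.
func (s *Store) finish(id, status, result, errMsg string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if j, ok := s.jobs[id]; ok {
		j.Status, j.Result, j.Error = status, result, errMsg
		j.CompletedAt = time.Now()
	}
}

func (s *Store) markFailed(id, msg string)       { s.finish(id, "failed", "", msg) }
func (s *Store) markCompleted(id, result string) { s.finish(id, "completed", result, "") }

// Execute runs op and leaves the job in a terminal state on every path.
func (s *Store) Execute(id string, op func() (any, error)) {
	// Panic recovery: a panic in op (or below) still marks the job failed.
	defer func() {
		if r := recover(); r != nil {
			s.markFailed(id, fmt.Sprintf("panic during job execution: %v", r))
		}
	}()

	result, err := op()
	if err != nil {
		s.markFailed(id, err.Error())
		return
	}
	payload, err := json.Marshal(result)
	if err != nil {
		// The bug class this PR targets: returning here without a status update.
		s.markFailed(id, fmt.Sprintf("failed to marshal result: %v", err))
		return
	}
	s.markCompleted(id, string(payload))
}

func main() {
	s := NewStore()
	s.Create("job-1")
	// Channels cannot be JSON-encoded, so marshaling this result fails.
	s.Execute("job-1", func() (any, error) { return make(chan int), nil })
	j := s.Get("job-1")
	fmt.Println(j.Status, "-", j.Error)
}
```

The deferred recover sits at the top of Execute so that it also covers panics raised while marshaling, not only panics inside the operation itself.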

Type of change

  • Bug fix
  • Feature
  • Refactor
  • Documentation
  • Chore/CI

Affected areas

  • Core (Go)
  • Transports (HTTP)
  • Providers/Integrations
  • Plugins
  • UI (Next.js)
  • Docs

How to test

Test async job execution with marshal failures to verify jobs transition to failed state:

# Core/Transports
go version
go test ./...

# Test specific async job scenarios
go test ./framework/logstore -v -run TestAsyncJob
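
To exercise the marshal-failure path in a test, the job needs to return a value that encoding/json rejects. The sketch below shows two such values from the standard library's documented behavior (unsupported types such as channels, and unsupported float values such as NaN). The classifyMarshalError helper is hypothetical and is not part of this PR or of the TestAsyncJob tests.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"math"
)

// classifyMarshalError reports why json.Marshal rejected v,
// or "" if marshaling succeeded. Hypothetical helper for illustration.
func classifyMarshalError(v any) string {
	_, err := json.Marshal(v)
	var typeErr *json.UnsupportedTypeError
	var valueErr *json.UnsupportedValueError
	switch {
	case err == nil:
		return ""
	case errors.As(err, &typeErr):
		return "unsupported type"
	case errors.As(err, &valueErr):
		return "unsupported value"
	default:
		return "other"
	}
}

func main() {
	// A result containing a channel cannot be encoded.
	fmt.Println(classifyMarshalError(map[string]any{"ch": make(chan int)}))
	// NaN has no JSON representation.
	fmt.Println(classifyMarshalError(map[string]any{"score": math.NaN()}))
	// An ordinary value marshals without error (prints an empty line).
	fmt.Println(classifyMarshalError(map[string]any{"ok": true}))
}
```

Feeding either failing value through a job's result would be one way to confirm that the job ends in "failed" rather than staying in "processing".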

Screenshots/Recordings

N/A

Breaking changes

  • Yes
  • No

Related issues

N/A

Security considerations

No security implications. This change improves system reliability by preventing jobs from getting stuck in the "processing" state.

Checklist

  • I read docs/contributing/README.md and followed the guidelines
  • I added/updated tests where appropriate
  • I updated documentation where needed
  • I verified builds succeed (Go and UI)
  • I verified the CI pipeline passes locally if applicable

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eb0e96e4-eb32-4e1d-a9d8-1a96be31240b

📥 Commits

Reviewing files that changed from the base of the PR and between 5f81d2f and 50e1b1e.

📒 Files selected for processing (2)
  • framework/logstore/asyncjob.go
  • transports/changelog.md

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes
    • Resolved an issue where async jobs could remain stuck in processing state due to internal errors.
    • Enhanced failure handling for async job execution with improved error recovery and state management.

Walkthrough

Adds defensive panic recovery to the async job executor with a new markFailed helper function that ensures consistent failure state transitions. When panics or marshaling errors occur during job execution, the job is marked as failed with proper status codes and timestamps updated. Updates changelog to document the async job failure-state fix.
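
As a rough sketch of what a helper like markFailed might record, assuming a job record with a status code, a serialized error payload, and completion and expiry timestamps: the AsyncJob and errorPayload types, the field names, and the function signature below are assumptions for illustration and may not match framework/logstore/asyncjob.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// AsyncJob is a hypothetical job record (field names are assumptions).
type AsyncJob struct {
	Status      string
	StatusCode  int
	Error       string
	CompletedAt time.Time
	ExpiresAt   time.Time
}

// errorPayload is a hypothetical shape for the stored error body.
type errorPayload struct {
	Message string `json:"message"`
}

// markFailed moves job to the terminal "failed" state, stores an error
// payload and status code, and stamps completion and expiry times.
func markFailed(job *AsyncJob, statusCode int, msg string, ttl time.Duration, now time.Time) {
	body, err := json.Marshal(errorPayload{Message: msg})
	if err != nil {
		// Defensive fallback so the job still reaches a terminal state.
		body = []byte(`{"message":"internal error"}`)
	}
	job.Status = "failed"
	job.StatusCode = statusCode
	job.Error = string(body)
	job.CompletedAt = now
	job.ExpiresAt = now.Add(ttl) // the record can be cleaned up after the TTL
}

func main() {
	job := &AsyncJob{Status: "processing"}
	markFailed(job, http.StatusInternalServerError,
		"failed to marshal result", time.Hour, time.Now())
	fmt.Println(job.Status, job.StatusCode, job.Error)
}
```

Passing the current time in as a parameter is a design choice made here to keep the sketch easy to test; the real helper may read the clock itself.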

Changes

| Cohort / File(s) | Summary |
|---|---|
| Async Job Error Handling: framework/logstore/asyncjob.go | Introduces panic recovery defer and new markFailed helper to ensure async jobs transition to failed state with proper error payloads, status codes, and TTL timestamps when panics or marshaling errors occur. |
| Documentation: transports/changelog.md | Added changelog entry documenting fix for async jobs incorrectly stuck in "processing" state on marshal failure, now correctly transitioning to "failed". |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 A panic? Fear not, dear job so dear!
With recovery guards, we'll save you here.
When chaos strikes, our helper stands tall,
markFailed ensures you won't stall.
From stuck states free, to failed with grace! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main fix: async jobs transitioning from stuck in 'processing' to proper failure handling on marshal failure. |
| Description check | ✅ Passed | The description covers all required template sections including summary, changes, type of change, affected areas, testing instructions, and checklists. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Collaborator Author

Pratham-Mishra04 commented Mar 4, 2026

Pratham-Mishra04 linked an issue Mar 4, 2026 that may be closed by this pull request
2 tasks
Base automatically changed from 03-04-fix_preserve_original_audio_filename_in_transcription_requests to main March 4, 2026 14:16
Pratham-Mishra04 force-pushed the 03-04-fix_async_jobs_stuck_in_processing_on_marshal_failure_now_correctly_transition_to_failed_ branch from e46ab9e to 50e1b1e March 5, 2026 06:58
Pratham-Mishra04 marked this pull request as ready for review March 5, 2026 06:58
Contributor

akshaydeo commented Mar 5, 2026

Merge activity

  • Mar 5, 8:35 AM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Mar 5, 8:35 AM UTC: @akshaydeo merged this pull request with Graphite.

akshaydeo merged commit bbf4084 into main Mar 5, 2026
8 checks passed
akshaydeo deleted the 03-04-fix_async_jobs_stuck_in_processing_on_marshal_failure_now_correctly_transition_to_failed_ branch March 5, 2026 08:35

Development

Successfully merging this pull request may close these issues.

[Bug]: Job ID does not reference fallback execution

2 participants