Step1 parser: include stdout/stderr/cmd from failed tasks #15

@Shreyanand

Problem

The step1 job parser (scripts/job_parser.py) only captures res.msg from failed task events, which is often just a generic message like "non-zero return code". The actual diagnostic details — stdout, stderr, cmd, and rc — are available in res but are not extracted.

This forces Claude to go back to the raw job log to find the real error, adding latency and extra steps to the analysis.

Example: Job 2035762 failed with "non-zero return code" in step1, but the actual root cause (error: pathspec 'cert-manager-fallback' did not match any file(s) known to git) was only available by manually parsing the raw event data.

Proposed Change

In _extract_failed_tasks(), after building the base task info dict:

  1. Extract res.stdout, res.stderr, res.cmd, and res.rc when present
  2. Fallback: if the res fields are empty or suppressed, capture event.stdout (the rendered Ansible output, which often contains the full error even when the res fields are suppressed)

No impact on step4: it only reads task_path, task, play, role, task_action, error_message, duration, and timestamp. The new fields are purely additive and only benefit step5 (Claude's analysis).
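
A minimal sketch of the two steps above, not the actual patch. The helper name `add_diagnostics`, the `event["event_data"]["res"]` layout, and the `event_stdout` key are assumptions for illustration; the real event structure and naming in job_parser.py may differ. Here "empty/suppressed" is read as "none of the four fields yielded a value".

```python
from typing import Any

# Fields the issue proposes to copy out of `res` when present.
DIAGNOSTIC_FIELDS = ("stdout", "stderr", "cmd", "rc")


def add_diagnostics(task_info: dict[str, Any], event: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of task_info with diagnostic fields added (hypothetical helper)."""
    # Assumed layout: the task result lives under event["event_data"]["res"].
    res = (event.get("event_data") or {}).get("res")
    if not isinstance(res, dict):
        res = {}

    enriched = dict(task_info)  # existing keys are left untouched
    found = False
    for field in DIAGNOSTIC_FIELDS:
        value = res.get(field)
        # Skip missing or empty values, but keep a numeric rc (including 0).
        if value is None or value == "" or value == []:
            continue
        enriched[field] = value
        found = True

    # Fallback: nothing usable in res, so keep the rendered Ansible output.
    if not found:
        rendered = event.get("stdout")
        if rendered:
            enriched["event_stdout"] = rendered  # hypothetical key name
    return enriched
```

Because the helper only adds keys and never modifies the ones already present, a consumer that reads a fixed set of fields sees the same values as before.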

File

skills/root-cause-analysis/scripts/job_parser.py, _extract_failed_tasks() (line ~159)
