Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows #57

Copilot · 2025-11-19T04:36:17Z

Fixed file corruption and addressed remaining PR review feedback:

Changes

1. Fixed File Corruption

The orchestrator.rs file had become corrupted with duplicate content through previous update commits
Restored clean version with all previous improvements intact

2. Named Constants for Risk Scoring (Comment 2541464636)

Added named constants with documentation:
- MAX_LINES_FOR_RISK = 500.0: Lines threshold for normalization
- MAX_FILES_FOR_RISK = 20.0: Files threshold for normalization
- RISK_WEIGHT_LINES = 0.7: Weight for line changes (70%)
- RISK_WEIGHT_FILES = 0.3: Weight for file changes (30%)
Improved calculate_risk_score method documentation explaining the algorithm
Makes the risk scoring system more maintainable and transparent

3. Security Documentation (Comment 2541464569)

Added comprehensive security note to run_command method
Documents that commands are currently hard-coded in TestProfile
Warns that if user-configurable profiles are added in the future, proper input validation and sanitization must be implemented to prevent command injection
Addresses security concerns proactively

4. Restored profiles.rs

Fixed profiles.rs which had reverted to old version with || echo fallbacks
Restored proper version with:
- Package.json existence check for web tests
- optional_rust_commands() method for tarpaulin
- Proper pnpm/npm detection without always-succeeding fallbacks

Testing

All 8 QC module tests passing
Build successful with no warnings
Risk scoring constants properly used throughout

Original prompt

Implement a production-ready QC orchestrator sub-agent in the zapabob/codex repository that performs pre-merge quality checks for AI-assisted multi-worktree development.

Context and goals

Repository: zapabob/codex (fork of openai/codex, Rust backend + Tauri GUI + web packages)
Development style: AI-orchestrated sub-agents and Git worktree-based parallel development
Target: Rust backend (CLI) and Tauri GUI front-end, shared as a common execution kernel that can be controlled by multiple LLM UIs (Codex CLI, Gemini CLI, Claude Code, etc.) in a many-to-one manner.
QC agent line limit policy: If the total changed lines (additions + deletions) exceed 200 lines, the QC agent must recommend opening a PR for human review.
Time base: Use the local system time (synchronized with the user’s PC), not UTC, but include the offset in the textual representation.
Documentation/log root: use the existing _docs directory.

High-level requirements

QC orchestrator agent
- Add a dedicated QC orchestrator component (e.g. qc_orchestrator.rs in the Rust backend) that can be invoked from:
  - The CLI (codex terminal UI)
  - The Tauri GUI
- Responsibilities:
  - Detect current Git worktree (via git worktree list or equivalent) for the current working directory.
  - Determine and execute an appropriate test profile (see below).
  - Collect test results, lint results, and basic metrics (changed lines, affected files, new warnings).
  - Compute a simple risk score and generate a merge recommendation.
  - Enforce the 200-line rule: if the diff exceeds 200 lines, always recommend creating a PR and clearly log that recommendation.
  - Emit a human-readable, structured log entry into _docs/logs (see below for format).
Test profiles ("best-practice" defaults)
- Define test profiles that can be configured and selected, with a default profile used when none is specified.
- At minimum define profiles like:
  - minimal:
    - Rust: cargo test -p codex-cli
    - Web/GUI: no tests
  - standard (default):
    - Rust: cargo test --all
    - Rust lint: cargo clippy --all --all-targets -- -D warnings
    - Web/GUI: pnpm test (or npm test if pnpm is unavailable) in appropriate packages
  - full:
    - Everything in standard
    - Optional coverage tool (e.g. cargo tarpaulin --workspace) if available
    - Web lint: pnpm lint (or npm run lint)
- Put the human-facing description of these profiles into _docs/test-profiles.md.
- Wire the Rust QC orchestrator so that it reads the desired default profile from configuration (e.g. config.toml under a [qc] section, with fallback to standard).

Logging format and storage

All QC runs must append a log entry to a markdown file under _docs/logs, created as needed.
File naming convention:
- _docs/logs/YYYY-MM-DD-{worktreename}-impl.md
- The date should use the local date (based on system time).

Each QC run appends a section with this structure:

## 2025-11-19 13:40:12 +0900

- Worktree: feature-tauri-gui
- 機能: <short, human-readable description of the change or goal>
- 実装担当エージェント: <logical agent name, e.g. tauri-gui-agent>
- AI名: <model/front-end identifier, e.g. claude-code>
- テストプロファイル: <minimal|standard|full>

### テスト結果
- Rust: <command> → <PASS/FAIL + summary>
- Rust Lint: <command> → <PASS/FAIL + summary>
- Web: <command> → <PASS/FAIL + summary>

### 品質評価 (QC / 統計)
- 変更行数: <integer>
- 影響ファイル数: <integer>
- 新規警告: <integer> (from clippy/eslint/etc.)
- リスクスコア: <0.0–1.0, where higher = more risk>

### マージ可否の判断
- 推奨: <マージしてよい|マージしない (要修正)|PRを作成してレビュー>
- 理由:
  - e.g. tests/lints status, 200-line rule, risk score, notable failures

### 指摘事項 (要修正)
- Bullet list of concrete issues from failing tests or lints

---

The QC orchestrator should:
- Always include the current local datetime with timezone offset.
- Always evaluate the 200-line rule and reflect it in the "マージ可否の判断" section.

Git diff and 200-line rule
- QC orchestrator should compute changes using git diff between the current worktree state and its upstream (e.g. the main branch or configured base ref).
- Compute:
  - total changed lines (additions + deletions)
  - number of changed files
- If total changed lines > 200:
  - Set recommendation to include "PRを作成してレビュー".
  - Log clearly that this is due to the 200-line policy.
Integration into CLI and Tauri GUI
- Expose a CLI subcommand, e.g. codex qc or similar:
  - Accept flags: --profile, --agent-name, --ai-name, --feature (機能 description).
  - Run the QC orchestrator, print a concise summary to stdout (including the recommendation), and write the detailed log into _docs/logs.
- Integrate into Tauri GUI:
  - Add a "QC テスト実行" button or menu ...

This pull request was created as a result of the following prompt from Copilot chat.

Implement a production-ready QC orchestrator sub-agent in the zapabob/codex repository that performs pre-merge quality checks for AI-assisted multi-worktree development.

Context and goals

Repository: zapabob/codex (fork of openai/codex, Rust backend + Tauri GUI + web packages)

Development style: AI-orchestrated sub-agents and Git worktree-based parallel development

Target: Rust backend (CLI) and Tauri GUI front-end, shared as a common execution kernel that can be controlled by multiple LLM UIs (Codex CLI, Gemini CLI, Claude Code, etc.) in a many-to-one manner.

QC agent line limit policy: If the total changed lines (additions + deletions) exceed 200 lines, the QC agent must recommend opening a PR for human review.

Time base: Use the local system time (synchronized with the user’s PC), not UTC, but include the offset in the textual representation.

Documentation/log root: use the existing _docs directory.

High-level requirements
QC orchestrator agent

Add a dedicated QC orchestrator component (e.g. qc_orchestrator.rs in the Rust backend) that can be invoked from:

The CLI (codex terminal UI)

The Tauri GUI

Responsibilities:

Detect current Git worktree (via git worktree list or equivalent) for the current working directory.

Determine and execute an appropriate test profile (see below).

Collect test results, lint results, and basic metrics (changed lines, affected files, new warnings).

Compute a simple risk score and generate a merge recommendation.

Enforce the 200-line rule: if the diff exceeds 200 lines, always recommend creating a PR and clearly log that recommendation.

Emit a human-readable, structured log entry into _docs/logs (see below for format).

Test profiles ("best-practice" defaults)

Define test profiles that can be configured and selected, with a default profile used when none is specified.

At minimum define profiles like:

minimal:

Rust: cargo test -p codex-cli

Web/GUI: no tests

standard (default):

Rust: cargo test --all

Rust lint: cargo clippy --all --all-targets -- -D warnings

Web/GUI: pnpm test (or npm test if pnpm is unavailable) in appropriate packages

full:

Everything in standard

Optional coverage tool (e.g. cargo tarpaulin --workspace) if available

Web lint: pnpm lint (or npm run lint)

Put the human-facing description of these profiles into _docs/test-profiles.md.

Wire the Rust QC orchestrator so that it reads the desired default profile from configuration (e.g. config.toml under a [qc] section, with fallback to standard).
Logging format and storage
All QC runs must append a log entry to a markdown file under _docs/logs, created as needed.

File naming convention:

_docs/logs/YYYY-MM-DD-{worktreename}-impl.md

The date should use the local date (based on system time).
Each QC run appends a section with this structure:
## 2025-11-19 13:40:12 +0900

- Worktree: feature-tauri-gui
- 機能: <short, human-readable description of the change or goal>
- 実装担当エージェント: <logical agent name, e.g. tauri-gui-agent>
- AI名: <model/front-end identifier, e.g. claude-code>
- テストプロファイル: <minimal|standard|full>

### テスト結果
- Rust: <command> → <PASS/FAIL + summary>
- Rust Lint: <command> → <PASS/FAIL + summary>
- Web: <command> → <PASS/FAIL + summary>

### 品質評価 (QC / 統計)
- 変更行数: <integer>
- 影響ファイル数: <integer>
- 新規警告: <integer> (from clippy/eslint/etc.)
- リスクスコア: <0.0–1.0, where higher = more risk>

### マージ可否の判断
- 推奨: <マージしてよい|マージしない (要修正)|PRを作成してレビュー>
- 理由:
  - e.g. tests/lints status, 200-line rule, risk score, notable failures

### 指摘事項 (要修正)
- Bullet list of concrete issues from failing tests or lints

---
The QC orchestrator should:

Always include the current local datetime with timezone offset.

Always evaluate the 200-line rule and reflect it in the "マージ可否の判断" section.
Git diff and 200-line rule

QC orchestrator should compute changes using git diff between the current worktree state and its upstream (e.g. the main branch or configured base ref).

Compute:

total changed lines (additions + deletions)

number of changed files

If total changed lines > 200:

Set recommendation to include "PRを作成してレビュー".

Log clearly that this is due to the 200-line policy.

Integration into CLI and Tauri GUI

Expose a CLI subcommand, e.g. codex qc or similar:

Accept flags: --profile, --agent-name, --ai-name, --feature (機能 description).

Run the QC orchestrator, print a concise summary to stdout (including the recommendation), and write the detailed log into _docs/logs.

Integrate into Tauri GUI:

Add a "QC テスト実行" button or menu item that triggers the same QC orchestrator with appropriate parameters (agent name, AI name, feature description can be entered or prefilled).

Surface PASS/FAIL and PR recommendation visually.
Configuration
Extend existing configuration (e.g. config.toml) with a [qc] section, for example:
[qc]
default_profile = "standard"
max_lines_without_pr = 200
base_branch = "main" # optional, fallback to repo default
The QC orchestrator should honor these values with safe defaults when the config is missing.
Agent pre-implementation behavior

For each AI agent front-end (Codex CLI, Gemini CLI, Claude Code, etc.), ensure there is a mechanism so that before starting a new implementation task, the agent can:

Locate the most recent _docs/logs/*-{worktreename}-impl.md file for the current worktree.

Read and internally summarize the last few entries.

Use this summary as context for its next implementation plan (e.g. avoiding known failing tests or open issues).

This may involve adding helper functions or utilities in the Rust side or higher-level orchestrator code (manager.rs, plan_orchestrator.rs) to:

find and read the relevant log file

extract the last N sections

Documentation

Add or update documentation under _docs and/or AGENTS.md / ARCHITECTURE.md to briefly describe:

The QC orchestrator’s role.

The test profiles and their trade-offs.

The 200-line PR recommendation policy.

How CLI and Tauri users invoke QC checks.
Implementation notes

Follow the existing coding style and module organization in codex-rs, codex-cli, and gui.

Handle errors gracefully: when a particular test command is missing (e.g. pnpm not installed), log and report it instead of panicking, and downgrade that part of the evaluation.

Do not introduce breaking changes to existing public CLI behavior; add new subcommands/options instead.

Keep changes per file and per module to human-reviewable size, even though the QC orchestrator itself will later enforce this policy.

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Co-authored-by: zapabob <[email protected]>

Copilot

Pull Request Overview

This PR implements a QC (Quality Check) orchestrator sub-agent for automated pre-merge quality validation in AI-assisted multi-worktree development workflows. The implementation adds comprehensive testing infrastructure with configurable profiles, risk scoring, and structured logging with a 200-line change threshold for PR recommendations.

Key Changes

Implemented complete QC orchestrator module with test execution, diff analysis, risk scoring (0.0-1.0), and merge recommendation logic
Added three test profiles (minimal, standard, full) with different levels of test coverage
Integrated QC functionality into the CLI with codex qc subcommand supporting --profile option

Reviewed Changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 17 comments.

Show a summary per file

File	Description
config.toml	Added `[qc]` configuration section with default_profile setting
codex-rs/tui/src/main.rs	Minor whitespace cleanup (removed BOM, trailing spaces)
codex-rs/tui/Cargo.toml	Removed conflicting workspace declaration
codex-rs/core/src/qc/worktree.rs	New module for Git worktree detection and branch name extraction
codex-rs/core/src/qc/profiles.rs	New module defining test profiles with Rust and Web test commands
codex-rs/core/src/qc/orchestrator.rs	Core orchestration logic for test execution, diff analysis, and recommendations
codex-rs/core/src/qc/logger.rs	Markdown log generation with timezone-aware timestamps
codex-rs/core/src/qc/mod.rs	Module exports for QC functionality
codex-rs/core/src/lib.rs	Added qc module to core library
codex-rs/core/Cargo.toml	Added chrono dependency with serde feature
codex-rs/cli/src/main.rs	Integrated qc subcommand with profile selection and summary output
codex-rs/cli/Cargo.toml	Added codex-core dependency and removed workspace declaration
codex-rs/Cargo.lock	Lock file updates for new dependencies (chrono and transitive deps)
_docs/test-profiles.md	Documentation for the three test profiles and their use cases
_docs/qc-usage-guide.md	Comprehensive usage guide with examples, workflow integration, and troubleshooting
_docs/logs/2025-11-19-copilot-add-qc-orchestrator-sub-agent-impl.md	Example log file from QC test runs
_docs/QC_ORCHESTRATOR_IMPLEMENTATION_SUMMARY.md	Technical implementation summary and validation checklist

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

codex-rs/core/src/qc/orchestrator.rs

codex-rs/core/Cargo.toml

_docs/qc-usage-guide.md

codex-rs/core/src/qc/orchestrator.rs

codex-rs/cli/src/main.rs

codex-rs/core/src/qc/profiles.rs

codex-rs/core/src/qc/orchestrator.rs

github-actions · 2025-11-21T04:51:08Z

Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment same as the below format.

I have read the CLA Document and I hereby sign the CLA

0 out of 2 committers have signed the CLA.
❌ @Copilot
❌ @zapabob
_{You can retrigger this bot by commenting recheck in this Pull Request.}_{Posted by the CLA Assistant Lite bot.}

Co-authored-by: Copilot <[email protected]>

… add named constants for risk scoring and security documentation for command execution Co-authored-by: zapabob <[email protected]>

Initial plan

f70d4b3

Copilot AI assigned Copilot and zapabob Nov 19, 2025

Copilot started work on behalf of zapabob November 19, 2025 04:36 View session

Copilot AI and others added 5 commits November 19, 2025 04:45

Add QC orchestrator with test profiles and logging

4abd695

Co-authored-by: zapabob <[email protected]>

Fix QC orchestrator path handling and log file naming

26da937

Co-authored-by: zapabob <[email protected]>

Add comprehensive QC usage documentation and examples

f5e1168

Co-authored-by: zapabob <[email protected]>

Fix clippy warnings in QC orchestrator module

1805659

Co-authored-by: zapabob <[email protected]>

Add QC orchestrator implementation summary

a8d8d59

Co-authored-by: zapabob <[email protected]>

Copilot AI changed the title ~~[WIP] Implement QC orchestrator sub-agent for pre-merge checks~~ Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows Nov 19, 2025

Copilot AI requested a review from zapabob November 19, 2025 05:00

Copilot finished work on behalf of zapabob November 19, 2025 05:00

zapabob approved these changes Nov 19, 2025

View reviewed changes

zapabob marked this pull request as ready for review November 19, 2025 10:34

Copilot AI review requested due to automatic review settings November 19, 2025 10:34

Copilot started reviewing on behalf of zapabob November 19, 2025 10:34 View session

Copilot finished reviewing on behalf of zapabob November 19, 2025 10:38

Copilot AI reviewed Nov 19, 2025

View reviewed changes

Merge branch 'main' into copilot/add-qc-orchestrator-sub-agent

9f079e5

Copilot started work on behalf of zapabob November 21, 2025 04:53 View session

zapabob and others added 9 commits November 21, 2025 13:55

Update codex-rs/core/src/qc/orchestrator.rs

f71a361

Co-authored-by: Copilot <[email protected]>

Update _docs/qc-usage-guide.md

43afd48

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/src/qc/orchestrator.rs

2b0915c

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/src/qc/orchestrator.rs

b0c725d

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/Cargo.toml

6d2079d

Co-authored-by: Copilot <[email protected]>

Update _docs/qc-usage-guide.md

1295635

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/src/qc/orchestrator.rs

2572452

Co-authored-by: Copilot <[email protected]>

Update _docs/qc-usage-guide.md

3582ef8

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/src/qc/orchestrator.rs

81db04f

Co-authored-by: Copilot <[email protected]>

Update codex-rs/core/src/qc/orchestrator.rs

05370f3

Co-authored-by: Copilot <[email protected]>

Copilot AI requested a review from zapabob November 21, 2025 05:12

Copilot finished work on behalf of zapabob November 21, 2025 05:12

Copilot started work on behalf of zapabob November 21, 2025 05:12 View session

Fix corrupted orchestrator.rs file and address remaining PR feedback:…

558afec

… add named constants for risk scoring and security documentation for command execution Co-authored-by: zapabob <[email protected]>

Copilot finished work on behalf of zapabob November 21, 2025 05:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows #57

Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows #57

Copilot AI commented Nov 19, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows #57

Are you sure you want to change the base?

Add QC orchestrator sub-agent for pre-merge quality validation in multi-worktree workflows #57

Conversation

Copilot AI commented Nov 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

1. Fixed File Corruption

2. Named Constants for Risk Scoring (Comment 2541464636)

3. Security Documentation (Comment 2541464569)

4. Restored profiles.rs

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Key Changes

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Nov 21, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Nov 19, 2025 •

edited

Loading