
fix(intake): recover when sprint planner emits dependency outside sprint plan #908

Open
majiayu000 wants to merge 3 commits into main from feat/884-intake-dependency-recovery

Conversation

@majiayu000
Owner

Summary

  • normalize sprint-plan dependencies before DAG validation so intake no longer aborts the full repo sprint on one out-of-plan edge
  • drop missing edges when the upstream issue is already done in the task store, and skip only the affected task when the upstream is still unresolved
  • add intake unit coverage for completed upstream, unresolved upstream, and mixed valid/invalid dependency graphs

Closes #884

Validation

  • cargo fmt --all
  • cargo clippy --workspace --all-targets -- -D warnings
  • RUSTFLAGS="-Dwarnings" cargo check --workspace --all-targets
  • cargo test -p harness-server normalize_sprint_tasks_
  • cargo test --workspace (blocked by unrelated harness_observe event_store tests failing with Postgres session exhaustion / pool timeout: EMAXCONNSESSION)

…encies

Sprint-plan validation used to abort the entire repo batch as soon as the planner referenced an upstream issue outside the current plan. This change normalizes those edges first: completed upstreams are dropped, unresolved upstreams cause only the affected task to be skipped for the current sprint, and the remaining DAG is still validated and executed.
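The recovery path described above can be sketched as follows. This is a minimal illustration, not the crate's real types: `SprintTask` and the function shape are assumptions, with only the name `normalize_sprint_tasks` taken from the test filter used in Validation.

```rust
use std::collections::HashSet;

// Sketch only: names and shapes here are assumptions, not the real intake types.
#[derive(Debug, Clone, PartialEq)]
struct SprintTask {
    issue: u64,
    depends_on: Vec<u64>,
}

/// Normalize before DAG validation: drop edges to already-completed
/// upstreams, and skip only the tasks whose upstream is neither in the
/// plan nor completed, instead of aborting the whole sprint.
fn normalize_sprint_tasks(
    tasks: Vec<SprintTask>,
    completed_issues: &HashSet<u64>,
) -> Vec<SprintTask> {
    let in_plan: HashSet<u64> = tasks.iter().map(|t| t.issue).collect();
    tasks
        .into_iter()
        .filter_map(|mut task| {
            let mut unresolved = false;
            task.depends_on.retain(|dep| {
                if in_plan.contains(dep) {
                    true // in-plan edge: left for the normal DAG check
                } else if completed_issues.contains(dep) {
                    false // upstream already done: edge is redundant, drop it
                } else {
                    unresolved = true; // unknown upstream: skip this task only
                    false
                }
            });
            if unresolved { None } else { Some(task) }
        })
        .collect()
}
```

The remaining tasks still flow into the existing DAG validation, which is what the Directive below pins down: normalization must stay ahead of `validate_dag`.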

Constraint: Intake must degrade safely on malformed planner output without changing planner quality expectations
Rejected: Auto-inject missing upstream tasks | planner output does not describe those tasks' own dependency shape, so injection could create unsafe ordering
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Preserve normalization-before-validation ordering; moving validate_dag earlier will reintroduce fail-stop behavior for planner defects
Tested: cargo fmt --all
Tested: cargo clippy --workspace --all-targets -- -D warnings
Tested: RUSTFLAGS="-Dwarnings" cargo check --workspace --all-targets
Tested: cargo test -p harness-server normalize_sprint_tasks_
Not-tested: cargo test --workspace remains blocked by unrelated harness_observe event_store tests hitting Postgres session limits (EMAXCONNSESSION / pool timeout)

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a normalization step for sprint plans to handle dependencies that are either completed or missing from the plan. It includes logic to skip tasks with unresolved upstreams and remove redundant edges. Feedback suggests optimizing the retrieval of completed tasks by filtering at the database level and ensuring consistency in status checks by including failed tasks as unblocking dependents.

Comment thread crates/harness-server/src/intake/mod.rs Outdated
tasks: &crate::task_runner::TaskStore,
project_id: &str,
) -> std::collections::HashSet<u64> {
match tasks.list_all_with_terminal().await {


medium

The use of tasks.list_all_with_terminal() followed by in-memory filtering by project_id may lead to performance degradation as the task history grows. Since this function is called during the intake process which runs periodically, fetching all terminal tasks from the store is inefficient. It is recommended to implement a method in TaskStore that allows filtering by project_root and status at the database level to reduce memory usage and processing time.
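The push-down the reviewer is asking for could look like the sketch below. The row and status shapes are hypothetical; only the method name `list_external_ids_by_project_and_statuses` appears later in the thread. The real version would run the filter in SQL (roughly `SELECT id, external_id FROM tasks WHERE project_root = ?1 AND status IN (…)`); the in-memory stand-in here just shows the intended semantics.

```rust
// Sketch of the suggested DB-level filter. Hypothetical shapes; the point
// is that only matching (id, external_id) pairs leave the store, instead
// of every terminal task row on each intake tick.
#[derive(Debug, Clone, PartialEq)]
enum TaskStatus {
    Pending,
    Done,
    Failed,
}

struct TaskRow {
    id: u64,
    external_id: String,
    project_root: String,
    status: TaskStatus,
}

fn list_external_ids_by_project_and_statuses(
    rows: &[TaskRow],
    project_root: &str,
    statuses: &[TaskStatus],
) -> Vec<(u64, String)> {
    rows.iter()
        .filter(|r| r.project_root == project_root && statuses.contains(&r.status))
        .map(|r| (r.id, r.external_id.clone()))
        .collect()
}
```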

Comment thread crates/harness-server/src/intake/mod.rs Outdated
Ok(all_tasks) => all_tasks
.into_iter()
.filter(|task| {
matches!(task.status, TaskStatus::Done)


medium

This check only considers TaskStatus::Done as a completed dependency. However, the status_unblocks_dependents helper (defined at line 790 and used during active sprint execution) also includes TaskStatus::Failed. This inconsistency means that if a dependency failed in a previous sprint, tasks depending on it will be skipped during normalization in the next sprint, even though they would have been allowed to run if the failure happened within the same sprint. Consider using the helper for consistency.

Suggested change
- matches!(task.status, TaskStatus::Done)
+ status_unblocks_dependents(&task.status)
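Per the comment, the helper already exists at line 790 and treats both terminal states as unblocking. A sketch of that behavior (the non-terminal enum variants here are assumptions):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TaskStatus {
    Pending,
    Running,
    Done,
    Failed,
}

/// Both terminal states unblock dependents, matching the rule used during
/// active sprint execution, so a failed upstream from a prior sprint does
/// not strand its downstream tasks at normalization time.
fn status_unblocks_dependents(status: &TaskStatus) -> bool {
    matches!(status, TaskStatus::Done | TaskStatus::Failed)
}
```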


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae2b3148e2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/harness-server/src/intake/mod.rs Outdated
Comment on lines +381 to +385
&& task
.project_root
.as_ref()
.map(|path| path.to_string_lossy() == project_id)
.unwrap_or(false)


P1 Badge Scope completed-upstream lookup to the current repo

The new normalization path builds completed_issues from all Done tasks that share a project path, but it does not also constrain by task.repo. If intake is configured with multiple GitHub repos under the same project_root, overlapping issue numbers are valid, so a completed issue:99 from repo A can be mistaken as completion for repo B and cause normalize_sprint_tasks to drop repo B dependency edges incorrectly. That can dispatch downstream work before its real upstream is done; the lookup should include repo identity (or another namespace) in addition to project.


Gemini flagged two medium issues in intake normalization: it was loading every terminal task row into memory and only treating `done` tasks as dependency-unblocking. This change adds a project/status-filtered external-id lookup in the task DB/store layer, reuses it during normalization, and covers the failed-task case with a regression test.

Constraint: Intake normalization runs periodically and must stay cheap on large task histories
Constraint: Dependency unblocking must remain consistent with existing done/failed runtime semantics
Rejected: Keep list_all_with_terminal-based filtering | loads every historical task row on each intake tick
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep the DB-side dependency-unblocking status filter aligned with intake/runtime dependency semantics
Tested: cargo fmt --all; cargo check; RUSTFLAGS="-Dwarnings" cargo check --workspace --all-targets
Not-tested: cargo test; cargo test -p harness-server; targeted intake test rerun | blocked by shared Postgres session cap (EMAXCONNSESSION / pool timeouts)
@github-actions

/gemini review

1 similar comment
@majiayu000
Owner Author

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b981475d1f


Comment on lines +345 to +350
let mut by_id: HashMap<TaskId, String> = self
.db
.list_external_ids_by_project_and_statuses(project_id, statuses)
.await?
.into_iter()
.map(|(id, external_id)| (harness_core::types::TaskId(id), external_id))


P1 Badge Key completed-issue merge by external_id, not task id

This merge is keyed by TaskId, so historical terminal attempts and newer retries for the same issue are treated as separate records. If issue N previously failed/done and is retried (now non-terminal) under a new task ID, completed_issue_numbers_for_project still sees N as completed and normalize_sprint_tasks can drop dependencies on it, allowing downstream work to start before the retried upstream actually finishes. This is reachable whenever an issue is re-run after a prior terminal attempt; dedupe/override should be based on external_id (or by excluding IDs that currently have non-terminal tasks).


…dencies

Dependency normalization now reads the latest task status per project, repo, and external_id so intake no longer treats another repo's issue number or an older terminal attempt as proof that an upstream is complete. The lookup stays lightweight by querying only external_id/status metadata, and the regression test covers failed tasks, repo scoping, and retry override semantics in one serialized case.
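The retry-override semantics described above can be sketched as follows. `Attempt` is a hypothetical shape (the real lookup reads external_id/status metadata scoped by project and repo); the invariant shown matches the Codex finding: dedupe by `external_id`, let the newest attempt win.

```rust
use std::collections::{HashMap, HashSet};

// Sketch only: hypothetical attempt record, newer attempts have larger ids.
#[derive(Debug, Clone)]
struct Attempt {
    task_id: u64,        // monotonically increasing per attempt
    external_id: String, // repo-scoped, e.g. "owner/repo#99"
    terminal: bool,      // done/failed
}

/// An issue counts as completed only when its *latest* attempt is
/// terminal, so a retry in flight overrides an older terminal record
/// and downstream work is not dispatched early.
fn completed_external_ids(attempts: Vec<Attempt>) -> HashSet<String> {
    let mut latest: HashMap<String, Attempt> = HashMap::new();
    for attempt in attempts {
        let newer = latest
            .get(&attempt.external_id)
            .map_or(true, |prev| attempt.task_id > prev.task_id);
        if newer {
            latest.insert(attempt.external_id.clone(), attempt);
        }
    }
    latest
        .into_values()
        .filter(|a| a.terminal)
        .map(|a| a.external_id)
        .collect()
}
```

Because the `external_id` carries the repo, this keying also addresses the earlier cross-repo collision finding: issue 99 in repo A and issue 99 in repo B resolve to different keys.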

The lockfile is also updated to rustls-webpki 0.103.13 so the Security Audit check clears the newly published advisory that was failing this PR.

Constraint: Sprint normalization must preserve existing planner output semantics while avoiding full history scans
Rejected: Key cache merges by task ID | stale terminal attempts can outvote newer retries for the same external_id
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Keep completed-upstream lookups scoped to project+repo and driven by the latest external_id attempt
Tested: cargo test -p harness-server completed_issue_numbers_for_project -- --nocapture
Tested: RUSTFLAGS='-Dwarnings' cargo check --workspace --all-targets
Tested: cargo audit --json --file Cargo.lock
Not-tested: cargo test --workspace (fails in unrelated harness-observe event_store tests with pool timeout / DbHandler errors)
@github-actions

/gemini review

1 similar comment
@majiayu000
Owner Author

/gemini review

@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!
