Problem
We want continuous self-improvement across UOS plugins without bloating the kernel. Today, the kernel logs plugin dispatch failures and skips workflow/check events to avoid loops, so a dedicated hotfix plugin cannot react to those failures.
Goals
- Route actionable failure signals to a dedicated plugin (daemon-hotfix) with minimal kernel changes.
- Create/update issues in the appropriate plugin repo with enough context to reproduce.
- Optionally propose fixes (PRs) with tests, while remaining safe by default.
- Keep kernel lean: routing + policy gates only.
Non-goals
- No fully autonomous merges by default.
- No attempt to modify UOS kernel logic beyond minimal event plumbing.
- No brittle keyword/regex-only tool selection for AI-driven decisions.
High-level architecture
- Kernel detects plugin dispatch failures (HTTP worker or GitHub Action dispatch) and emits a synthetic event.
- daemon-hotfix subscribes to that synthetic event.
- daemon-hotfix triages, de-dupes, files issues, and optionally drafts PRs.
- PRs go through normal CI; auto-merge only if explicitly enabled and risk tier is low.
Minimal kernel changes (proposed)
- Synthetic failure event
  - Emit kernel.plugin_error (or similar) when a plugin dispatch fails.
  - Include enough metadata to find the failing plugin repo and reproduce.
- Optional workflow/check allowlist
  - Today workflow_*, check_*, check_run.*, check_suite.* are skipped to prevent loops.
  - Add an allowlist so only daemon-hotfix can receive these events if configured.
  - This enables detection of failed workflow runs in plugin repos (e.g., compute.yml failures).
- Safety guardrails
  - Rate-limit synthetic events per repo.
  - Suppress events caused by the hotfix plugin itself to prevent loops.
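The allowlist and guardrails above could be sketched roughly as follows. This is an illustration in Python, not the kernel's actual code: the function names, the `RepoRateLimiter` class, and the plugin identifier constant are all hypothetical, and the skip patterns are copied from the list above and matched with shell-style globbing.

```python
import time
from fnmatch import fnmatchcase

# Event-name patterns the kernel skips today (from the list above).
SKIPPED_PATTERNS = ("workflow_*", "check_*", "check_run.*", "check_suite.*")

# Hypothetical identifier for the hotfix plugin.
HOTFIX_PLUGIN = "ubiquity-os-marketplace/daemon-hotfix"


def is_loop_prone(event_name: str) -> bool:
    """True if the event matches one of the currently skipped patterns."""
    return any(fnmatchcase(event_name, p) for p in SKIPPED_PATTERNS)


def should_deliver(event_name: str, plugin_id: str, allowlist: set) -> bool:
    """Loop-prone events reach a plugin only if that plugin is allowlisted."""
    if not is_loop_prone(event_name):
        return True
    return plugin_id in allowlist


class RepoRateLimiter:
    """Allow at most `limit` synthetic events per repo per `window_s` seconds."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent = {}  # repo -> timestamps of recently emitted events

    def allow(self, repo: str) -> bool:
        now = self.clock()
        recent = [t for t in self._sent.get(repo, []) if now - t < self.window_s]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._sent[repo] = recent
        return allowed


def should_emit_plugin_error(failing_plugin: str, repo: str, limiter: RepoRateLimiter) -> bool:
    """Suppress failures caused by the hotfix plugin itself, then rate-limit."""
    if failing_plugin == HOTFIX_PLUGIN:
        return False
    return limiter.allow(repo)
```

Checking self-origin before the rate limiter means hotfix-caused failures never consume a repo's event budget.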
Failure event payload (draft)
```json
{
  "event": "kernel.plugin_error",
  "timestamp": "2025-01-01T00:00:00Z",
  "stateId": "<kernel state id>",
  "source": {
    "kernelRepo": "ubiquity-os/ubiquity-os-kernel",
    "environment": "prod|dev"
  },
  "trigger": {
    "githubEvent": "issue_comment.created",
    "deliveryId": "<github delivery id>",
    "repo": "owner/repo",
    "issueOrPr": 123,
    "actor": "octocat"
  },
  "plugin": {
    "type": "http|workflow",
    "id": "owner/repo:workflow.yml@ref OR https://plugin.url",
    "owner": "owner",
    "repo": "repo",
    "workflowId": "compute.yml",
    "ref": "main",
    "manifest": {
      "name": "...",
      "version": "..."
    }
  },
  "error": {
    "message": "HTTP 502: ...",
    "status": 502,
    "stack": "...",
    "category": "dispatch|http|workflow|timeout|validation",
    "raw": "..."
  },
  "context": {
    "requestId": "...",
    "pluginInputs": {"redacted": true},
    "responseSnippet": "...",
    "retryCount": 0
  }
}
```
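As a rough illustration of how such a payload might be assembled at the point of failure, here is a Python sketch. The helper names (`classify_error`, `build_plugin_error`) and the classification rules are assumptions made for this example; the draft does not define how `error.category` is chosen, and the `validation` category is left out because nothing above says when it applies.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_error(status: Optional[int], timed_out: bool, plugin_type: str) -> str:
    """Pick an `error.category` value. These rules are hypothetical."""
    if timed_out:
        return "timeout"
    if plugin_type == "workflow":
        return "workflow"
    if status is not None:
        return "http"
    return "dispatch"


def build_plugin_error(*, state_id: str, environment: str, trigger: dict,
                       plugin: dict, message: str, status: Optional[int] = None,
                       timed_out: bool = False, retry_count: int = 0,
                       now: Optional[datetime] = None) -> dict:
    """Assemble a kernel.plugin_error payload shaped like the draft above.

    `now` should be timezone-aware; it defaults to the current UTC time.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "event": "kernel.plugin_error",
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "stateId": state_id,
        "source": {
            "kernelRepo": "ubiquity-os/ubiquity-os-kernel",
            "environment": environment,
        },
        "trigger": trigger,
        "plugin": plugin,
        "error": {
            "message": message,
            "status": status,
            "category": classify_error(status, timed_out, plugin.get("type", "")),
        },
        # Plugin inputs are never copied verbatim in this sketch.
        "context": {"pluginInputs": {"redacted": True}, "retryCount": retry_count},
    }
```

Passing `now` explicitly keeps the builder deterministic, which may be handy for replay fixtures.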
daemon-hotfix behavior (MVP)
- Triage & De-dupe
  - Cluster by (plugin.id + error.category + error.message hash) within a time window.
  - Reopen existing issue if the same error recurs.
- Issue creation
  - Open a GitHub issue in the plugin repo with:
    - Summary of failure + timestamp + correlation id
    - Link to triggering issue/PR
    - Sanitized error details + repro steps (event replay, if available)
    - Suggested next actions (test/trace)
- Optional auto-fix (guarded)
  - If configured and risk tier is low, open a PR with:
    - Minimal patch
    - Regression test or replay fixture
    - Reasoning summary + risk notes
  - Never auto-merge by default.
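The triage and de-dupe step might look something like the Python sketch below. It is one possible reading of "cluster within a time window", not a settled design: the message normalization (lowercasing and collapsing whitespace), the `TrackedIssue` record, and the four action names are all assumptions introduced for the example.

```python
import hashlib
from dataclasses import dataclass


def normalize_message(message: str) -> str:
    """Lowercase and collapse whitespace so trivially different messages match."""
    return " ".join(message.lower().split())


def fingerprint(payload: dict) -> str:
    """Cluster key built from plugin.id, error.category, and a message hash."""
    parts = (
        payload["plugin"]["id"],
        payload["error"]["category"],
        normalize_message(payload["error"]["message"]),
    )
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]


@dataclass
class TrackedIssue:
    """Hypothetical record of the issue already filed for a fingerprint."""
    number: int
    is_open: bool
    last_seen: float  # unix seconds of the most recent occurrence


def triage(fp: str, now: float, known: dict, window_s: float) -> str:
    """Return 'create', 'reopen', 'count_only', or 'comment'."""
    issue = known.get(fp)
    if issue is None:
        return "create"      # first time this cluster is seen
    if not issue.is_open:
        return "reopen"      # the same error recurred after the issue was closed
    if now - issue.last_seen < window_s:
        return "count_only"  # inside the window: record it, do not post
    return "comment"         # outside the window: update the open issue
```

Truncating the digest to 16 hex characters keeps the key short enough to embed in an issue body as a correlation marker, at a negligible collision risk for this volume.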
Safety, permissions, and loops
- Permission model options
  - Run hotfix workflow inside the target plugin repo (simplest for auth).
  - Central hotfix service that mints installation tokens per repo (more powerful but riskier).
- Loop prevention
  - If a failure originates from daemon-hotfix, ignore it.
  - If a failure is already under investigation (issue open), update issue instead of spamming.
- Redaction
  - Strip secrets/tokens from logs and payloads.
  - Limit stored raw event payloads unless opt-in.
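A minimal redaction pass over a payload could be sketched as below. The patterns and the sensitive key names are illustrative assumptions rather than a complete list; a real deployment would likely need a broader set and should not rely on pattern matching alone.

```python
import re

# Illustrative patterns only: GitHub-style token prefixes and bearer headers.
SECRET_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{20,}"),
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._\-]+"),
]

# Hypothetical set of key names whose values are always dropped.
SENSITIVE_KEYS = {"authorization", "token", "password", "secret", "api_key"}

PLACEHOLDER = "[REDACTED]"


def redact(value):
    """Recursively replace secrets in dicts, lists, and strings."""
    if isinstance(value, dict):
        return {
            k: PLACEHOLDER if str(k).lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        for pattern in SECRET_PATTERNS:
            value = pattern.sub(PLACEHOLDER, value)
        return value
    return value  # numbers, booleans, None pass through unchanged
```

Running `redact` on the whole payload before it leaves the kernel would cover `error.raw`, `error.stack`, and `context.responseSnippet` in a single pass.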
Configuration (example)
```yaml
plugins:
  ubiquity-os-marketplace/daemon-hotfix:
    with:
      enabled: true
      modes: ["issue", "pr"]
      riskPolicy:
        autoMerge: false
        maxAutofixPerDay: 3
      issueLabels: ["bug", "autofix"]
      notify: ["@team-bot"]
      allowlistRepos: ["ubiquity-os-marketplace/command-codex"]
```
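To make the policy gates concrete, here is a Python sketch of how the settings might translate into allowed actions. It rests on several assumptions not stated above: that `autoMerge` and `maxAutofixPerDay` both sit under `riskPolicy`, that `allowlistRepos` names the repos the plugin may act on, and that risk tiers are plain strings such as "low". `HotfixConfig`, `from_with_block`, and `plan_actions` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class HotfixConfig:
    """Parsed view of the `with:` block. Defaults are the safe choice."""
    enabled: bool = False
    modes: list = field(default_factory=lambda: ["issue"])
    auto_merge: bool = False
    max_autofix_per_day: int = 0
    allowlist_repos: list = field(default_factory=list)


def from_with_block(raw: dict) -> HotfixConfig:
    """Build a config from the already-parsed YAML mapping."""
    risk = raw.get("riskPolicy", {})
    return HotfixConfig(
        enabled=bool(raw.get("enabled", False)),
        modes=list(raw.get("modes", ["issue"])),
        auto_merge=bool(risk.get("autoMerge", False)),
        max_autofix_per_day=int(risk.get("maxAutofixPerDay", 0)),
        allowlist_repos=list(raw.get("allowlistRepos", [])),
    )


def plan_actions(cfg: HotfixConfig, repo: str, risk_tier: str, autofixes_today: int) -> list:
    """Return the actions permitted for one failure, in order."""
    if not cfg.enabled or repo not in cfg.allowlist_repos:
        return []
    actions = []
    if "issue" in cfg.modes:
        actions.append("issue")
    under_budget = autofixes_today < cfg.max_autofix_per_day
    if "pr" in cfg.modes and risk_tier == "low" and under_budget:
        actions.append("pr")
        if cfg.auto_merge:  # only ever considered for low-risk PRs
            actions.append("auto_merge")
    return actions
```

With the example configuration, a low-risk failure in `ubiquity-os-marketplace/command-codex` yields an issue and a PR but no auto-merge, since `autoMerge` is false.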
MVP acceptance criteria
- A 5xx from an HTTP plugin dispatch results in a single issue in that plugin repo.
- A failed workflow run (if allowlisted) results in a single issue in that repo.
- Duplicate failures update the existing issue instead of creating new ones.
- No infinite loops or noisy spam during normal operations.
Open questions
- Should the kernel emit kernel.plugin_error for both HTTP and workflow dispatch failures, or only HTTP?
- Should the hotfix plugin support optional local reproduction with cached webhook payloads?
- Do we want a global dashboard for error rates per plugin?