Spec: daemon-hotfix self-improvement plugin #81


## Problem
We want continuous self-improvement across UOS plugins without bloating the kernel. Today the kernel only logs plugin dispatch failures, and it skips workflow/check events to avoid loops, so a dedicated hotfix plugin has no signal to react to.

## Goals
- Route actionable failure signals to a dedicated plugin (`daemon-hotfix`) with minimal kernel changes.
- Create/update issues in the appropriate plugin repo with enough context to reproduce.
- Optionally propose fixes (PRs) with tests, while remaining safe by default.
- Keep kernel lean: routing + policy gates only.

## Non-goals
- No fully autonomous merges by default.
- No attempt to modify UOS kernel logic beyond minimal event plumbing.
- No brittle keyword/regex-only tool selection for AI-driven decisions.

## High-level architecture
1. Kernel detects plugin dispatch failures (HTTP worker or GitHub Action dispatch) and emits a synthetic event.
2. `daemon-hotfix` subscribes to that synthetic event.
3. `daemon-hotfix` triages, de-dupes, files issues, and optionally drafts PRs.
4. PRs go through normal CI; auto-merge only if explicitly enabled and risk tier is low.

## Minimal kernel changes (proposed)
1. **Synthetic failure event**
   - Emit `kernel.plugin_error` (or similar) when a plugin dispatch fails.
   - Include enough metadata to find the failing plugin repo and reproduce.

2. **Optional workflow/check allowlist**
   - Today `workflow_*`, `check_*`, `check_run.*`, `check_suite.*` are skipped to prevent loops.
   - Add an opt-in allowlist so that, when configured, only `daemon-hotfix` receives these events.
   - This enables detection of failed workflow runs in plugin repos (e.g., `compute.yml` failures).

3. **Safety guardrails**
   - Rate-limit synthetic events per repo.
   - Suppress events caused by the hotfix plugin itself to prevent loops.
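
The allowlist and guardrails above could be sketched roughly as follows. This is a minimal illustration, not the kernel's actual code: `PluginErrorGate`, `canReceive`, the in-memory sliding window, and the option names are all hypothetical, and a real kernel would likely need persistent storage for the rate-limit state.

```typescript
// Hypothetical kernel-side helpers; every name and limit here is an assumption.
const HOTFIX_PLUGIN_ID = "ubiquity-os-marketplace/daemon-hotfix";

// Event families the spec says are skipped today (workflow_*, check_*, check_run.*, check_suite.*).
const LOOP_PRONE_PREFIXES = ["workflow_", "check_"];

/** Loop-prone events are delivered only to allowlisted plugins; all other events pass through. */
function canReceive(eventName: string, pluginId: string, allowlist: ReadonlySet<string>): boolean {
  const loopProne = LOOP_PRONE_PREFIXES.some((prefix) => eventName.startsWith(prefix));
  return !loopProne || allowlist.has(pluginId);
}

interface GateOptions {
  maxEventsPerWindow: number; // cap per target repo
  windowMs: number; // sliding window length in milliseconds
}

class PluginErrorGate {
  private readonly emitted = new Map<string, number[]>(); // repo -> emit timestamps (ms)
  private readonly options: GateOptions;

  constructor(options: GateOptions) {
    this.options = options;
  }

  /** Returns true when a kernel.plugin_error event may be emitted for this failure. */
  shouldEmit(failingPluginId: string, targetRepo: string, now: number = Date.now()): boolean {
    // Loop guard: never report failures caused by the hotfix plugin itself.
    if (failingPluginId === HOTFIX_PLUGIN_ID) return false;

    // Sliding-window rate limit per repo: keep only timestamps inside the window.
    const cutoff = now - this.options.windowMs;
    const recent = (this.emitted.get(targetRepo) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.options.maxEventsPerWindow;
    if (allowed) recent.push(now);
    this.emitted.set(targetRepo, recent);
    return allowed;
  }
}
```

Keeping both checks as pure functions of their inputs keeps the kernel change small and makes the loop-prevention behavior easy to unit test.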

## Failure event payload (draft)
```json
{
  "event": "kernel.plugin_error",
  "timestamp": "2025-01-01T00:00:00Z",
  "stateId": "<kernel state id>",
  "source": {
    "kernelRepo": "ubiquity-os/ubiquity-os-kernel",
    "environment": "prod|dev"
  },
  "trigger": {
    "githubEvent": "issue_comment.created",
    "deliveryId": "<github delivery id>",
    "repo": "owner/repo",
    "issueOrPr": 123,
    "actor": "octocat"
  },
  "plugin": {
    "type": "http|workflow",
    "id": "owner/repo:workflow.yml@ref OR https://plugin.url",
    "owner": "owner",
    "repo": "repo",
    "workflowId": "compute.yml",
    "ref": "main",
    "manifest": {
      "name": "...",
      "version": "..."
    }
  },
  "error": {
    "message": "HTTP 502: ...",
    "status": 502,
    "stack": "...",
    "category": "dispatch|http|workflow|timeout|validation",
    "raw": "..."
  },
  "context": {
    "requestId": "...",
    "pluginInputs": {"redacted": true},
    "responseSnippet": "...",
    "retryCount": 0
  }
}
```
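
For consumers such as `daemon-hotfix`, the draft payload could be mirrored by a TypeScript type plus a small runtime guard. The sketch below is an assumption-laden illustration: which fields are optional is a guess (the draft does not say), and `isPluginErrorEvent` is a hypothetical helper that checks only the fields triage would depend on.

```typescript
// Category list taken from the draft payload's "category" placeholder.
const CATEGORIES: readonly string[] = ["dispatch", "http", "workflow", "timeout", "validation"];
type ErrorCategory = "dispatch" | "http" | "workflow" | "timeout" | "validation";

// Optional markers (?) are assumptions; the draft does not state which fields are required.
interface PluginErrorEvent {
  event: "kernel.plugin_error";
  timestamp: string;
  stateId: string;
  source: { kernelRepo: string; environment: "prod" | "dev" };
  trigger: { githubEvent: string; deliveryId: string; repo: string; issueOrPr: number; actor: string };
  plugin: {
    type: "http" | "workflow";
    id: string;
    owner: string;
    repo: string;
    workflowId?: string;
    ref?: string;
    manifest?: { name: string; version: string };
  };
  error: { message: string; status?: number; stack?: string; category: ErrorCategory; raw?: string };
  context: { requestId: string; pluginInputs: unknown; responseSnippet?: string; retryCount: number };
}

/** Narrow guard: validates only the fields used for routing and de-duplication. */
function isPluginErrorEvent(value: unknown): value is PluginErrorEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, any>;
  return (
    v.event === "kernel.plugin_error" &&
    typeof v.timestamp === "string" &&
    typeof v.plugin?.id === "string" &&
    (v.plugin?.type === "http" || v.plugin?.type === "workflow") &&
    typeof v.error?.message === "string" &&
    CATEGORIES.includes(v.error?.category)
  );
}
```

A full schema validator would be stricter; the guard here only shows where validation would sit before triage.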

## daemon-hotfix behavior (MVP)
- **Triage & De-dupe**
  - Cluster by `(plugin.id + error.category + error.message hash)` within a time window.
  - Update the existing issue (reopening it if closed) when the same error recurs.

- **Issue creation**
  - Open a GitHub issue in the plugin repo with:
    - Summary of failure + timestamp + correlation id
    - Link to triggering issue/PR
    - Sanitized error details + repro steps (event replay, if available)
    - Suggested next actions (test/trace)

- **Optional auto-fix (guarded)**
  - If configured and risk tier is low, open a PR with:
    - Minimal patch
    - Regression test or replay fixture
    - Reasoning summary + risk notes
  - Never auto-merge by default.
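
The clustering key from the triage step could be computed along these lines. This is a sketch under stated assumptions: the whitespace and case normalization, the SHA-256 choice, the 16-character truncation, and the in-memory `Deduper` class are all illustrative and not part of the spec.

```typescript
import { createHash } from "node:crypto";

interface FailureKey {
  pluginId: string; // plugin.id from the payload
  category: string; // error.category
  message: string; // error.message
}

/** Fingerprint = hash of (plugin.id, error.category, hash of normalized error.message). */
function fingerprint(f: FailureKey): string {
  // Assumption: collapse whitespace and lowercase so cosmetic differences do not split clusters.
  const normalized = f.message.trim().replace(/\s+/g, " ").toLowerCase();
  const messageHash = createHash("sha256").update(normalized).digest("hex");
  return createHash("sha256")
    .update([f.pluginId, f.category, messageHash].join("\n"))
    .digest("hex")
    .slice(0, 16);
}

type TriageAction = { kind: "create" } | { kind: "update"; issueNumber: number };

class Deduper {
  private readonly seen = new Map<string, { issueNumber: number; lastSeen: number }>();
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  /** Update the known issue if the same fingerprint was seen within the window, else create. */
  decide(f: FailureKey, now: number): TriageAction {
    const hit = this.seen.get(fingerprint(f));
    if (hit && now - hit.lastSeen <= this.windowMs) {
      hit.lastSeen = now; // a recurrence extends the window
      return { kind: "update", issueNumber: hit.issueNumber };
    }
    return { kind: "create" };
  }

  /** Remember which issue was filed for this failure. */
  record(f: FailureKey, issueNumber: number, now: number): void {
    this.seen.set(fingerprint(f), { issueNumber, lastSeen: now });
  }
}
```

In practice the fingerprint could also be embedded in the issue body so that existing issues can be found by search rather than by in-process state.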

## Safety, permissions, and loops
- **Permission model options**
  1. Run hotfix workflow *inside the target plugin repo* (simplest for auth).
  2. Central hotfix service that mints installation tokens per repo (more powerful but riskier).

- **Loop prevention**
  - If a failure originates from `daemon-hotfix`, ignore it.
  - If a failure is already under investigation (an issue is open), update that issue instead of opening a new one.

- **Redaction**
  - Strip secrets/tokens from logs and payloads.
  - Limit stored raw event payloads unless opt-in.
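
A redaction pass over payloads might look like the sketch below. The token patterns and the sensitive key names are assumptions chosen for illustration and are certainly not exhaustive; `redact` and `redactString` are hypothetical helpers.

```typescript
// Illustrative patterns only: GitHub-style token prefixes and Bearer headers.
const SECRET_PATTERNS: RegExp[] = [
  /\bgh[pousr]_[A-Za-z0-9]{20,}\b/g,
  /\bgithub_pat_[A-Za-z0-9_]{20,}\b/g,
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi,
];

// Object keys whose values are dropped entirely, regardless of content (assumed list).
const SENSITIVE_KEY = /token|secret|password|authorization|api[-_]?key/i;

/** Replace anything that looks like a credential inside free text. */
function redactString(text: string): string {
  return SECRET_PATTERNS.reduce((out, pattern) => out.replace(pattern, "[REDACTED]"), text);
}

/** Recursively redact strings, arrays, and plain objects; other values pass through unchanged. */
function redact(value: unknown): unknown {
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (typeof value === "object" && value !== null) {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [
        key,
        SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner),
      ])
    );
  }
  return value;
}
```

Pattern-based redaction is best effort, which is one reason the spec also limits storage of raw payloads unless a repo opts in.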

## Configuration (example)
```yaml
plugins:
  ubiquity-os-marketplace/daemon-hotfix:
    with:
      enabled: true
      modes: ["issue", "pr"]
      riskPolicy:
        autoMerge: false
        maxAutofixPerDay: 3
      issueLabels: ["bug", "autofix"]
      notify: ["@team-bot"]
      allowlistRepos: ["ubiquity-os-marketplace/command-codex"]
```
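
One way the plugin might turn this configuration into a decision is sketched below. The semantics are assumptions: `allowlistRepos` is read here as "repos the plugin may act on", the `medium` and `high` tier names are invented for illustration (the spec only mentions a low tier), and `planActions` is a hypothetical helper.

```typescript
type Mode = "issue" | "pr";
type RiskTier = "low" | "medium" | "high"; // only "low" appears in the spec; the rest are assumed

interface HotfixConfig {
  enabled: boolean;
  modes: Mode[];
  riskPolicy: { autoMerge: boolean; maxAutofixPerDay: number };
  issueLabels: string[];
  notify: string[];
  allowlistRepos: string[];
}

interface Plan {
  fileIssue: boolean;
  openPr: boolean;
  autoMerge: boolean;
}

/** Decide what the plugin may do for one failure in one repo. */
function planActions(config: HotfixConfig, repo: string, riskTier: RiskTier, autofixesToday: number): Plan {
  const active = config.enabled && config.allowlistRepos.includes(repo);
  const fileIssue = active && config.modes.includes("issue");
  // PRs are guarded: PR mode enabled, low risk, and under the daily autofix cap.
  const openPr =
    active &&
    config.modes.includes("pr") &&
    riskTier === "low" &&
    autofixesToday < config.riskPolicy.maxAutofixPerDay;
  // Auto-merge additionally requires the explicit opt-in flag.
  const autoMerge = openPr && config.riskPolicy.autoMerge;
  return { fileIssue, openPr, autoMerge };
}
```

With the example configuration, a low-risk failure in `ubiquity-os-marketplace/command-codex` would yield an issue and a PR but no auto-merge, and a fourth autofix on the same day would fall back to an issue only.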

## MVP acceptance criteria
- A 5xx from an HTTP plugin dispatch results in a single issue in that plugin repo.
- A failed workflow run (if allowlisted) results in a single issue in that repo.
- Duplicate failures update the existing issue instead of creating new ones.
- No infinite loops or noisy spam during normal operations.

## Open questions
- Should the kernel emit `kernel.plugin_error` for both HTTP and workflow dispatch failures, or only HTTP?
- Should the hotfix plugin support optional local reproduction with cached webhook payloads?
- Do we want a global dashboard for error rates per plugin?

