Skip to content

Spec: daemon-hotfix self-improvement plugin #81

@0x4007

Description

@0x4007

Problem

We want continuous self-improvement across UOS plugins without bloating the kernel. Today, the kernel logs plugin dispatch failures and skips workflow/check events to avoid loops, so a dedicated hotfix plugin cannot react to those failures.

Goals

  • Route actionable failure signals to a dedicated plugin (daemon-hotfix) with minimal kernel changes.
  • Create/update issues in the appropriate plugin repo with enough context to reproduce.
  • Optionally propose fixes (PRs) with tests, while remaining safe by default.
  • Keep kernel lean: routing + policy gates only.

Non-goals

  • No fully autonomous merges by default.
  • No attempt to modify UOS kernel logic beyond minimal event plumbing.
  • No brittle keyword/regex-only tool selection for AI-driven decisions.

High-level architecture

  1. Kernel detects plugin dispatch failures (HTTP worker or GitHub Action dispatch) and emits a synthetic event.
  2. daemon-hotfix subscribes to that synthetic event.
  3. daemon-hotfix triages, de-dupes, files issues, and optionally drafts PRs.
  4. PRs go through normal CI; auto-merge only if explicitly enabled and risk tier is low.

Minimal kernel changes (proposed)

  1. Synthetic failure event

    • Emit kernel.plugin_error (or similar) when a plugin dispatch fails.
    • Include enough metadata to find the failing plugin repo and reproduce.
  2. Optional workflow/check allowlist

    • Today workflow_*, check_*, check_run.*, check_suite.* are skipped to prevent loops.
    • Add an allowlist so only daemon-hotfix can receive these events if configured.
    • This enables detection of failed workflow runs in plugin repos (e.g., compute.yml failures).
  3. Safety guardrails

    • Rate-limit synthetic events per repo.
    • Suppress events caused by the hotfix plugin itself to prevent loops.

Failure event payload (draft)

{
  "event": "kernel.plugin_error",
  "timestamp": "2025-01-01T00:00:00Z",
  "stateId": "<kernel state id>",
  "source": {
    "kernelRepo": "ubiquity-os/ubiquity-os-kernel",
    "environment": "prod|dev"
  },
  "trigger": {
    "githubEvent": "issue_comment.created",
    "deliveryId": "<github delivery id>",
    "repo": "owner/repo",
    "issueOrPr": 123,
    "actor": "octocat"
  },
  "plugin": {
    "type": "http|workflow",
    "id": "owner/repo:workflow.yml@ref OR https://plugin.url",
    "owner": "owner",
    "repo": "repo",
    "workflowId": "compute.yml",
    "ref": "main",
    "manifest": {
      "name": "...",
      "version": "..."
    }
  },
  "error": {
    "message": "HTTP 502: ...",
    "status": 502,
    "stack": "...",
    "category": "dispatch|http|workflow|timeout|validation",
    "raw": "..." 
  },
  "context": {
    "requestId": "...",
    "pluginInputs": {"redacted": true},
    "responseSnippet": "...",
    "retryCount": 0
  }
}

daemon-hotfix behavior (MVP)

  • Triage & De-dupe

    • Cluster by (plugin.id + error.category + error.message hash) within a time window.
    • Reopen existing issue if the same error recurs.
  • Issue creation

    • Open a GitHub issue in the plugin repo with:
      • Summary of failure + timestamp + correlation id
      • Link to triggering issue/PR
      • Sanitized error details + repro steps (event replay, if available)
      • Suggested next actions (test/trace)
  • Optional auto-fix (guarded)

    • If configured and risk tier is low, open a PR with:
      • Minimal patch
      • Regression test or replay fixture
      • Reasoning summary + risk notes
    • Never auto-merge by default.

Safety, permissions, and loops

  • Permission model options

    1. Run hotfix workflow inside the target plugin repo (simplest for auth).
    2. Central hotfix service that mints installation tokens per repo (more powerful but riskier).
  • Loop prevention

    • If a failure originates from daemon-hotfix, ignore it.
    • If a failure is already under investigation (issue open), update issue instead of spamming.
  • Redaction

    • Strip secrets/tokens from logs and payloads.
    • Limit stored raw event payloads unless opt-in.

Configuration (example)

plugins:
  ubiquity-os-marketplace/daemon-hotfix:
    with:
      enabled: true
      modes: ["issue", "pr"]
      riskPolicy:
        autoMerge: false
        maxAutofixPerDay: 3
      issueLabels: ["bug", "autofix"]
      notify: ["@team-bot"]
      allowlistRepos: ["ubiquity-os-marketplace/command-codex"]

MVP acceptance criteria

  • A 5xx from an HTTP plugin dispatch results in a single issue in that plugin repo.
  • A failed workflow run (if allowlisted) results in a single issue in that repo.
  • Duplicate failures update the existing issue instead of creating new ones.
  • No infinite loops or noisy spam during normal operations.

Open questions

  • Should the kernel emit kernel.plugin_error for both HTTP and workflow dispatch failures, or only HTTP?
  • Should the hotfix plugin support optional local reproduction with cached webhook payloads?
  • Do we want a global dashboard for error rates per plugin?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions