Expand SKILL.md to use available investigation tools (tool-agnostic) #8

@BraedenBDev

Context

Today SKILL.md declares allowed-tools: Bash, Read, Write — just enough to run the shell scripts and write output. Claude's role is effectively "parse the JSON, write the narrative."

But Claude Code (and Codex, Gemini CLI, and other agent harnesses) ship with a much richer investigation toolkit that varies per user: web search, browser automation, documentation lookup, multi-page crawl, etc. Different users have different MCPs, plugins, and custom skills installed. Our skill should be tool-agnostic and opportunistic — it should describe the kinds of investigation worth doing and let the agent decide which of its installed tools fits each kind.

A good reference point: when an independent Codex-run audit of almostimpossible.agency found soft-404s on old work slugs, it didn't just parse one set of JSON files — it used web search to confirm stale URLs were still indexed, followed redirect chains interactively with curl -I -L, and cross-checked patterns against live HTML of multiple pages. That's the kind of investigation our skill should be performing.
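As a rough illustration of the redirect-chain check described above, here is a minimal Python sketch that parses the header blocks printed by `curl -I -L` and flags one common soft-404 shape. The function names, the `example.com` URLs, and the "deep page that lands on the site root with a 200" heuristic are all assumptions made for illustration; the original audit may have used different criteria.

```python
from urllib.parse import urljoin, urlsplit


def parse_header_chain(raw: str, start_url: str) -> list[tuple[str, int]]:
    """Turn concatenated `curl -I -L` header blocks into (url, status) hops."""
    hops = []
    url = start_url
    for block in raw.replace("\r\n", "\n").strip().split("\n\n"):
        lines = [ln for ln in block.split("\n") if ln.strip()]
        if not lines or not lines[0].startswith("HTTP/"):
            continue
        status = int(lines[0].split()[1])
        hops.append((url, status))
        # A Location header tells us which URL the next block belongs to.
        for ln in lines[1:]:
            name, _, value = ln.partition(":")
            if name.strip().lower() == "location":
                url = urljoin(url, value.strip())
    return hops


def looks_like_soft_404(hops: list[tuple[str, int]]) -> bool:
    """Hypothetical heuristic: a deep URL that redirects to the root and returns 200."""
    if len(hops) < 2:
        return False
    first_url, _ = hops[0]
    last_url, last_status = hops[-1]
    asked_for_deep_page = urlsplit(first_url).path not in ("", "/")
    landed_on_root = urlsplit(last_url).path in ("", "/")
    return last_status == 200 and asked_for_deep_page and landed_on_root
```

A positive result is only a candidate. Confirming it still needs the cross-checks mentioned above, such as comparing the live HTML and checking whether the old URL is still indexed.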

Principle

Scripts are the evidence. The agent's tools are the investigation.

The shell scripts in scripts/ produce the raw, deterministic evidence (HTTP status, word counts, extracted meta, JSON-LD types). The agent's other tools — whatever is available in the current session — are for investigating what the evidence means, confirming hypotheses, and broadening the audit when something looks off.

Proposal

Rewrite the SKILL.md "Interpretation Guidelines" and "Error Handling" sections to:

  1. Describe categories of investigation tools, not specific tool names. The skill should never hardcode MCP server names, plugin names, or specific tool IDs. Users have different stacks.

  2. List categories the skill should look for and opportunistically use:

    • Web search — to check whether stale URLs are still indexed, find reputation signals, discover old slugs that need redirects, verify whether a bot identity is real
    • Web fetch / content extraction — to grab related pages the user didn't explicitly ask about (e.g., if they audit /work, also sample /work/* children to check internal link hygiene)
    • Browser automation — for real JS-rendered comparisons beyond our optional diff-render.sh, interaction flows bots don't execute, and visual regression
    • Multi-page discovery / site mapping — to find pages not in the sitemap or orphaned from the nav graph, without rebuilding BFS in bash
    • Documentation lookup — for framework-specific diagnosis (e.g., "is this a Next.js App Router SSR issue or a client-component hydration boundary?")

  3. Add a discovery step: the skill should probe its own tool list at the start of a run and note which investigation categories are available. If web search is available, use it to cross-check stale URLs. If browser automation is available, prefer it over the optional Playwright fallback. If none of them are available, the skill runs in "evidence-only" mode and the narrative acknowledges the limited scope.

  4. Safety boundary (important): the skill should use investigation tools for read-only work only. It must NOT:

    • Send data off-box without explicit user consent
    • Call any tool whose name or description suggests modification, deployment, or posting
    • Make follow-up requests at a rate that looks like abuse

    Decision rule: "if the tool reads, it's fair game with user awareness; if it writes/sends/posts, skip unless the user explicitly asked."

  5. Transparency in output: the narrative should say which investigation tools it used and what they contributed. Example:

    "Using web search, I confirmed two of the old slugs are still indexed on Google as of today. Using a browser automation tool, I rendered the page and compared against the server HTML — the hero section is server-rendered but the testimonials list is client-only."

    Users need to know which findings are backed by live investigation vs. script output alone.
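To make the discovery step and the safety boundary above more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Tool` record, the keyword hints, and the write-verb pattern are stand-ins for the judgment the agent would actually apply when reading its own tool descriptions. Nothing here names a real MCP server or plugin.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Assumed keyword hints per investigation category (illustrative only).
CATEGORY_HINTS = {
    "web_search": ("web search", "search the web"),
    "web_fetch": ("fetch", "extract"),
    "browser_automation": ("browser", "screenshot"),
    "site_mapping": ("crawl", "sitemap"),
    "docs_lookup": ("documentation", "api reference"),
}

# Word stems that suggest modification, deployment, or posting.
# False positives only cause a tool to be skipped, which is the safe direction.
WRITE_PATTERN = re.compile(
    r"\b(?:writ|delet|deploy|post|publish|send|modif|updat|upload)\w*"
)


def _normalize(tool: Tool) -> str:
    """Lowercase name + description and collapse non-letters to spaces."""
    return re.sub(r"[^a-z]+", " ", f"{tool.name} {tool.description}".lower())


def is_read_only(tool: Tool) -> bool:
    return WRITE_PATTERN.search(_normalize(tool)) is None


def discover(tools: list[Tool]) -> dict[str, list[str]]:
    """Group read-only tools by the investigation category they appear to serve."""
    available: dict[str, list[str]] = {category: [] for category in CATEGORY_HINTS}
    for tool in tools:
        if not is_read_only(tool):
            continue  # safety boundary: skip anything that looks like it writes
        text = _normalize(tool)
        for category, hints in CATEGORY_HINTS.items():
            if any(hint in text for hint in hints):
                available[category].append(tool.name)
    return available


def run_mode(available: dict[str, list[str]]) -> str:
    return "investigation" if any(available.values()) else "evidence-only"
```

With only Bash, Read, and Write in the session, `discover` finds no investigation category and `run_mode` returns "evidence-only", which matches the graceful-degradation behaviour this proposal asks for. Keyword matching is deliberately crude here; the point is the flow (probe, filter, categorize, pick a mode), not the matching rule.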

Implementation sketch

Frontmatter opens up (today it is restricted to Bash, Read, Write):
```yaml
allowed-tools: '*'
```

Then a new "Investigation" section in the body:

```markdown
## Investigation (after the scripted pipeline)

After the shell scripts complete, check what additional tools you have available in this session and use them to deepen the audit. Do NOT hardcode expectations about specific tool names — different users have different MCPs, plugins, and skills installed.

Look for tools in these categories and use whatever is available:

  1. Web search — cross-check whether URLs you found broken/stale are still visible in search results, confirm bot identities, find reputation signals
  2. Web fetch / content extraction — sample related pages beyond the single target URL to test internal link hygiene at the site level
  3. Browser automation — run a real-browser render for JS-heavy sites when diff-render.sh was skipped; inspect post-hydration DOM
  4. Multi-page discovery — list URLs on the site to compare against the sitemap; find orphans
  5. Documentation lookup — for framework-specific diagnosis, look up current docs (not training-cutoff docs)

Use investigation tools only for reading. Do not modify anything, post anything, or send data to third-party services without explicit user consent.

Report transparently in the narrative which investigation tools you used and what they contributed.
```
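One possible way to think about the transparency requirement is to tag each finding with where it came from, then let the narrative carry that tag. The Python sketch below is an assumption about shape only; `Finding`, its fields, and the output wording are hypothetical and not part of the current skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    claim: str
    source: str  # e.g. "scripted pipeline" or an investigation category
    live: bool   # True when backed by a live investigation tool


def render_narrative(findings: list[Finding]) -> str:
    """Render findings so each line credits script output or live investigation."""
    lines = []
    for finding in findings:
        label = "live investigation" if finding.live else "script output"
        lines.append(f"- [{label}: {finding.source}] {finding.claim}")
    if not any(finding.live for finding in findings):
        # Evidence-only mode: say so, so the reader knows the scope was limited.
        lines.append(
            "Note: no investigation tools were available; this audit is evidence-only."
        )
    return "\n".join(lines)
```

Keeping the provenance on the finding itself, rather than adding it when the narrative is written, makes it harder for a script-only result to be presented as if it had been confirmed live.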

Acceptance criteria

  • SKILL.md frontmatter opens allowed-tools (either * or a broad allow list — document the safety boundary in the body)
  • New "Investigation" section in SKILL.md that describes categories of tools, not specific names
  • The skill probes the session's tool list at the start and notes what's available in the narrative output
  • The narrative output explicitly credits which investigation tools contributed to which findings
  • README documents the principle: "scripts are evidence, agent tools are investigation"
  • Safety rules are documented: read-only, transparent, no off-box data without consent
  • Works on a minimal Claude Code install with only Bash/Read/Write — degrades gracefully to "evidence-only" mode

Out of scope

  • Hardcoding any specific MCP, plugin, or skill name into SKILL.md
  • Requiring any particular tool to be installed
  • Modifying the shell scripts (they stay self-contained; this is a skill-level change)

Prior art

The Codex-run audit session that inspired this change combined curl, ripgrep, Python BFS, and web search in a single investigation — including doing an indirect cross-check via site: search to confirm whether stale URLs were still indexed. That's the bar this issue aims at: a skill that uses whatever is in the current session to confirm hypotheses rather than just parsing the scripted evidence.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)