wyattowalsh/panel-debate-skill

panel-debate-skill

Expert panel discussions for complex decisions

Claude becomes 3-7 domain experts who debate, challenge each other, and synthesize actionable recommendations through Hegelian dialectic.

License: MIT Research-Backed

Install • Usage • How It Works • Research


Install

npx skills add wyattowalsh/panel-debate-skill

Tip

After installation, the /panel-debate command becomes available in Claude Code.

Usage

/panel-debate "Should we migrate to microservices?"
/panel-debate size:5 depth:deep "Build vs buy our CRM?"
/panel-debate style:adversarial "GraphQL vs REST?"

Options

| Option | Values | Default | Description |
| --- | --- | --- | --- |
| size | 3-7 | auto | Number of experts (auto-scales with topic breadth) |
| depth | quick / standard / deep | standard | Discussion rounds: 1 / 2-3 / 4+ |
| style | collaborative / adversarial / academic | collaborative | Panel interaction tone |
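
To make the option syntax concrete, here is a minimal Python sketch of how `key:value` options could be separated from the quoted topic. The skill's actual parsing is not shown in this README; `PanelRequest` and `parse_panel_args` are hypothetical names, and only the option names, values, and defaults come from the table above.

```python
import shlex
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the options table; everything else is illustrative.
ALLOWED = {
    "depth": {"quick", "standard", "deep"},
    "style": {"collaborative", "adversarial", "academic"},
}


@dataclass
class PanelRequest:
    topic: str
    size: Optional[int] = None  # None stands for the "auto" default
    depth: str = "standard"
    style: str = "collaborative"


def parse_panel_args(raw: str) -> PanelRequest:
    """Split `key:value` options from the topic (hypothetical helper)."""
    options = {}
    topic_parts = []
    for token in shlex.split(raw):
        key, sep, value = token.partition(":")
        if sep and key == "size":
            if not (value.isdigit() and 3 <= int(value) <= 7):
                raise ValueError(f"size must be 3-7, got {value!r}")
            options["size"] = int(value)
        elif sep and key in ALLOWED:
            if value not in ALLOWED[key]:
                raise ValueError(f"invalid {key}: {value!r}")
            options[key] = value
        else:
            # Anything that is not a recognised option belongs to the topic.
            topic_parts.append(token)
    if not topic_parts:
        raise ValueError("a topic is required")
    return PanelRequest(topic=" ".join(topic_parts), **options)


request = parse_panel_args('size:5 depth:deep "Build vs buy our CRM?"')
print(request)
```

Unrecognised tokens fall through to the topic, so a question that happens to contain a colon is not mistaken for an option.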

Note

Low-complexity topics (e.g., "What port does PostgreSQL use?") trigger a warning: multi-agent debate adds overhead without benefit for simple questions.


Example Output

📋 Microservices Migration Panel
╭─ Panel Discussion: Microservices Migration ───────────────╮
│ Experts: Dr. Chen (Security), Kai Lindström (Platform),   │
│          Rashida Okoye (Ops), Sophia Martinez (Product)   │
╰───────────────────────────────────────────────────────────╯

🎤 Dr. Chen (Security):
   "Each microservice becomes a potential entry point. We need
   zero-trust from day one."

🎤 Kai Lindström (Platform) [Contrarian]:
   "Before we assume microservices, has anyone considered a
   well-structured modular monolith? You get 80% of the benefits
   without the operational overhead."

🎤 Rashida Okoye (responding to Kai):
   "I've seen both approaches. With 15 engineers and only 3 with
   distributed systems experience, Kai's point is well-taken."

📋 Round 1 Synthesis:
   • Agreement: Team capability matters more than architecture choice
   • Tension: Invest in microservices now vs. extract services later
   • Open question: What are our actual scaling bottlenecks?

╭───────────────────────────────────────────────────────────╮
│ [1] Continue  [2] Follow-up  [3] Redirect  [4] Conclude   │
╰───────────────────────────────────────────────────────────╯

How It Works

flowchart TB
    subgraph Input
        A[🎯 Topic]
    end

    subgraph Validation
        B{Complexity<br/>Score}
        B -->|5-7: Low| C[⚠️ Warn User]
        C --> D{Proceed?}
        D -->|No| E[Direct Answer]
        D -->|Yes| F
        B -->|8-15| F[✓ Continue]
    end

    subgraph Panel["Panel Assembly"]
        F --> G[Generate Experts]
        G --> H{Diversity<br/>≥60?}
        H -->|No| G
        H -->|Yes| I[🎭 Panel Ready]
    end

    subgraph Discussion
        I --> J[Round N]
        J --> K[Cross-Examination]
        K --> L[🛡️ Contrarian Check]
        L --> M[📋 Synthesis]
        M --> N{Converged?}
        N -->|No| O{Stalled?}
        O -->|Yes| P[Adjust Panel]
        P --> J
        O -->|No| J
    end

    subgraph Output
        N -->|Yes| Q[📊 Final Report]
    end

    A --> B
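
The validation and panel-assembly branches of the flowchart can be sketched in Python as follows. This is an illustration only: the README does not say how the complexity or diversity scores are computed, so both are taken as inputs here. The names `Gate`, `complexity_gate`, and `assemble_panel` are hypothetical, the 5-15 score range is inferred from the two flowchart labels, and the `max_attempts` cap is an added assumption so the retry loop always terminates.

```python
from enum import Enum
from typing import Callable, List


class Gate(Enum):
    WARN = "warn"          # score 5-7: warn the user and ask whether to proceed
    CONTINUE = "continue"  # score 8-15: go straight to panel assembly


def complexity_gate(score: int) -> Gate:
    """Map a complexity score onto the two branches in the flowchart."""
    if not 5 <= score <= 15:
        raise ValueError(f"score must be in 5-15, got {score}")
    return Gate.WARN if score <= 7 else Gate.CONTINUE


def assemble_panel(
    generate: Callable[[], List[str]],
    diversity_of: Callable[[List[str]], int],
    threshold: int = 60,
    max_attempts: int = 5,  # assumption: the flowchart shows no retry limit
) -> List[str]:
    """Regenerate experts until the diversity score reaches the threshold."""
    for _ in range(max_attempts):
        panel = generate()
        if diversity_of(panel) >= threshold:
            return panel
    raise RuntimeError("no sufficiently diverse panel was generated")


# Toy stand-ins: a panel is a list of domains, diversity is the share of
# distinct domains expressed as a percentage.
candidates = iter([["security", "security", "security"],
                   ["security", "platform", "product"]])
panel = assemble_panel(lambda: next(candidates),
                       lambda p: 100 * len(set(p)) // len(p))
print(complexity_gate(6), panel)
```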

State Machine

| State | Description | Exit Condition |
| --- | --- | --- |
| COMPLEXITY_CHECK | Assess if topic warrants panel | Score calculated |
| EXPERT_GENERATION | Create diverse personas | Diversity ≥60 |
| DISCUSSION | Facilitate debate rounds | Convergence or max rounds |
| SYNTHESIS | Generate recommendations | Report complete |
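
One way to read the state table is as a linear state machine in which each state is left only once its exit condition holds. The Python sketch below assumes exactly that. The four state names come from the table, while `DONE`, `NEXT_STATE`, and `advance` are hypothetical additions for illustration.

```python
from enum import Enum, auto


class State(Enum):
    COMPLEXITY_CHECK = auto()
    EXPERT_GENERATION = auto()
    DISCUSSION = auto()
    SYNTHESIS = auto()
    DONE = auto()  # assumption: a terminal state not listed in the table


# Linear ordering implied by the table rows.
NEXT_STATE = {
    State.COMPLEXITY_CHECK: State.EXPERT_GENERATION,
    State.EXPERT_GENERATION: State.DISCUSSION,
    State.DISCUSSION: State.SYNTHESIS,
    State.SYNTHESIS: State.DONE,
}


def advance(state: State, exit_condition_met: bool) -> State:
    """Move to the next state only when the current exit condition holds."""
    if state is State.DONE or not exit_condition_met:
        return state
    return NEXT_STATE[state]


state = State.COMPLEXITY_CHECK
state = advance(state, exit_condition_met=True)   # score calculated
state = advance(state, exit_condition_met=False)  # diversity still below 60
print(state.name)
```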

Important

Every panel must include three archetypes: Contrarian (challenges consensus), Synthesizer (connects perspectives), and Specialist (provides domain depth).
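
The archetype rule lends itself to a simple check. The sketch below represents each expert as a hypothetical `(name, archetype)` pair and reports which required archetypes are missing. How the skill actually tags experts is not described in this README.

```python
from typing import Iterable, List, Tuple

# The three archetypes every panel must include, per the note above.
REQUIRED_ARCHETYPES = frozenset({"contrarian", "synthesizer", "specialist"})


def missing_archetypes(panel: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the required archetypes that no panel member fills."""
    present = {archetype.lower() for _, archetype in panel}
    return sorted(REQUIRED_ARCHETYPES - present)


panel = [("Kai Lindström", "Contrarian"), ("Dr. Chen", "Specialist")]
print(missing_archetypes(panel))
```

An empty result means the panel satisfies the rule. A non-empty one names the roles still to be filled.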


Research Foundations

This skill synthesizes findings from peer-reviewed multi-agent debate research [1].

Core Findings

| Finding | Source | Implementation |
| --- | --- | --- |
| Diversity is THE dominant driver | Wu et al. 2025 [2] | Diversity score ≥60 required |
| Majority pressure suppresses correction | Wu et al. 2025 [2] | Contrarian protection protocol |
| Heterogeneous > homogeneous agents | A-HMAD 2025 [3] | Max 30% same-archetype |
| MAD helps complex, not simple tasks | ICLR 2025 [4] | Complexity classifier |
| Confidence weighting improves synthesis | CISC 2025 [5] | Weighted aggregation |
| 3 agents × 2 rounds is effective | Du et al. 2024 [6] | Default configuration |

📚 Detailed Research Summaries

Du et al. (ICML 2024)

"Improving Factuality and Reasoning in Language Models through Multiagent Debate"

The foundational paper establishing that multiple LLM instances debating over rounds significantly improves reasoning:

  • Cross-examination reduces hallucinations
  • Performance scales with agent count and rounds
  • 3 agents × 2 rounds is a cost-effective baseline

Wu et al. (Nov 2025)

"Can LLM Agents Really Debate?"

A critical analysis showing that group diversity is THE dominant driver, more important than speaking order or confidence visibility. Majority pressure suppresses correction, leading to conformity cascades.

A-HMAD (Nov 2025)

Adaptive Heterogeneous Multi-Agent Debate

Heterogeneous specialized agents significantly outperform homogeneous teams. Simple majority voting underperforms quality-weighted aggregation.

CISC (ACL 2025)

Confidence Improves Self-Consistency

Prioritizing high-confidence reasoning paths reduces required samples by 40%+ while maintaining accuracy.
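
To illustrate why confidence weighting can change an outcome, here is a simplified Python sketch that sums each position's confidence instead of counting votes. It is not the procedure from the CISC paper, nor necessarily what the skill does. `weighted_vote` and the confidence values are made up for the example.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def weighted_vote(answers: Iterable[Tuple[str, float]]) -> str:
    """Pick the answer with the largest total confidence.

    `answers` holds (answer, confidence) pairs with confidence in [0, 1].
    """
    totals = defaultdict(float)
    for answer, confidence in answers:
        totals[answer] += confidence
    if not totals:
        raise ValueError("no answers to aggregate")
    return max(totals, key=totals.get)


votes = [
    ("modular monolith", 0.9),  # one expert, highly confident
    ("microservices", 0.4),     # two experts, each fairly unsure
    ("microservices", 0.4),
]
print(weighted_vote(votes))
```

A plain majority vote would pick "microservices" (two votes to one), while the weighted total favors "modular monolith" (0.9 against 0.8).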

Anti-Patterns Avoided

Caution

Research identifies these failure modes; panel-debate-skill actively prevents them:

| Anti-Pattern | Problem | Mitigation |
| --- | --- | --- |
| Conformity Cascade | LLMs drift toward majority, entrenching errors | Required contrarian + disagreement triggers |
| Devil's Advocate Overuse | Pure adversarial debate reduces accuracy | Synthesizer required, ~90% collaborative |
| False Consensus | Averaging positions loses nuance | Context-dependent synthesis, "CONTESTED" labels |
| Simple Task Overhead | MAD adds cost without benefit | Complexity classifier screens topics |

Philosophical Foundations

The synthesis mechanism uses Hegelian dialectic:

flowchart LR
    T[Thesis<br/><i>Initial position</i>] --> A[Antithesis<br/><i>Challenge</i>]
    A --> S[Synthesis<br/><i>Emergence</i>]
    S -.->|"becomes next"| T2[New Thesis]

    style T fill:#4a9eff,color:#fff
    style A fill:#ff6b6b,color:#fff
    style S fill:#51cf66,color:#fff
    style T2 fill:#4a9eff,color:#fff,stroke-dasharray: 5 5

Each round's synthesis becomes the next round's thesis, enabling progressive refinement rather than simple compromise.
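
The round-to-round hand-off can be pictured as a small loop. In this Python sketch the `challenge` and `synthesize` callables are hypothetical stand-ins for the panel's debate and synthesis steps. Only the rule that each synthesis becomes the next thesis comes from the text above.

```python
from typing import Callable, List, Tuple


def dialectic(
    thesis: str,
    challenge: Callable[[str], str],
    synthesize: Callable[[str, str], str],
    rounds: int,
) -> List[Tuple[str, str, str]]:
    """Run `rounds` thesis/antithesis/synthesis cycles and record each one."""
    history = []
    for _ in range(rounds):
        antithesis = challenge(thesis)
        synthesis = synthesize(thesis, antithesis)
        history.append((thesis, antithesis, synthesis))
        thesis = synthesis  # the synthesis becomes the next round's thesis
    return history


# Toy stand-ins that only label their inputs.
history = dialectic(
    "adopt microservices",
    challenge=lambda t: f"objection to ({t})",
    synthesize=lambda t, a: f"refined ({t})",
    rounds=2,
)
for thesis, antithesis, synthesis in history:
    print(thesis, "|", antithesis, "|", synthesis)
```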


Architecture

panel-debate-skill/
├── SKILL.md              # Entry point (~150 lines)
├── AGENTS.md             # AI agent instructions
├── CLAUDE.md             # → symlink to AGENTS.md
├── references/
│   ├── research-foundations.md
│   ├── expert-generation.md
│   ├── turn-taking.md
│   ├── synthesis-patterns.md
│   └── output-formats.md
└── examples/
    ├── architecture-decision.md
    ├── business-strategy.md
    └── security-implementation.md

Note

The skill uses progressive disclosure: SKILL.md contains lean execution logic; reference files are loaded on-demand for depth.
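
Progressive disclosure is handled by the agent reading files when it needs them, but the idea can be mimicked in a few lines of Python: read a reference file only on first request and reuse the cached text afterwards. `SKILL_ROOT` and `load_reference` are hypothetical. Only the `references/` layout comes from the tree above.

```python
from functools import lru_cache
from pathlib import Path

# Assumption: the skill is checked out in the current working directory.
SKILL_ROOT = Path("panel-debate-skill")


@lru_cache(maxsize=None)
def load_reference(name: str, root: Path = SKILL_ROOT) -> str:
    """Read references/<name>.md on first use; later calls hit the cache."""
    return (root / "references" / f"{name}.md").read_text(encoding="utf-8")
```

Calling `load_reference("turn-taking")` twice touches the disk once. Files that are never requested are never read.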


Contributing

See CONTRIBUTING.md for guidelines.

Quick Test Commands
# Install locally
npx skills add ./

# Test complexity rejection
/panel-debate "What port does PostgreSQL use?"

# Test standard panel
/panel-debate "Redis vs Memcached?"

# Test deep panel
/panel-debate depth:deep "Microservices migration strategy"

License

MIT


Footnotes

  1. Full citations in references/research-foundations.md

  2. Wu et al. "Can LLM Agents Really Debate?" arXiv:2511.07784

  3. A-HMAD "Adaptive Heterogeneous Multi-Agent Debate" Springer

  4. ICLR 2025 MAD Analysis Blog

  5. CISC "Confidence Improves Self-Consistency" ACL 2025

  6. Du et al. "Improving Factuality and Reasoning in Language Models through Multiagent Debate" arXiv:2305.14325
