## Summary
I built an open-source OPA-gated MCP server (Claw) that adds a formal conflict resolution layer to Claude's tool use pipeline. When multiple policy rules produce contradictory decisions (e.g., "deny — PHI detected" vs. "allow — authorized research domain"), the system resolves conflicts using Dung's Abstract Argumentation Framework (1995) instead of hardcoded priority ordering.
I'm sharing implementation notes in case they're useful to others building governance layers for MCP tool use, and to see if there's interest in a cookbook recipe covering this pattern.
## The problem this addresses
MCP servers that enforce behavioral constraints typically stack multiple checks: PII detection, domain reputation, policy packs (HIPAA, financial compliance), and contextual signals. These checks regularly conflict. Most implementations resolve conflicts with priority numbers or "deny wins" logic. This works until:
- A trusted research domain sends content containing patient identifiers — does the trust record override the PII block, or does the PII block override the trust record?
- A healthcare policy pack denies content that a research policy pack would allow — which pack wins when the user has both contexts?
- A domain was previously flagged suspicious, but the OPA policy says the content type is safe — which signal dominates?
These aren't edge cases. They're the normal operating mode when you have more than 3-4 policy rules.
## How the argumentation approach works
Instead of priority ordering, each policy signal becomes a formal argument with a source, strength, and decision claim. Arguments that contradict each other attack each other in a directed graph. The engine then computes which arguments survive all attacks using Dung's characteristic function (iterative fixpoint):
```
S₀ = ∅
Sₙ₊₁ = F(Sₙ) = { a ∈ Args | Sₙ defends a, i.e. every attacker of a is attacked by some member of Sₙ }
Stop when Sₙ₊₁ = Sₙ
```
The surviving arguments (the "grounded extension") determine the decision. The full attack graph is returned as part of the API response, so every decision is traceable.
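The fixpoint iteration above can be sketched in a few lines of Python. This is an illustrative sketch, not the repo's actual `engine.py` API; names are chosen for readability.

```python
def grounded_extension(args, attacks):
    """Compute the grounded extension by iterating Dung's characteristic
    function F until a fixpoint is reached.

    args    -- set of argument IDs
    attacks -- set of (attacker, target) pairs
    """
    def defended(a, s):
        # a is defended w.r.t. s if every attacker of a is itself
        # attacked by some member of s
        attackers = {x for (x, t) in attacks if t == a}
        return all(any((d, x) in attacks for d in s) for x in attackers)

    s = set()
    while True:
        s_next = {a for a in args if defended(a, s)}
        if s_next == s:  # Sₙ₊₁ = Sₙ → fixpoint reached
            return s
        s = s_next
```

Note that the first iteration admits exactly the unattacked arguments (the `all` over an empty attacker set is vacuously true), and each later iteration adds arguments that the already-admitted arguments defend.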
## Concrete example
Input: content containing 2 email addresses from a domain previously marked trusted.
Arguments generated:
| ID | Claim | Source | Strength | Decision |
|---|---|---|---|---|
| baseline_allow | Content should be allowed | OPA default | 0.3 | allow |
| pii_moderate | 2 non-critical PII items — masking required | PII Scanner | 0.6 | modify |
| knowledge_trust_0 | Domain previously marked trusted | Knowledge Hub | 0.8 | allow |
Attack relations:
- pii_moderate → baseline_allow (UNDERCUT: modification overrides baseline)
- knowledge_trust_0 → pii_moderate (UNDERMINE: trust record challenges PII block)
Grounded extension: {baseline_allow, knowledge_trust_0} — the trust record defends the allow decision because it attacks the only argument attacking the baseline.
Decision: allow (no modification needed)
If the same content came from an untrusted domain, the knowledge_trust_0 argument wouldn't exist, the grounded extension would be {pii_moderate}, and the decision would be allow_with_modifications (PII masking applied).
The point: the same engine handles both cases through graph computation rather than conditional branching, and the full attack graph remains available for audit.
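As a sanity check, both scenarios can be run through a minimal, self-contained grounded-extension helper (a sketch for illustration only; the repo's actual engine lives in `server/argumentation/engine.py`):

```python
def grounded(args, attacks):
    # iterate F(S) = {a | every attacker of a is attacked by some member of S}
    s = set()
    while True:
        nxt = {a for a in args
               if all(any((d, x) in attacks for d in s)
                      for x in {x for (x, t) in attacks if t == a})}
        if nxt == s:
            return s
        s = nxt

# Trusted-domain scenario: all three arguments from the table above.
args = {"baseline_allow", "pii_moderate", "knowledge_trust_0"}
attacks = {("pii_moderate", "baseline_allow"),
           ("knowledge_trust_0", "pii_moderate")}
print(sorted(grounded(args, attacks)))
# → ['baseline_allow', 'knowledge_trust_0']  (decision: allow)

# Untrusted-domain scenario: the trust argument never exists.
args2 = args - {"knowledge_trust_0"}
attacks2 = {("pii_moderate", "baseline_allow")}
print(sorted(grounded(args2, attacks2)))
# → ['pii_moderate']  (decision: allow_with_modifications)
```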
## Implementation details
The system runs as a 6-stage pipeline:
PII Scan → OPA Policy Gate → Knowledge Hub → Argumentation → Context Assembly → Model
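The staging can be sketched as a simple fold over stage functions. The stage bodies below are placeholders to show the shape of the chain, not the repo's implementations:

```python
def run_pipeline(ctx, stages):
    """Thread a request context through each named stage in order; every
    stage returns the (possibly annotated) context for the next one."""
    for name, stage in stages:
        ctx = stage(ctx)
        ctx.setdefault("trace", []).append(name)  # record the path for audit
    return ctx

# Placeholder stages mirroring the diagram above.
stages = [
    ("pii_scan",         lambda c: {**c, "pii_items": 0}),
    ("opa_policy_gate",  lambda c: {**c, "opa_decision": "allow"}),
    ("knowledge_hub",    lambda c: c),
    ("argumentation",    lambda c: {**c, "decision": c["opa_decision"]}),
    ("context_assembly", lambda c: c),
    ("model",            lambda c: c),
]

out = run_pipeline({"content": "hello"}, stages)
print(out["trace"], out["decision"])
```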
Key files in the repo:
- `server/argumentation/models.py` — Dung's AF data structures (Argument, Attack, Extension, Framework)
- `server/argumentation/engine.py` — extension computation (grounded, preferred, stable semantics)
- `server/argumentation/rego_bridge.py` — converts OPA decisions + Knowledge Hub entries into formal arguments
- `opa/policies/main.rego` — 12 deny rules, 5 modification rules across 6 policy packs
- `server/sdam_model.py` — Sequential Decision Analytics (Powell's framework) for modeling decision chains
Numbers: 65 Python tests, 20 OPA tests, 9 API endpoints, Docker deployment.
## Why this might be useful for the cookbook
The MCP ecosystem is growing fast, and governance is becoming a bottleneck. A cookbook recipe could cover:
- Pattern: How to add a formal conflict resolution layer between OPA and the model
- When to use it: Any MCP server with >3 policy rules that can produce contradictory outputs
- The bridge pattern: Converting declarative policy outputs (Rego) into formal arguments
- Auditability: Returning the attack graph as part of the API response for compliance
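The bridge pattern can be illustrated with a sketch that maps fired policy rules to Argument objects and derives attack edges from contradictory decisions. The field names in `opa_result` and the contradiction table are assumptions for illustration, not the repo's actual `rego_bridge.py` schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Argument:
    id: str
    claim: str
    source: str
    strength: float
    decision: str  # "allow" | "deny" | "modify"

# Which decision pairs contradict each other (illustrative table).
CONTRADICTS = {("allow", "deny"), ("deny", "allow"),
               ("modify", "allow"), ("allow", "modify")}

def bridge(opa_result):
    """Map each fired Rego rule to an Argument; contradictory decisions
    attack each other, with the stronger argument attacking the weaker."""
    args = [Argument(r["id"], r["msg"], "OPA", r.get("strength", 0.5),
                     r["decision"])
            for r in opa_result.get("rules", [])]
    attacks = {(a.id, b.id)
               for a in args for b in args
               if (a.decision, b.decision) in CONTRADICTS
               and a.strength >= b.strength}
    return args, attacks
```

One design note: deriving attack direction from relative strength keeps the Rego policies purely declarative; the argumentation layer alone decides which signal dominates.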
I'm happy to write this up as a proper cookbook recipe if there's interest, or to adapt any part of the existing code for inclusion.
## Related work
- Dung, P.M. (1995). "On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games." Artificial Intelligence, 77(2), 321-357.
- Bai et al. (2022). "Constitutional AI: Harmlessness from AI feedback." — the system extends this by formalizing how constitutional principles resolve when they contradict.
- Kirchner et al. (2024). "Prover-Verifier Games improve legibility of LLM outputs." — structurally related: both formalize the idea that AI reasoning should be checkable by a less capable agent.
## Links
- Repo: github.com/Leeladitya/claw
- Decision Arena (playable governance scenarios): github.com/Leeladitya/agora
- Research paper (multi-constitutional extension): leed.guru
- Extended technical writeup with formal analysis: lesswrong.com/posts/89bYQbNrRN9a8htpr
Feedback and pointers to related MCP governance work are welcome.
— Leela Aditya Annam (@Leeladitya)