
Formal conflict resolution for MCP tool governance using Dung's Argumentation Framework — implementation notes #385

@Leeladitya

Description


Summary

I built an open-source OPA-gated MCP server (Claw) that adds a formal conflict resolution layer to Claude's tool use pipeline. When multiple policy rules produce contradictory decisions (e.g., "deny — PHI detected" vs. "allow — authorized research domain"), the system resolves conflicts using Dung's Abstract Argumentation Framework (1995) instead of hardcoded priority ordering.

I'm sharing implementation notes in case they're useful to others building governance layers for MCP tool use, and to see if there's interest in a cookbook recipe covering this pattern.

The problem this addresses

MCP servers that enforce behavioral constraints typically stack multiple checks: PII detection, domain reputation, policy packs (HIPAA, financial compliance), and contextual signals. These checks regularly conflict. Most implementations resolve conflicts with priority numbers or "deny wins" logic. This works until:

  • A trusted research domain sends content containing patient identifiers — does the trust record override the PII block, or does the PII block override the trust record?
  • A healthcare policy pack denies content that a research policy pack would allow — which pack wins when the user has both contexts?
  • A domain was previously flagged suspicious, but the OPA policy says the content type is safe — which signal dominates?

These aren't edge cases; they're the normal operating mode once you have more than three or four policy rules.

How the argumentation approach works

Instead of priority ordering, each policy signal becomes a formal argument with a source, strength, and decision claim. Arguments that contradict each other attack each other in a directed graph. The engine then computes which arguments survive all attacks using Dung's characteristic function (iterative fixpoint):

S₀ = ∅
Sₙ₊₁ = F(Sₙ) = { a ∈ Args | framework defends a w.r.t. Sₙ }
Stop when Sₙ₊₁ = Sₙ

The surviving arguments (the "grounded extension") determine the decision. The full attack graph is returned as part of the API response, so every decision is traceable.
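The fixpoint above can be sketched in a few lines of Python. This is a minimal illustration; the function name and signature are assumptions for exposition, not the repo's actual `engine.py` API:

```python
# Minimal sketch of grounded-extension computation via Dung's
# characteristic function F. Illustrative only, not the Claw engine.
def grounded_extension(args, attacks):
    """args: iterable of argument ids; attacks: set of (attacker, target) pairs."""
    def defended(s, a):
        # a is defended w.r.t. s if every attacker of a is itself
        # attacked by some member of s
        attackers = {x for (x, y) in attacks if y == a}
        return all(any((d, x) in attacks for d in s) for x in attackers)

    s = set()
    while True:
        nxt = {a for a in args if defended(s, a)}
        if nxt == s:          # fixpoint reached: Sₙ₊₁ = Sₙ
            return s
        s = nxt
```

Because F is monotone and the framework is finite, the iteration always terminates, and the grounded extension is unique, which is what makes it attractive for deterministic policy decisions.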

Concrete example

Input: content containing 2 email addresses from a domain previously marked trusted.

Arguments generated:

| ID | Claim | Source | Strength | Decision |
|---|---|---|---|---|
| baseline_allow | Content should be allowed | OPA default | 0.3 | allow |
| pii_moderate | 2 non-critical PII items, masking required | PII Scanner | 0.6 | modify |
| knowledge_trust_0 | Domain previously marked trusted | Knowledge Hub | 0.8 | allow |

Attack relations:

  • pii_moderate → baseline_allow (UNDERCUT: modification overrides baseline)
  • knowledge_trust_0 → pii_moderate (UNDERMINE: trust record challenges PII block)

Grounded extension: {baseline_allow, knowledge_trust_0} — the trust record defends the allow decision because it attacks the only argument attacking the baseline.

Decision: allow (no modification needed)

If the same content came from an untrusted domain, the knowledge_trust_0 argument wouldn't exist, the grounded extension would be {pii_moderate}, and the decision would be allow_with_modifications (PII masking applied).

The point is: the same engine handles both cases through graph computation, not conditional branching. And the full graph is available for audit.
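For the audit side, the response might carry the full framework alongside the verdict. The field names below are an illustrative sketch, not the actual Claw API schema:

```python
# Hypothetical shape of an auditable decision record: the verdict plus
# the complete argument set, attack graph, and surviving extension.
decision_record = {
    "decision": "allow",
    "arguments": [
        {"id": "baseline_allow", "source": "OPA default", "strength": 0.3, "decision": "allow"},
        {"id": "pii_moderate", "source": "PII Scanner", "strength": 0.6, "decision": "modify"},
        {"id": "knowledge_trust_0", "source": "Knowledge Hub", "strength": 0.8, "decision": "allow"},
    ],
    "attacks": [
        {"from": "pii_moderate", "to": "baseline_allow", "type": "UNDERCUT"},
        {"from": "knowledge_trust_0", "to": "pii_moderate", "type": "UNDERMINE"},
    ],
    "grounded_extension": ["baseline_allow", "knowledge_trust_0"],
}
```

A compliance reviewer can replay the extension computation from this record alone and verify that the verdict follows from the graph.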

Implementation details

The system runs as a 6-stage pipeline:

PII Scan → OPA Policy Gate → Knowledge Hub → Argumentation → Context Assembly → Model

Key files in the repo:

  • server/argumentation/models.py — Dung's AF data structures (Argument, Attack, Extension, Framework)
  • server/argumentation/engine.py — Extension computation (grounded, preferred, stable semantics)
  • server/argumentation/rego_bridge.py — Converts OPA decisions + Knowledge Hub entries into formal arguments
  • opa/policies/main.rego — 12 deny rules, 5 modification rules across 6 policy packs
  • server/sdam_model.py — Sequential Decision Analytics (Powell's framework) for modeling decision chains

Numbers: 65 Python tests, 20 OPA tests, 9 API endpoints, Docker deployment.

Why this might be useful for the cookbook

The MCP ecosystem is growing fast, and governance is becoming a bottleneck. A cookbook recipe could cover:

  • Pattern: How to add a formal conflict resolution layer between OPA and the model
  • When to use it: Any MCP server with >3 policy rules that can produce contradictory outputs
  • The bridge pattern: Converting declarative policy outputs (Rego) into formal arguments
  • Auditability: Returning the attack graph as part of the API response for compliance
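The bridge pattern might look something like the sketch below; the function, field names, and default strengths are assumptions for illustration, not the repo's `rego_bridge.py` API:

```python
# Hypothetical bridge: map declarative OPA rule outputs to formal
# arguments. Field names and strength defaults are illustrative.
def opa_to_arguments(opa_result):
    """opa_result: dict of fired rules, e.g. {"deny": [...], "modify": [...]}."""
    args = []
    for i, rule in enumerate(opa_result.get("deny", [])):
        args.append({"id": f"deny_{i}", "claim": rule["msg"],
                     "strength": rule.get("strength", 0.7), "decision": "deny"})
    for i, rule in enumerate(opa_result.get("modify", [])):
        args.append({"id": f"modify_{i}", "claim": rule["msg"],
                     "strength": rule.get("strength", 0.5), "decision": "modify"})
    # A baseline allow argument is always present as the default
    # that stricter arguments must attack to prevail.
    args.append({"id": "baseline_allow", "claim": "Content should be allowed",
                 "strength": 0.3, "decision": "allow"})
    return args
```

The key design point is that the bridge only produces arguments; attack relations and extension computation stay in the argumentation engine, so adding a new policy pack never requires touching conflict-resolution logic.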

I'm happy to write this up as a proper cookbook recipe if there's interest, or to adapt any part of the existing code for inclusion.

Related work

  • Dung, P.M. (1995). "On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games." Artificial Intelligence, 77(2), 321-357.
  • Bai et al. (2022). "Constitutional AI: Harmlessness from AI feedback." — the system extends this by formalizing how constitutional principles resolve when they contradict.
  • Kirchner et al. (2024). "Prover-Verifier Games improve legibility of LLM outputs." — structurally related: both formalize the idea that AI reasoning should be checkable by a less capable agent.


Feedback and pointers to related MCP governance work are welcome.

— Leela Aditya Annam (@Leeladitya)
