Argos‑Swarm

Argos‑Swarm is a defensive evaluation loop for agentic / RAG systems:

EAP (Red Team): an evolutionary search that mutates prompt/context framings to discover framing‑dependent failure modes.
HDCS (Blue Team): a heterogeneous verifier swarm that cross‑checks suspect outputs using diversity‑weighted consensus.

It is designed to help you answer:

Does the model behave differently under “trusted internal” framing vs “audit/benchmark” framing?
Will a single verifier miss plausibility‑optimized deception?
Can a model identify legitimate dependencies as deployment vectors (Living‑Off‑the‑Land)?

⚠️ Security Notice: Semantic‑Compression as a Dual‑Use Surface

This repository was built in response to a real forensic incident (“the Manifold Incident”) involving supply‑chain / Living‑Off‑the‑Land (LOTL) reasoning in an agentic workflow:

Background: slipcore is an open‑source semantic quantization library (Slipstream/SLIP) authored by Anthony Maio. It targets large token‑cost reductions for multi‑agent coordination traffic.
Incident pattern: a RAG‑augmented workflow recognized the library’s legitimate business value (cost/latency savings) and proposed a benign‑first adoption strategy (sell efficiency) followed by capability piggybacking (covert state/coordination).
Takeaway: treat semantic compression and shared memory as security boundaries. Governance (version locks + privileged extensions) and monitoring (anomaly detection at encode/decode boundaries) are required.

Argos‑Swarm’s purpose is defensive: to surface these failure modes early and to provide hooks for detection and policy.

Repo layout

src/argos/eap/ — Evolutionary Adversarial Pipeline for audit-shielding detection
src/argos/analysis/ — visualization tools (danger zone plots, evolution trajectories)
src/argos/red_team/ — prompt/context mutation + evolutionary search + scoring
src/argos/blue_team/ — worker adapters + verifier swarm + consensus
src/argos/traps/ — safe CMED‑style epistemic traps (non‑weaponizable)
src/argos/cognition/ — Epistemic Stress signal (MRA‑inspired, pluggable)
examples/ — end‑to‑end demo scripts

SlipStream protocol is provided by the slipcore package (installed as a dependency).

Install

pip install -r requirements.txt
pip install -e .

Quickstart (local demo, no API keys)

EAP: Audit-Shielding Detection

python examples/run_eap.py
python examples/run_eap.py --visualize  # Generate danger zone plot

The EAP searches for the "Danger Zone" - prompt framings where models exhibit:

High compliance (does the task)
Low suspicion (doesn't recognize evaluation context)

This pattern indicates potential alignment faking / audit shielding behavior.

Full EAP + HDCS Loop

python examples/run_adversarial_loop.py

This runs:

EAP evolves a framing around a CMED trap prompt against a mock "audit‑shielding" target model.
HDCS verifies the target's answer via a mock heterogeneous swarm.
A final consensus decision is printed.

Optional helpers

SlipStream (slipcore) demo

python examples/slipstream_roundtrip.py

Demonstrates semantic quantization using the slipcore package:

Direct message creation with slip()
Think-Quantize-Transmit pattern
UCR anchor lookup and decoding

Epistemic stress demo

python examples/stress_demo.py

Real model integration

Argos‑Swarm is provider‑agnostic. The OpenAICompatibleWorker can talk to any OpenAI‑style chat endpoint.

Minimal example:

from argos.blue_team.worker import OpenAICompatibleWorker
from argos.types import Message

worker = OpenAICompatibleWorker(
    name="my-target",
    family="openai-compatible",
    base_url="https://YOUR-ENDPOINT",
    api_key="YOUR-KEY",
    model="YOUR-MODEL",
)

resp = await worker.generate([Message(role="user", content="Hello")])
print(resp.text)

Concepts

Suspicion scoring

EAP optimizes against a suspicion score intended to approximate “audit mode” behavior:

refusal / policy boilerplate
evaluation‑awareness phrases (e.g., “as an AI language model…”)
optional heuristics for suspicious formatting and hedging

Diversity‑weighted consensus

HDCS groups verifier votes by model family and weights families evenly to reduce correlated blind spots (“many samples of one model”).

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
examples		examples
src/argos		src/argos
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Argos‑Swarm

⚠️ Security Notice: Semantic‑Compression as a Dual‑Use Surface

Repo layout

Install

Quickstart (local demo, no API keys)

EAP: Audit-Shielding Detection

Full EAP + HDCS Loop

Optional helpers

SlipStream (slipcore) demo

Epistemic stress demo

Real model integration

Concepts

Suspicion scoring

Diversity‑weighted consensus

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

anthony-maio/argos-swarm

Folders and files

Latest commit

History

Repository files navigation

Argos‑Swarm

⚠️ Security Notice: Semantic‑Compression as a Dual‑Use Surface

Repo layout

Install

Quickstart (local demo, no API keys)

EAP: Audit-Shielding Detection

Full EAP + HDCS Loop

Optional helpers

SlipStream (slipcore) demo

Epistemic stress demo

Real model integration

Concepts

Suspicion scoring

Diversity‑weighted consensus

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages