mcptest

Test your MCP server like you test the rest of your code.

Website · Documentation · Examples · Example servers

mcptest is an open-source CLI for testing Model Context Protocol servers. You write checks as YAML, point them at any MCP server, and run them from your terminal, your CI, or your coding agent.

A passing unit-test suite tells you your handler returns the right value. It tells you nothing about what your server puts on the wire: whether the initialize handshake completes, whether the tool catalog still says what you think it says, whether a tools/call over stdio or HTTP returns the response a client will actually see. mcptest speaks MCP end to end and checks exactly that. You get a deterministic pass or fail, and when something breaks, a structured failure that names the assertion, the payload the server sent, and a one-line repro.

Try it in three commands

curl -fsSL https://download.mcptest.sh/install.sh | sh   # or: brew install soapbucket/tap/mcptest
mcptest init   # writes a starter suite under tests/
mcptest run    # deterministic verdicts, structured failures

mcptest init scaffolds a suite that targets a built-in mock server (mcptest mock), so the first mcptest run passes offline with no real server and no network. When you are ready, replace the command: entry with the one that starts your own server. If an MCP client on your machine already knows your server, mcptest init --from-discovered <name> scaffolds against it instead.

How it works

You describe the contract in YAML: a server, a call, and what the response should look like.

# mcptest.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  api:
    command: ["./your-server"]      # or: url: https://example.com/mcp
tools:
  - name: search returns a result for a known query
    server: api
    tool: search
    args: { query: "anthropic" }
    expect:
      - target: result.content[0].text
        matcher: { contains: "results for" }

mcptest run starts your server (or connects to its URL), performs the MCP initialize handshake over stdio, streamable HTTP, or legacy SSE, makes the call a real client would make, and checks the response against your assertions. It prints one line per check and exits 0 when everything holds, 1 when something breaks.
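For readers new to the protocol, here is a minimal sketch of the two requests described above as they would appear on a stdio transport. The envelope and method names follow the MCP specification (JSON-RPC 2.0, newline-delimited on stdio). The protocol version string, the client name, and the search tool with its query argument are illustrative assumptions, and this is not mcptest's own code.

```python
import json


def jsonrpc_request(req_id, method, params):
    """Build a JSON-RPC 2.0 request envelope as a plain dict."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}


# The handshake a client sends first. Version and client name are assumed values.
initialize = jsonrpc_request(1, "initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.0.0"},
})

# The call from the YAML example: tool "search" with one argument.
tools_call = jsonrpc_request(2, "tools/call", {
    "name": "search",
    "arguments": {"query": "anthropic"},
})


def frame_for_stdio(message):
    """stdio transport: one JSON message per line, no embedded newlines."""
    return json.dumps(message, separators=(",", ":")) + "\n"


print(frame_for_stdio(initialize), end="")
print(frame_for_stdio(tools_call), end="")
```

A real client also sends a notifications/initialized notification between these two requests; it is omitted here to keep the sketch short.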

push  ->  mcptest run  ->  assert on the wire  ->  exit 0 / exit 1

When the server drifts, the same suite catches it. The failure names the assertion, shows what it expected against what the server sent, and exits non-zero so CI can gate on it.
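The check-and-gate step can be pictured with a short sketch. It resolves a target path such as result.content[0].text against a response, applies a contains, exact, or regex matcher, and derives an exit code from the failures. The function names, the failure shape, and the sample response are hypothetical; mcptest's real resolver and matchers may behave differently.

```python
import re

# A path is a sequence of names and [index] steps; dots are separators.
_STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(target, payload):
    """Walk a dotted path with [index] steps through nested dicts and lists."""
    value = payload
    for name, index in _STEP.findall(target):
        value = value[name] if name else value[int(index)]
    return value


def check(matcher, actual):
    """Apply a single-key matcher dict: contains, exact, or regex."""
    (kind, expected), = matcher.items()
    if kind == "contains":
        return expected in actual
    if kind == "exact":
        return actual == expected
    if kind == "regex":
        return re.search(expected, str(actual)) is not None
    raise ValueError(f"unknown matcher: {kind}")


def run_assertions(assertions, payload):
    """Return structured failures; an empty list means every check held."""
    failures = []
    for a in assertions:
        actual = resolve(a["target"], payload)
        if not check(a["matcher"], actual):
            failures.append({"target": a["target"],
                             "expected": a["matcher"],
                             "actual": actual})
    return failures


# Assumed sample response for the "search" call in the YAML above.
response = {"result": {"content": [{"type": "text",
                                    "text": "3 results for anthropic"}]}}
expect = [{"target": "result.content[0].text",
           "matcher": {"contains": "results for"}}]

failures = run_assertions(expect, response)
for f in failures:
    print(f"FAIL {f['target']}: expected {f['expected']}, got {f['actual']!r}")
exit_code = 1 if failures else 0  # what a CI gate would key on
print("exit", exit_code)
```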

One binary, the whole surface

The same engine covers the things teams otherwise test with one-off scripts. Each is a YAML block or a subcommand, and each exits with a code CI understands.

Tools, resources, prompts. Assert on real responses; catch catalog and input-schema drift.
The agent loop. Drive a real model across one or more servers and assert on the trace it produces (tool choice, arguments, tokens, cost).
Spec conformance. Grade a server against a pinned MCP protocol version (mcptest conformance run).
Schema drift. Diff the tool catalog against a baseline and classify each change as breaking or not (mcptest diff).
Security. Scan tool, prompt, and resource definitions for injection, exfiltration, and shadowing, and report as SARIF (mcptest security).
Offline replay. Record real exchanges to cassettes and replay them in CI with no keys and no spend.
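As an illustration of the schema-drift item above, the sketch below compares two tool catalogs and labels each change as breaking or not. The classification rules (a removed tool, a removed parameter, or a newly required parameter is breaking; a new tool or a new optional parameter is not) are assumptions chosen for the example, not the rules mcptest diff applies.

```python
def classify_drift(baseline, current):
    """Compare two catalogs shaped as {tool name: input JSON Schema}.

    In MCP terms the schema is a tool's inputSchema. Returns a list of
    (description, breaking) pairs. The rules here are illustrative only.
    """
    changes = []
    for name in sorted(baseline.keys() - current.keys()):
        changes.append((f"tool removed: {name}", True))
    for name in sorted(current.keys() - baseline.keys()):
        changes.append((f"tool added: {name}", False))
    for name in sorted(baseline.keys() & current.keys()):
        old, new = baseline[name], current[name]
        old_props = set(old.get("properties", {}))
        new_props = set(new.get("properties", {}))
        old_req = set(old.get("required", []))
        new_req = set(new.get("required", []))
        for prop in sorted(old_props - new_props):
            changes.append((f"{name}: parameter removed: {prop}", True))
        for prop in sorted(new_req - old_req):
            changes.append((f"{name}: parameter now required: {prop}", True))
        for prop in sorted((new_props - old_props) - new_req):
            changes.append((f"{name}: optional parameter added: {prop}", False))
    return changes


# Hypothetical catalogs: "limit" appears and is required in the new version.
baseline = {"search": {"properties": {"query": {"type": "string"}},
                       "required": ["query"]}}
current = {"search": {"properties": {"query": {"type": "string"},
                                     "limit": {"type": "integer"}},
                      "required": ["query", "limit"]}}

for description, breaking in classify_drift(baseline, current):
    print("BREAKING" if breaking else "ok      ", description)
```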

Test the agent loop, replay it offline

Point one YAML test at one model or a list of them. mcptest lists the tools on every server you name, sends the prompt to the model with that catalog attached, dispatches the tool calls the model makes, and records the conversation. Your assertions resolve against the trace, so the same suite checks that the model picked the right tool and that the run stayed inside a token budget.

agents:
  - name: weather query routes to get_weather
    models: [claude-sonnet-4-5, gpt-5, gemini-2.5-pro]
    servers: [weather]
    prompt: What is the weather in Sacramento?
    expect:
      - target: tool_calls[0].name
        matcher: { exact: get_weather }
      - target: conversation.tokens.total
        matcher: { regex: "^[0-9]+$" }

Record once with your provider keys, and each (test, model) pair gets its own cassette. After that a plain mcptest run replays them in CI, deterministically, without spending a cent. Add a model identifier to models:, re-record, and the report tells you which assertion broke for which model.
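The record-then-replay idea can be sketched in a few lines. The class below stores one file per (test, model) pair and, in replay mode, never touches the live call. The file naming, the JSON layout, and the Cassette class itself are hypothetical; mcptest's actual cassette format is not described here.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class Cassette:
    """Record-or-replay store keyed by (test name, model). Hypothetical layout."""

    def __init__(self, directory, mode="replay"):
        self.directory = Path(directory)
        self.mode = mode  # "record" or "replay"

    def _path(self, test, model):
        # One file per (test, model) pair, named by a short hash of the key.
        digest = hashlib.sha256(f"{test}\x00{model}".encode()).hexdigest()[:16]
        return self.directory / f"{digest}.json"

    def run(self, test, model, live_call):
        path = self._path(test, model)
        if self.mode == "replay":
            if not path.exists():
                raise FileNotFoundError(f"no cassette for ({test!r}, {model!r})")
            return json.loads(path.read_text())["response"]
        response = live_call()  # only reached in record mode
        self.directory.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(
            {"test": test, "model": model, "response": response}, indent=2))
        return response


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a provider call; a real one would need keys and cost money.
    live = lambda: {"tool_calls": [{"name": "get_weather"}]}
    recorded = Cassette(tmp, mode="record").run("weather query", "model-a", live)
    # Replay passes no live call at all, so it cannot reach the network.
    replayed = Cassette(tmp, mode="replay").run("weather query", "model-a", None)
    print(recorded == replayed)
```

Keying on both the test and the model is what lets a newly added model fail on its own without disturbing the cassettes already recorded for the others.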

Providers covered today: Anthropic, OpenAI (including the o-series), Google Gemini, Mistral, plus any OpenAI-compatible endpoint (Azure, OpenRouter, vLLM, llama.cpp, LiteLLM, Together, Groq, Bedrock-fronted Anthropic) through a named providers: block. Sweep a whole suite across models with mcptest run --models a,b,c and get a test-by-model grid. Background is in docs/models.md.

Use mcptest from your coding agent

mcptest ships an MCP server of its own, so Claude Code, Cursor, or any MCP-capable agent can run the full testing loop. Two commands hand it the keys:

mcptest mcp-server --install --enable-writes   # the agent-facing verbs
mcptest skill --install                        # the packaged skill

The agent scaffolds a validated suite from the server's real catalog, sharpens the generic checks against observed responses, runs the suite, and reads back a failure that already carries the assertion, the actual value, and a one-line repro. The agent brings the judgment; mcptest supplies the deterministic ground truth it cannot invent, and the YAML it leaves behind is the diffable audit trail a human reviews. See the agent interface for the verb reference and the model-facing --reporter agent output.

Run it in CI

The suite is a diffable YAML file you run on every commit. Reporters cover the formats CI already understands, and a single run writes machine-readable artifacts CI can store.

# .github/workflows/mcptest.yml
- name: Install mcptest
  run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=v1.0.0 sh
- name: Run mcptest
  run: mcptest run --reporter junit --output mcptest.junit.xml

Pick the reporter with --reporter: pretty (default), minimal, json, junit, md, html, sarif, gitlab, ndjson, tap, matrix, or quiet. Capture the JSON envelope once, then re-render it into any other format with mcptest report --format, with no second run and no second API call.
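Re-rendering from a captured result is cheap because it is a pure transformation of data already on disk. The sketch below turns a list of results into TAP output. The result fields (name, passed) are a hypothetical stand-in for the JSON envelope, whose real shape is defined in the mcptest documentation, and the function is not part of mcptest.

```python
def to_tap(results):
    """Render a list of {"name": str, "passed": bool} dicts as TAP text."""
    lines = ["TAP version 13", f"1..{len(results)}"]
    for number, result in enumerate(results, start=1):
        status = "ok" if result["passed"] else "not ok"
        lines.append(f"{status} {number} - {result['name']}")
    return "\n".join(lines) + "\n"


# Hypothetical captured results; no server or model is contacted here.
captured = [
    {"name": "search returns a result for a known query", "passed": True},
    {"name": "weather query routes to get_weather", "passed": False},
]
print(to_tap(captured), end="")
```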

Why mcptest

Inspectors and one-off scripts tell you a server looked right once. A general eval framework grades the model that calls your tools, not the server on the other side of the call. mcptest is the part you can commit: one binary that checks the protocol contract, the behavior, the agent loop, schema drift, and tool-definition security, and turns each into a stable exit code. It is a single static binary with no telemetry and no auto-update, licensed under Apache-2.0. It also bakes a CycloneDX Software Bill of Materials into the binary, so you can read the dependency list from the copy you already have.

mcptest sbom            # the embedded CycloneDX SBOM
mcptest sbom --verify   # re-hash it to catch tampering

Every release is Sigstore-signed and carries SLSA L3 build provenance. The full verification walkthrough lives at mcptest.sh/trust.

Install

Homebrew (macOS, Linux):

brew install soapbucket/tap/mcptest

curl installer (macOS and Linux, including Apple Silicon and arm64):

curl -fsSL https://download.mcptest.sh/install.sh | sh

The installer detects your platform, downloads the signed release tarball from download.mcptest.sh, verifies its sha256 against the sums file, and drops mcptest into ~/.local/bin (or /usr/local/bin under sudo). Inspect it first with curl -fsSL https://download.mcptest.sh/install.sh | less.
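If you prefer to repeat the checksum step by hand, the logic looks roughly like the sketch below. It assumes the sums file uses the common sha256sum layout (hex digest, whitespace, file name); the file names are placeholders, and the installer's own script remains the authoritative version of this check.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large tarballs do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sums_text, filename):
    """Look up filename in sha256sum-style text (assumed format)."""
    for line in sums_text.splitlines():
        parts = line.split(maxsplit=1)
        # sha256sum marks binary mode with a leading "*" on the name.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    raise KeyError(filename)


def verify(path, sums_text, filename):
    """True when the file's digest matches the entry in the sums text."""
    return sha256_of(path) == expected_digest(sums_text, filename)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file standing in for a downloaded release tarball.
    tarball = os.path.join(tmp, "example.tar.gz")
    with open(tarball, "wb") as handle:
        handle.write(b"not a real release")
    sums = f"{sha256_of(tarball)}  example.tar.gz\n"
    ok = verify(tarball, sums, "example.tar.gz")
    tampered = verify(tarball, "0" * 64 + "  example.tar.gz\n", "example.tar.gz")
    print(ok, tampered)
```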

Docker:

docker run --rm -v "$PWD":/work -w /work soapbucket/mcptest:latest run

Documentation

Full documentation lives under docs/. Start here:

Getting started: install to first passing test in about five minutes.
What is mcptest: the one-page definition.
Concepts: the mental model.
YAML reference: every field, every matcher.
CLI reference: every subcommand, every flag.
Examples: runnable suites across the whole surface, plus mcptest-examples for complete end-to-end suites against ten popular servers.

SDKs drive mcptest from your own test runner: Python (pytest), TypeScript (vitest, jest, mocha, node:test), Go, Rust (proc-macro), .NET (xUnit), and JVM (JUnit 5). See docs/sdks.md.

Build from source

cargo build --release
./target/release/mcptest --help
./scripts/check.sh        # the full gate: fmt + clippy + doc + build + test

License

Apache-2.0. See LICENSE and NOTICE.

Links

Documentation: docs/index.md
Releases: github.com/soapbucket/mcptest/releases
Issues and roadmap: github.com/soapbucket/mcptest/issues
X (Twitter): @soapbucket
