veriroot

Recover an image's signed origin after the web strips its metadata — and verify it offline.

Provenance metadata embedded in image files (C2PA / Content Credentials) is stripped by the open web within seconds — screenshots, recompression, resizing, and CDN transcodes all destroy it. veriroot stores a robust perceptual fingerprint of an image at registration time plus a tamper-evident, signed record of its origin. Later, even after every embedded byte of provenance is gone, the origin is recovered by resemblance (think Shazam, for images) and the recovered record is independently verifiable.

Attribution, not authenticity. A record proves who registered what, and when — never that the pixels are true. Output is always recovered / no_match / uncertain, never "real" / "fake". This is both the honest framing and the liability firewall.

This repository is a v1 proof-of-concept: fully working end to end, and benchmarked honestly so it quantifies exactly where the approach holds and where it breaks.

Why this exists

The proof of origin normally rides inside the file, so it falls off the moment the file travels. Detecting fakes is a losing arms race, and running a provenance network is a chicken-and-egg problem. veriroot takes a third path: let accountable parties prove they stood behind a specific image at a specific time, in a way that survives the web mangling it — and make that proof valuable the instant you register, with no network required.

The wedge: a certificate that verifies offline

Registering an image issues a portable proof-of-existence certificate. Anyone can verify it with the service switched off, against a published signed root:

$ veriroot-verify photo.png.cert.json --image photo.png
  [PASS] record_hash matches canonical record
  [PASS] registrant signature
  [PASS] merkle inclusion proof
  [PASS] signed tree head signature
  [PASS] image asset.sha256 (exact bytes)
  RESULT: VERIFIED

Alter the image, the record, or the proof and it flips to FAILED. This is the court-usable / takedown-usable proof of priority that makes registration valuable even with an empty registry.
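To make the "merkle inclusion proof" check above concrete, here is a minimal sketch of RFC 6962-style proof generation and verification using only the standard library. The function names (`leaf_hash`, `merkle_root`, `inclusion_path`, `verify_inclusion`) are illustrative and are not taken from veriroot's code; whether the real verifier is organised this way is an assumption.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 domain separation: leaf inputs are prefixed with 0x00.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly less than n (requires n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_path(index: int, leaves: list[bytes]) -> list[bytes]:
    # Sibling hashes from the leaf up to the root.
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if index < k:
        return inclusion_path(index, leaves[:k]) + [merkle_root(leaves[k:])]
    return inclusion_path(index - k, leaves[k:]) + [merkle_root(leaves[:k])]


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    # Recompute the root from the leaf and its audit path, then compare.
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            if not fn & 1:
                # Skip levels where this node has no right sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The verifier needs only the leaf, its index, the tree size, the sibling hashes, and a trusted root, which is why this step can run with the service switched off.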

How it works

register → SSCD + PDQ fingerprint → canonical OriginRecord (JCS) → Ed25519-sign
        → append to an RFC 6962 Merkle log → sign + publish the root → pgvector upsert
        → issue a CERTIFICATE
recover  → fingerprint the query → exact cosine + PDQ Hamming → fuse → 3-state result + inclusion proof
verify   → (offline) recompute hash + root, check signatures, match the published root
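The "canonical OriginRecord (JCS)" step in the pipeline above can be illustrated with a small sketch. It approximates RFC 8785 canonicalisation with `json.dumps`, which is only a reasonable stand-in when the record holds strings, integers, booleans, and nulls under ASCII keys. The field names are hypothetical, and the Ed25519 signing step is left out because it needs a third-party library.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    # Approximation of JCS: sorted keys, no insignificant whitespace, UTF-8.
    # Floats and non-ASCII keys need a real RFC 8785 implementation.
    text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def record_hash(record: dict) -> str:
    # The hash is taken over the canonical bytes, so key order in the
    # in-memory dict does not affect the result.
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Hypothetical record fields, for illustration only.
example = {
    "registrant": "example-key-id",
    "asset_sha256": "00" * 32,
    "registered_at": 1700000000,
}
print(record_hash(example))
```

Hashing a canonical form is what lets an offline verifier recompute `record_hash` from the certificate alone and get the same bytes the signer saw.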

Two fingerprints are stored per image: SSCD (a 512-d self-supervised copy-detection embedding, robust to compression, resizing, screenshots, and moderate crops) and PDQ (a 256-bit DCT hash, cheap to compute and near-perfect under format changes and recompression). They cover each other's blind spots, and the benchmark reports each engine alone plus their fusion.
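As a rough illustration of how the two distances might be combined into a three-state result, the sketch below computes cosine similarity on embeddings, Hamming distance on hashes, and applies a simple either-engine-accepts rule. The source only says the scores are fused, so the rule, the function names, and every threshold here are assumptions, not the values frozen on DEV_SUITE.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hamming_distance(h1: int, h2: int) -> int:
    # Hashes are held as Python ints; XOR leaves a 1 wherever the bits differ.
    return bin(h1 ^ h2).count("1")


def fuse(cos: float, ham: int, *,
         cos_accept: float = 0.75, ham_accept: int = 31,
         cos_reject: float = 0.30, ham_reject: int = 90) -> str:
    # Placeholder thresholds. Either engine may accept on its own;
    # both must clearly disagree with a match before we say no_match.
    if cos >= cos_accept or ham <= ham_accept:
        return "recovered"
    if cos < cos_reject and ham > ham_reject:
        return "no_match"
    return "uncertain"


print(fuse(cosine_similarity([1.0, 0.0], [0.9, 0.1]), hamming_distance(0b1011, 0b0001)))
```

An OR-style accept rule is one way to let each engine cover the other's blind spot; a weighted score or a learned combiner would be equally plausible readings of "fuse".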

Quickstart

The local "spine" (API, Merkle log, certificate, offline verifier, tests) has a tiny footprint — no GPU, no torch required.

make install     # core + dev deps (no torch)
make db          # Postgres + pgvector via Docker Compose
make migrate     # create the schema
make serve       # FastAPI on :8000 — open http://localhost:8000 for the web UI

make register IMG=photo.png                       # writes photo.png.cert.json
make recover  IMG=transformed.png                 # three-state recovery result
make verify   CERT=photo.png.cert.json IMG=photo.png   # offline certificate check

make test        # 23 tests: Merkle (property-based), certificate, fingerprint, API
make demo        # Phase 0 acceptance: rank-1 over 100 distractors

Without the SSCD model installed, veriroot uses a deterministic DCT-descriptor fallback (labelled dct-fallback-v1 in every record and in RESULTS.md) so the whole pipeline runs offline. Install the real engine with:

make install-sscd   # adds torch / torchvision
make model          # download + SHA256-verify the pinned sscd_disc_mixup
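The "SHA256-verify" part of `make model` amounts to hashing the downloaded file and comparing it with a pinned digest. A minimal sketch of that check follows; the helper names are hypothetical and the pinned digest itself is not given in this README, so none is shown.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large model weights never need to fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_digest(path: str, expected_hex: str) -> bool:
    # Constant-time comparison of the hex digests.
    return hmac.compare_digest(sha256_file(path), expected_hex.lower())
```

A caller would refuse to load the model when `matches_pinned_digest` returns `False`.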

Benchmark

make bench runs the full harness on a seeded synthetic corpus and never spends money. Because real image corpora and the SSCD model are large, the heavy benchmark is designed to run on Kaggle — the corpus stays in the cloud, embeddings happen there, and only the results come back:

make bench                          # synthetic corpus, offline → benchmark/out/ + RESULTS.md
python scripts/run_bench_kaggle.py  # run on Kaggle against a real corpus; pull results back

The corpus is pluggable: synthetic (default, zero-download), folder:PATH (a DISC2021 or CC0 subset), or kaggle:owner/slug. A PDQ-based dedup / leakage guard keeps the registered and distractor sets disjoint.
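One plausible shape for the PDQ-based leakage guard is a filter that drops any distractor whose hash sits too close to a registered image's hash. The sketch below assumes hashes are held as integers and uses a placeholder distance cutoff; the function names and the cutoff are illustrative, not taken from the benchmark code.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def drop_leaky_distractors(registered: list[int], distractors: list[int],
                           max_distance: int = 31) -> list[int]:
    # Keep a distractor only if it is farther than max_distance bits from
    # every registered hash. Quadratic, which is fine for a sketch.
    return [
        d for d in distractors
        if all(hamming(d, r) > max_distance for r in registered)
    ]


registered = [0]
distractors = [0b111, (1 << 64) - 1]  # 3 bits and 64 bits away from 0
print(drop_leaky_distractors(registered, distractors))
```

Without a guard like this, a near-duplicate of a registered image hiding in the distractor set would be scored as a false match even though the system recovered the right content.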

Results (real run)

From a real run with the sscd_disc_mixup model on 2,000 natural photos. Thresholds were selected on DEV_SUITE only, frozen, then applied unchanged to the held-out suites. Full numbers and figures live in RESULTS.md.
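The select-on-DEV, freeze, then apply protocol can be sketched as follows. The README does not state the selection criterion, so this example assumes one common choice: the smallest threshold that keeps the false-match rate on DEV negatives within a budget. All names and numbers are illustrative.

```python
from bisect import bisect_right


def false_match_rate(negative_scores: list[float], threshold: float) -> float:
    # A negative counts as a false match when it scores above the threshold.
    return sum(s > threshold for s in negative_scores) / len(negative_scores)


def select_threshold(dev_negative_scores: list[float], max_fmr: float) -> float:
    if not dev_negative_scores or max_fmr < 0:
        raise ValueError("need at least one score and a non-negative budget")
    ranked = sorted(dev_negative_scores)
    n = len(ranked)
    for t in ranked:
        above = n - bisect_right(ranked, t)  # negatives scoring above t
        if above / n <= max_fmr:
            return t
    return ranked[-1]  # defensive: the loop always returns by the maximum score


# Select once on DEV scores, then reuse the frozen value on held-out scores.
dev_negatives = [0.1, 0.2, 0.3, 0.4, 0.9]
frozen = select_threshold(dev_negatives, max_fmr=0.25)
heldout_negatives = [0.05, 0.35, 0.5, 0.6]
print(frozen, false_match_rate(heldout_negatives, frozen))
```

Because `frozen` is never recomputed on held-out data, any rise in false-match rate there reflects a real generalization gap instead of threshold tuning.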

| Suite | Transforms | Recovery (fused) | µAP |
| --- | --- | --- | --- |
| DEV | jpeg70 + resize½ + crop5% | 0.86 | 0.93 |
| HELDOUT_A | jpeg50 + resize¾ + crop10% + resave | 0.44 | 0.53 |
| HELDOUT_B | screenshot + jpeg40 + crop15% | 0.61 | 0.76 |
| HELDOUT_SCREENSHOT | resize + jpeg60 + gamma + pad | 0.90 | 0.95 |

The two engines genuinely complement each other: on the screenshot suite SSCD alone collapses but PDQ carries it to 0.90, while on the crop suites PDQ collapses and SSCD holds. False-match rates stay between 0.001 and 0.006, and the DEV → held-out generalization gap (0.21 fused: DEV at 0.86 against a held-out mean of 0.65) is reported, not hidden.
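For readers unfamiliar with the µAP column, the sketch below shows micro-average precision as it is commonly defined for copy detection: predictions from all queries are pooled into one list ranked by confidence, and average precision is computed over that single list. Whether veriroot's harness computes it exactly this way (for example, in how ties are handled) is an assumption.

```python
def micro_average_precision(predictions: list[tuple[float, bool]],
                            total_positives: int) -> float:
    # predictions: (confidence, is_correct) pairs pooled across all queries.
    # total_positives: number of ground-truth matches, including any that
    # the system never predicted, so misses lower the score.
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            ap += hits / rank  # precision at each step where recall increases
    return ap / total_positives


pooled = [(0.9, True), (0.8, False), (0.7, True)]
print(micro_average_precision(pooled, total_positives=2))
```

Because the ranking is global, a single confidence scale has to work across every query, which is the same property a fixed decision threshold relies on.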

Honest limitations

veriroot measures its failure modes rather than pretending to beat them:

Heavy cropping degrades resemblance — the hardest case, quantified per transform.
Generative regeneration (diffusion img2img) can repaint a look-alike with a new fingerprint — an open research frontier; the probe is flag-gated and spend-capped.
Deliberate collisions are reported with the exact attacker budget assumed (the black-box probe found 0% success under its stated budget).
Identity is out of scope for v1: records bind to a key, not a verified real-world identity. That key→identity layer is the genuinely hard trust problem and is deferred to v2.

See docs/THREAT_MODEL.md for the full picture.

Repository layout

src/veriroot/          the service spine (fingerprint, log, store, services, api)
src/veriroot_verify/   standalone offline certificate verifier (no service dependency)
benchmark/             corpus, transforms, metrics, threshold selection, adversarial probes
tests/                 property-based + integration tests
scripts/               CLI, Phase-0 demo, model fetcher, Kaggle orchestration
docs/                  threat model

Documentation

docs/THREAT_MODEL.md — what v1 proves, what it does not, and the measured failure modes.
RESULTS.md — the latest benchmark dashboard (regenerated by make bench).

Status and roadmap

v1 is complete: end-to-end register / recover / verify, an offline certificate verifier, a tamper-evident append-only log with consistency proofs, an honest benchmark with held-out suites, and measured adversarial failure modes. Next up: batched embedding for throughput, a full-accuracy run on a larger real corpus, the key→identity binding layer, and a hosted demo.

License

Apache 2.0.
