This repository provides a reference implementation of the paper “Trustworthy and Privacy-Preserving Perceptual Hashing with Zero-Knowledge Proofs”, including:
- Perceptual hashing deep models (training and evaluation, PyTorch)
- A zero-knowledge proof prototype for Hamming distance based on multilinear sum-check and polynomial commitments (Rust, arkworks)
We ship a Docker image for a reproducible environment and provide scripts for experiments and benchmarking.
- Overview and Contributions
- Repository Structure
- Environment and Dependencies
- Docker (recommended)
- Local Installation (optional)
- Data Preparation
- Training and Evaluation
- Training (TrainAll)
- PIHD metrics (AUC, FPR@95TPR)
- COCO Zero-shot Generalization
- Ablation Study
- ZK Proof Benchmark (HDProof)
- Reproducibility Checklist and Expected Artifacts
- FAQ
- Acknowledgements
- Citation
This implementation targets trustworthy and privacy-preserving perceptual hashing with a unified training and verification pipeline:
- Multiple backbones (ResNet50, ViT, ConvNeXtV2, Swin-T, MambaOut, GroupMamba) with Baseline/Enhanced strategies
- Unified training and evaluation producing ROC-AUC and FPR@95TPR
- Zero-shot generalization evaluation on COCO
- A Hamming-distance ZK proof (hamproof) prototype and benchmark
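For orientation, a perceptual hash here is a 32- or 64-bit binary code produced by a backbone. The sketch below is a generic illustration of binarizing a model's output into a hash and comparing two hashes by Hamming distance; the stand-in `backbone` and the sign-based binarization are assumptions for illustration, not the repository's exact implementation:

```python
import torch
from torch import nn

# Generic sketch: a backbone maps an image to a real-valued embedding, and the
# hash is its sign pattern as a bit vector (the repo's binarization may differ).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 64))  # 64-bit code

def perceptual_hash(images: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return (backbone(images) > 0).to(torch.uint8)  # shape: (batch, 64), values in {0, 1}

x = torch.randn(2, 3, 224, 224)      # two toy images
h = perceptual_hash(x)
dist = (h[0] ^ h[1]).sum().item()    # Hamming distance between the two hashes
print(h.shape, dist)
```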
PerceptHash/                  # Perceptual hashing models and scripts (PyTorch)
  model/                      # Models and backbones, includes kernels/selective_scan extension
  eval/                       # Evaluation scripts (AUC, COCO, Ablation, TPR@95% threshold)
  script/                     # Training entry (TrainAll.py)
  datasets/, checkpoint/, save/   # Data, pretrained weights, outputs (mounted as needed)
HDProof/                      # Hamming distance ZK proof (Rust/arkworks)
  3rd/                        # poly-commit, ml_sumcheck submodules
  src/                        # Protocol implementation
  src/bin/quick_bench.rs      # Benchmark entry (see CLI options below)
Dockerfile                    # Reproducible env (CUDA 12.1 + Python 3.10 + PyTorch 2.5.1 + Rust)
We recommend Docker for consistent CUDA/driver and Python/Rust dependencies.
Build the image from the repo root:
git clone https://github.com/mengdehong/zkph.git
cd zkph
docker build -t zkph:latest .

Key environment in the image:
- Base: nvidia/cuda:12.1.1-devel-ubuntu20.04
- Python: 3.10 (Conda env: ph)
- PyTorch: 2.5.1 + CUDA 12.1 (torchvision 0.20.1, torchaudio 2.5.1)
- Other Python deps: see `PerceptHash/requirements.txt`
- Rust toolchain (stable) + arkworks
- `HDProof` is built during the image build (`cargo build --release`)
Run-time recommendations:
- Use `--gpus all` for GPU access
- Add `--shm-size=32g` to avoid DataLoader shared memory issues
- Mount host data into `/app/PerceptHash/...` inside the container via `-v`
If you do not use Docker, ensure compatible versions:
- CUDA 12.1 with PyTorch 2.5.1; Python 3.10
- Install deps from `PerceptHash/requirements.txt` (contains an extra index for CUDA wheels)
- Rust toolchain (stable); build `HDProof` with `cargo build --release`
Note: CUDA/driver mismatches across hosts can break wheels; Docker is preferred for reproducibility.
Prepare a host data directory, e.g., ~/percepthash_data (or reuse the original ~/code/zkph/PerceptHash/... layout) and place:
- PIHD dataset (train & eval): expected to contain `PerceptHash/datasets/PIHD/test/test_class`
- CocoVal2017 (eval/generalization): with `origin/`, `transformed/`, and `pairs_csv/`
- Pretrained weights (for training): put under `PerceptHash/checkpoint/`
- Trained models/logs (for evaluation): put under `PerceptHash/save/`
Mount these directories to /app/PerceptHash/... paths inside the container (see commands below).
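Before launching a run, you can sanity-check the mounted layout from inside the container. The snippet below is a hypothetical helper (not part of the repository) that simply checks the container paths listed above:

```python
# check_layout.py -- hypothetical helper: verify that the expected mount points
# exist inside the container before starting training/evaluation.
from pathlib import Path

EXPECTED = [
    "PerceptHash/datasets/PIHD/test/test_class",    # PIHD eval split
    "PerceptHash/datasets/CocoVal2017/origin",      # COCO originals
    "PerceptHash/datasets/CocoVal2017/transformed", # COCO transformed images
    "PerceptHash/datasets/CocoVal2017/pairs_csv",   # COCO pair lists
    "PerceptHash/checkpoint",                       # pretrained weights
    "PerceptHash/save",                             # training outputs / logs
]

root = Path("/app")
for rel in EXPECTED:
    path = root / rel
    print(f"{'OK     ' if path.exists() else 'MISSING'} {path}")
```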
To avoid link rot, both primary and backup links are listed:
- PIHD dataset (train & eval)
  - Baidu Netdisk: https://pan.baidu.com/share/init?surl=uVnUVr5HqaSpoNifGElucw&pwd=8xwr
  - Google Drive (backup): https://drive.google.com/file/d/1RaQggCU32_ojtACR8S2f6nfcwTnZt-UB/view?usp=drive_link
- CocoVal2017 generated dataset (eval)
- Pretrained model weights (training)
- Trained outputs for evaluation
All commands are executed on the host; the container provides the runtime. Replace host paths as needed (e.g., /home/USER/code/zkPH or ~/percepthash_data).
TrainAll.py runs all configured backbones/strategies and saves the best weights under `save/TrainAll/<Model>/...` along with summaries:
docker run --gpus all -it --rm \
--shm-size="32g" \
-v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
-v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
-v ~/percepthash_data/save:/app/PerceptHash/save \
zkph:latest \
bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python script/TrainAll.py"

Global hyper-parameters and experiments are defined in PerceptHash/script/TrainAll.py.
Resource note: default batch_size=64; if memory is tight, reduce it accordingly.
PerceptHash/eval/EvalAuc.py supports three modes:
- 64-bit models: `--mode 64`
- 32-bit models: `--mode 32`
- TPR@95% Hamming threshold table for enhanced models: `--mode thr`
Optional args: --batch-size, --num-workers, --device (e.g., cuda:0/cpu), --no-amp.
Example:
docker run --gpus all -it --rm \
--shm-size="32g" \
-v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
-v ~/percepthash_data/save:/app/PerceptHash/save \
-v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
zkph:latest \
bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalAuc --mode 64 --batch-size 32 --num-workers 4"

This outputs CSV files under save/eval/ with AUC and FPR@95TPR, and prints a summary.
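For reference, FPR@95TPR is the false-positive rate at the operating point where the true-positive rate first reaches 95%. Below is a minimal sketch of that computation using scikit-learn; the actual logic lives in eval/EvalAuc.py and may differ in details (for Hamming-distance models, the negated distance would serve as the similarity score):

```python
# Sketch: ROC-AUC and FPR@95TPR from pairwise similarity scores.
# labels: 1 for positive (perceptually similar) pairs, 0 for negatives.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_and_fpr_at_95tpr(labels, scores):
    auc = roc_auc_score(labels, scores)
    fpr, tpr, _ = roc_curve(labels, scores)
    idx = int(np.searchsorted(tpr, 0.95, side="left"))  # first point with TPR >= 0.95
    return auc, fpr[min(idx, len(fpr) - 1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=1000)
    scores = labels + rng.normal(scale=0.8, size=1000)  # noisy toy scores
    print(auc_and_fpr_at_95tpr(labels, scores))
```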
Use PerceptHash/eval/EvalParamLatency.py to measure parameters and inference latency:
docker run --gpus all -it --rm \
--shm-size="32g" \
-v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
-v ~/percepthash_data/save:/app/PerceptHash/save \
-v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
zkph:latest \
bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalParamLatency"

COCO zero-shot generalization (EvalCoco) uses the CocoVal2017 dataset:

docker run --gpus all -it --rm \
--shm-size="32g" \
-v ~/percepthash_data/CocoVal2017:/app/PerceptHash/datasets/CocoVal2017 \
-v ~/percepthash_data/save:/app/PerceptHash/save \
-v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
zkph:latest \
bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalCoco"

Ablation study (EvalAblations):

docker run --gpus all -it --rm \
--shm-size="32g" \
-v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
-v ~/percepthash_data/save:/app/PerceptHash/save \
-v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
zkph:latest \
bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalAblations"

HDProof provides an arkworks-based ZK protocol for Hamming distance with a configurable benchmark, quick_bench (built in Docker).
Example:
docker run -it --rm zkph:latest \
HDProof/target/release/quick_bench \
--sizes=14,16,18,20 --warmup=10 --samples=20 --seed=42 --tmpfs

Key options (from src/bin/quick_bench.rs):
- `--sizes=14,16,...`: number of hashes per run as 2^p (default: 14,16,18)
- `--warmup=N`: warm-up runs (default: 1)
- `--samples=N`: number of samples; if omitted, `--repeats` (default: 3) is used
- `--seed=S`: RNG seed for reproducibility
- `--tmpfs`: write intermediate files under `/dev/shm` if available, to reduce I/O jitter
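For context, the statement HDProof certifies is, at its core, a Hamming distance between two equal-length bit vectors, d(a, b) = sum_i (a_i XOR b_i). Over the reals each XOR term has the multilinear form a_i + b_i - 2*a_i*b_i, which is the kind of sum a sum-check protocol can verify; the exact encoding used by the Rust crate may differ. A plain (non-ZK) Python sketch of the relation:

```python
# Plain (non-ZK) statement behind HDProof: the Hamming distance between two
# equal-length bit vectors is the number of positions where they differ.
def hamming_distance(a, b):
    assert len(a) == len(b)
    return sum(x ^ y for x, y in zip(a, b))

# Equivalent multilinear form over the reals (valid for bits in {0, 1}):
# x XOR y = x + y - 2*x*y, the kind of term a sum-check protocol sums up.
def hamming_multilinear(a, b):
    return sum(x + y - 2 * x * y for x, y in zip(a, b))

a = [1, 0, 1, 1, 0, 0, 1, 0]   # toy 8-bit perceptual hash
b = [1, 1, 1, 0, 0, 0, 1, 1]
print(hamming_distance(a, b), hamming_multilinear(a, b))  # -> 3 3
```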
To ensure optimal performance, all core hyperparameters (including the learning rate, weight decay, and the loss weights alpha and beta) were tuned with the TuneAll.py script. The final optimized parameter sets were then recorded in the TrainAll.py script.
Regarding the loss function, baseline models consistently use TripletLoss with a fixed margin of 0.5. In contrast, the enhanced (+) models underwent a hyperparameter search for the angular_margin to achieve the best performance.
All models were trained for 100 epochs with a batch size of 64, using the AdamW optimizer, a CosineAnnealingLR scheduler, and a hash bit length of 32 or 64.
Model-Specific Hyperparameters:
| Model | lr | weight_decay | alpha | beta | Margin |
|---|---|---|---|---|---|
| ResNet50 | 0.000613 | 0.0263 | 0.00423 | 3.15e-6 | 0.5 |
| ResNet50+ | 0.000690 | 0.000426 | 0.00247 | 2.84e-6 | 0.281 |
| ViT | 0.000193 | 0.00471 | 0.00456 | 8.36e-5 | 0.5 |
| ViT+ | 7.39e-5 | 0.000702 | 0.0128 | 2.65e-6 | 0.479 |
| MambaOut | 0.000496 | 0.00131 | 0.0379 | 2.30e-6 | 0.5 |
| MambaOut+ | 0.000322 | 0.000733 | 0.00371 | 5.74e-5 | 0.468 |
| GroupMamba | 0.000272 | 0.0460 | 0.00257 | 3.48e-6 | 0.5 |
| GroupMamba+ | 0.000115 | 0.000282 | 0.0104 | 1.61e-5 | 0.315 |
| ConvNeXtV2 | 0.000150 | 0.00165 | 0.00260 | 8.71e-6 | 0.5 |
| ConvNeXtV2+ | 0.000328 | 0.0878 | 0.00101 | 1.12e-6 | 0.240 |
| SwinTiny | 8.61e-5 | 0.00470 | 0.00402 | 2.18e-5 | 0.5 |
| SwinTiny+ | 0.000202 | 0.00126 | 0.0161 | 1.29e-5 | 0.388 |
Note: The value in the Margin column corresponds to different loss functions. For baseline models, it represents the margin parameter for TripletLoss. For the enhanced models (indicated by a +), it represents the angular_margin parameter for AngularLoss.
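For orientation, here is a minimal PyTorch sketch of how one row of the table maps onto optimizer, scheduler, and the baseline loss. The `model` is a stand-in; the actual backbones, the alpha/beta loss weighting, and the AngularLoss used by the + models are defined in the repository's training code:

```python
import torch
from torch import nn

# Stand-in model; in the repo the backbone comes from PerceptHash/model.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 64))  # 64-bit hash head

# Values from the "ResNet50" row of the table above.
lr, weight_decay, margin = 0.000613, 0.0263, 0.5
epochs = 100  # all models are trained for 100 epochs

optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

# Baseline models use a triplet loss with a fixed margin of 0.5.
triplet_loss = nn.TripletMarginLoss(margin=margin)
```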
- Environment: Docker image above (CUDA 12.1 / PyTorch 2.5.1 / Python 3.10 / Rust stable)
- Data:
  - PIHD: ensure `PerceptHash/datasets/PIHD/test/test_class` exists
  - CocoVal2017: ensure `origin/`, `transformed/`, and `pairs_csv/` exist
  - Weights/outputs: `checkpoint/` and `save/` as needed
- Training: run TrainAll -> `save/TrainAll/<Model>/..._best.pth`, `aggregated_results.json`
- PIHD eval: `EvalAuc --mode 64/32/thr`
- COCO eval: `EvalCoco`
- Ablation: `EvalAblations`
- ZK benchmark: `quick_bench` -> collect statistics (mean/median/stddev/p90/min/max); see the sketch below
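If you collect the per-run latencies reported by quick_bench into a list (its output format is not reproduced here), the statistics above can be summarized with a few lines of Python:

```python
# Summarize a list of per-run latencies (e.g., milliseconds) from quick_bench.
import statistics

def summarize(samples):
    xs = sorted(samples)
    p90 = xs[min(len(xs) - 1, round(0.9 * (len(xs) - 1)))]  # nearest-rank p90
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "stddev": statistics.stdev(xs) if len(xs) > 1 else 0.0,
        "p90": p90,
        "min": xs[0],
        "max": xs[-1],
    }

print(summarize([12.1, 11.8, 12.4, 13.0, 11.9]))
```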
- CUDA/driver mismatch: Prefer Docker; for local installs ensure CUDA and PyTorch versions match
- DataLoader shared memory: add `--shm-size=32g` or reduce `--num-workers`
- Out-of-memory: reduce `--batch-size` or use CPU (`--device cpu`) at the cost of speed
- Missing weights for eval: ensure `save/TrainAll/..._best.pth` is mounted inside the container
- COCO eval requires three directories: `origin/`, `transformed/`, `pairs_csv/`
- I/O jitter in `quick_bench`: add `--tmpfs`
We gratefully acknowledge the following open-source projects and repositories that inspired or supported parts of this work:
- arkworks: https://github.com/arkworks-rs
- GroupMamba: https://github.com/Amshaker/GroupMamba
- MambaHash: https://github.com/shuaichaochao/MambaHash
- DinoHash (perceptual hash): https://github.com/proteus-photos/dinohash-perceptual-hash.git
Notes:
- We do not provide environments or integrations for Apple NeuralHash or imagededup. Experiments involving those ecosystems are out of scope for this repository's environment setup.
License note: Components under HDProof (the hamproof crate) rely on arkworks crates and follow their upstream licenses. If no unified license is specified at the repository root, please use this code for academic research purposes only.