
TRUSTWORTHY AND PRIVACY-PRESERVING PERCEPTUAL HASHING WITH ZERO-KNOWLEDGE PROOFS

This repository provides a reference implementation of the paper “Trustworthy and Privacy-Preserving Perceptual Hashing with Zero-Knowledge Proofs”, including:

  • Perceptual hashing deep models (training and evaluation, PyTorch)
  • A zero-knowledge proof prototype for Hamming distance based on multilinear sum-check and polynomial commitments (Rust, arkworks)

We ship a Docker image for a reproducible environment and provide scripts for experiments and benchmarking.

Table of Contents

  • Overview and Contributions
  • Repository Structure
  • Environment and Dependencies
    • Docker (recommended)
    • Local Installation (optional)
  • Data Preparation
  • Training and Evaluation
    • Training (TrainAll)
    • PIHD metrics (AUC, FPR@95TPR)
    • Params & Latency
    • COCO Zero-shot Generalization
    • Ablation Study
  • ZK Proof Benchmark (HDProof)
  • Hyperparameter Optimization
  • Reproducibility Checklist and Expected Artifacts
  • FAQ
  • Acknowledgements
  • Citation

Overview and Contributions

This implementation targets trustworthy and privacy-preserving perceptual hashing with a unified training and verification pipeline:

  • Multiple backbones (ResNet50, ViT, ConvNeXtV2, Swin-T, MambaOut, GroupMamba) with Baseline/Enhanced strategies
  • Unified training and evaluation producing ROC-AUC and FPR@95TPR
  • Zero-shot generalization evaluation on COCO
  • A Hamming-distance ZK proof (hamproof) prototype and benchmark

Repository Structure

PerceptHash/                 # Perceptual hashing models and scripts (PyTorch)
  model/                     # Models and backbones, includes kernels/selective_scan extension
  eval/                      # Evaluation scripts (AUC, COCO, Ablation, TPR@95% threshold)
  script/                    # Training entry (TrainAll.py)
  datasets/, checkpoint/, save/  # Data, pretrained weights, outputs (mounted as needed)

HDProof/                     # Hamming distance ZK proof (Rust/arkworks)
  3rd/                       # poly-commit, ml_sumcheck submodules
  src/                       # Protocol implementation
  src/bin/quick_bench.rs     # Benchmark entry (see CLI options below)

Dockerfile                   # Reproducible env (CUDA 12.1 + Python 3.10 + PyTorch 2.5.1 + Rust)

Environment and Dependencies

We recommend Docker to keep CUDA/driver versions and Python/Rust dependencies consistent.

Docker (recommended)

Build the image from the repo root:

git clone https://github.com/mengdehong/zkph.git
cd zkph
docker build -t zkph:latest .

Key environment in the image:

  • Base: nvidia/cuda:12.1.1-devel-ubuntu20.04
  • Python: 3.10 (Conda env: ph)
  • PyTorch: 2.5.1 + CUDA 12.1 (torchvision 0.20.1, torchaudio 2.5.1)
  • Other Python deps: see PerceptHash/requirements.txt
  • Rust toolchain (stable) + arkworks
  • HDProof is built during image build (cargo build --release).

Run-time recommendations:

  • Use --gpus all for GPU
  • Add --shm-size=32g to avoid DataLoader shared memory issues
  • Mount host data into /app/PerceptHash/... inside the container via -v.

Local Installation (optional)

If you do not use Docker, ensure compatible versions:

  • CUDA 12.1 with PyTorch 2.5.1; Python 3.10
  • Install deps from PerceptHash/requirements.txt (contains extra index for CUDA wheels)
  • Rust toolchain (stable); build HDProof with cargo build --release

Note: CUDA/driver mismatches across hosts can break wheels; Docker is preferred for reproducibility.

Data Preparation

Prepare a host data directory, e.g., ~/percepthash_data (or reuse the original ~/code/zkph/PerceptHash/... layout) and place:

  • PIHD dataset (train & eval): must contain the test/test_class split, so that PerceptHash/datasets/PIHD/test/test_class exists after mounting
  • CocoVal2017 (eval/generalization): with origin/, transformed/, and pairs_csv/
  • Pretrained weights (for training): put under PerceptHash/checkpoint/
  • Trained models/logs (for evaluation): put under PerceptHash/save/

Mount these directories to /app/PerceptHash/... paths inside the container (see commands below).
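
Concretely, a host layout consistent with the mount commands used below might look like this (illustrative; only the subdirectories listed above are required):

~/percepthash_data/
  PIHD/          # -> /app/PerceptHash/datasets/PIHD (contains test/test_class)
  CocoVal2017/   # -> /app/PerceptHash/datasets/CocoVal2017 (origin/, transformed/, pairs_csv/)
  checkpoint/    # -> /app/PerceptHash/checkpoint (pretrained weights)
  save/          # -> /app/PerceptHash/save (trained models, logs, eval outputs)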

Dataset Links (from the original README)

To avoid link rot, both primary and backup links are listed:

  1. PIHD dataset (train & eval)

  2. CocoVal2017 generated dataset (eval)

  3. Pretrained model weights (training)

  4. Trained outputs for evaluation

Training and Evaluation

All commands are executed on the host; the container provides the runtime. Replace host paths as needed (e.g., /home/USER/code/zkPH or ~/percepthash_data).

Training (TrainAll)

Runs all configured backbones/strategies and saves the best weights under save/TrainAll/<Model>/... along with summaries:

docker run --gpus all -it --rm \
  --shm-size="32g" \
  -v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
  -v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
  -v ~/percepthash_data/save:/app/PerceptHash/save \
  zkph:latest \
  bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python script/TrainAll.py"

Global hyper-parameters and experiment configurations are defined in PerceptHash/script/TrainAll.py. The default batch_size is 64; reduce it if GPU memory is tight.

Evaluation: PIHD (AUC, FPR@95TPR)

PerceptHash/eval/EvalAuc.py supports three modes:

  • 64-bit models: --mode 64
  • 32-bit models: --mode 32
  • TPR@95% Hamming threshold table for enhanced models: --mode thr

Optional args: --batch-size, --num-workers, --device (e.g., cuda:0/cpu), --no-amp.

Example:

docker run --gpus all -it --rm \
  --shm-size="32g" \
  -v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
  -v ~/percepthash_data/save:/app/PerceptHash/save \
  -v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
  zkph:latest \
  bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalAuc --mode 64 --batch-size 32 --num-workers 4"

Outputs CSV files under save/eval/ with AUC and FPR@95TPR, and prints a summary.
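
For reference, FPR@95TPR is the false-positive rate at the operating point where the true-positive rate first reaches 95%. A minimal sketch of computing both metrics from pairwise labels and similarity scores (illustrative; EvalAuc.py's exact implementation may differ):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def fpr_at_95_tpr(labels: np.ndarray, scores: np.ndarray) -> float:
    """FPR at the first ROC operating point whose TPR reaches 0.95."""
    fpr, tpr, _ = roc_curve(labels, scores)  # tpr is non-decreasing
    idx = int(np.searchsorted(tpr, 0.95, side="left"))
    return float(fpr[min(idx, len(fpr) - 1)])

# labels: 1 for perceptually-same pairs, 0 otherwise;
# scores: similarity, e.g., the negated Hamming distance of the two hashes.
labels = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.95, 0.90, 0.40, 0.35, 0.20, 0.10])
print(roc_auc_score(labels, scores), fpr_at_95_tpr(labels, scores))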

Evaluation: Params & Latency

Use PerceptHash/eval/EvalParamLatency.py to measure parameters and inference latency:

docker run --gpus all -it --rm \
  --shm-size="32g" \
  -v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
  -v ~/percepthash_data/save:/app/PerceptHash/save \
  -v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
  zkph:latest \
  bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalParamLatency"

Evaluation: COCO Zero-shot Generalization

docker run --gpus all -it --rm \
  --shm-size="32g" \
  -v ~/percepthash_data/CocoVal2017:/app/PerceptHash/datasets/CocoVal2017 \
  -v ~/percepthash_data/save:/app/PerceptHash/save \
  -v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
  zkph:latest \
  bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalCoco"

Evaluation: Ablation Study

docker run --gpus all -it --rm \
  --shm-size="32g" \
  -v ~/percepthash_data/PIHD:/app/PerceptHash/datasets/PIHD \
  -v ~/percepthash_data/save:/app/PerceptHash/save \
  -v ~/percepthash_data/checkpoint:/app/PerceptHash/checkpoint \
  zkph:latest \
  bash -c "source /opt/conda/bin/activate ph && cd PerceptHash && python -m eval.EvalAblations"

ZK Proof Benchmark (HDProof)

HDProof provides an arkworks-based ZK protocol for Hamming distance, together with a configurable benchmark, quick_bench (built during the Docker image build).
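
One standard way to arithmetize this statement, assuming the usual multilinear sum-check formulation (the exact relation is defined in src/): for bits, a_i XOR b_i = a_i + b_i - 2 a_i b_i, so for hash bit-vectors a, b of length 2^n with multilinear extensions $\tilde{a}, \tilde{b}$, the Hamming distance is

$$d_H(a, b) = \sum_{x \in \{0,1\}^n} \left( \tilde{a}(x) + \tilde{b}(x) - 2\,\tilde{a}(x)\,\tilde{b}(x) \right),$$

a sum over the Boolean hypercube that the sum-check protocol can verify against polynomial commitments to $\tilde{a}$ and $\tilde{b}$.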

Example:

docker run -it --rm zkph:latest \
  HDProof/target/release/quick_bench \
  --sizes=14,16,18,20 --warmup=10 --samples=20 --seed=42 --tmpfs

Key options (from src/bin/quick_bench.rs):

  • --sizes=14,16,...: number of hashes per run as 2^p; default 14,16,18
  • --warmup=N: warm-up runs (default 1)
  • --samples=N: number of samples; if omitted, --repeats (default 3) is used
  • --seed=S: RNG seed for reproducibility
  • --tmpfs: write intermediate files under /dev/shm if available to reduce I/O jitter
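
For intuition, the per-pair relation being certified is a plain XOR-and-popcount; a minimal (non-ZK) Python reference:

def hamming_distance(h1: int, h2: int) -> int:
    """Hamming distance of two 64-bit hashes: popcount of their XOR."""
    return bin(h1 ^ h2).count("1")  # (h1 ^ h2).bit_count() on Python >= 3.10

assert hamming_distance(0b1010, 0b0110) == 2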

Hyperparameter Optimization

All core hyper-parameters, including the learning rate, weight decay, and the loss-function weights $\alpha$ and $\beta$, were searched automatically with the Optuna framework via the TuneAll.py script; the optimized values were then recorded in TrainAll.py. Baseline models use TripletLoss with a fixed margin of 0.5, while the enhanced (+) models additionally search the angular_margin of their AngularLoss. All models were trained for 100 epochs with a batch size of 64, using the AdamW optimizer, a CosineAnnealingLR scheduler, and a hash length of 32 or 64 bits.
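
A minimal sketch of such a search (the actual search space and objective live in TuneAll.py; the ranges and the train_and_eval stub below are illustrative assumptions):

import optuna

def train_and_eval(**hp) -> float:
    # Hypothetical stand-in for one training run: the real objective in
    # TuneAll.py trains a model with these hyper-parameters and returns
    # a validation score such as ROC-AUC.
    return 0.5

def objective(trial: optuna.Trial) -> float:
    hp = {
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True),
        "alpha": trial.suggest_float("alpha", 1e-3, 1e-1, log=True),
        "beta": trial.suggest_float("beta", 1e-6, 1e-4, log=True),
        "angular_margin": trial.suggest_float("angular_margin", 0.2, 0.5),
    }
    return train_and_eval(**hp)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)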

Model-Specific Hyperparameters:

Model        lr        weight_decay  alpha    beta     Margin
ResNet50     0.000613  0.0263        0.00423  3.15e-6  0.5
ResNet50+    0.000690  0.000426      0.00247  2.84e-6  0.281
ViT          0.000193  0.00471       0.00456  8.36e-5  0.5
ViT+         7.39e-5   0.000702      0.0128   2.65e-6  0.479
MambaOut     0.000496  0.00131       0.0379   2.30e-6  0.5
MambaOut+    0.000322  0.000733      0.00371  5.74e-5  0.468
GroupMamba   0.000272  0.0460        0.00257  3.48e-6  0.5
GroupMamba+  0.000115  0.000282      0.0104   1.61e-5  0.315
ConvNeXtV2   0.000150  0.00165       0.00260  8.71e-6  0.5
ConvNeXtV2+  0.000328  0.0878        0.00101  1.12e-6  0.240
SwinTiny     8.61e-5   0.00470       0.00402  2.18e-5  0.5
SwinTiny+    0.000202  0.00126       0.0161   1.29e-5  0.388

Note: The value in the Margin column corresponds to different loss functions. For baseline models, it represents the margin parameter for TripletLoss. For the enhanced models (indicated by a +), it represents the angular_margin parameter for AngularLoss.
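
For reference, the baseline objective in PyTorch (a minimal sketch; the AngularLoss used by the + models and the alpha/beta-weighted auxiliary terms are defined in the repository and not reproduced here):

import torch

# Baseline metric-learning objective: triplet loss with margin 0.5.
triplet = torch.nn.TripletMarginLoss(margin=0.5)

anchor   = torch.randn(8, 64)  # hash-layer embeddings of anchor images
positive = torch.randn(8, 64)  # perceptually-same counterparts
negative = torch.randn(8, 64)  # perceptually-different images
print(triplet(anchor, positive, negative).item())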

Reproducibility Checklist and Expected Artifacts

  1. Environment: Docker image above (CUDA 12.1 / PyTorch 2.5.1 / Python 3.10 / Rust stable)
  2. Data:
    • PIHD: ensure PerceptHash/datasets/PIHD/test/test_class exists
    • COCOVal2017: ensure origin/, transformed/, pairs_csv/
    • checkpoint/ and save/ as needed
  3. Training: run TrainAll -> save/TrainAll/<Model>/..._best.pth, aggregated_results.json
  4. PIHD eval: EvalAuc --mode 64/32/thr
  5. COCO eval: EvalCoco
  6. Ablation: EvalAblations
  7. ZK benchmark: quick_bench -> collect statistics (mean/median/stddev/p90/min/max)

FAQ

  • CUDA/driver mismatch: Prefer Docker; for local installs ensure CUDA and PyTorch versions match
  • DataLoader shared memory: add --shm-size=32g or reduce --num-workers
  • Out-of-memory: reduce --batch-size or use CPU (--device cpu) at the cost of speed
  • Missing weights for eval: ensure save/TrainAll/..._best.pth is mounted inside the container
  • COCO eval requires three directories: origin/, transformed/, pairs_csv/
  • I/O jitter in quick_bench: add --tmpfs

Acknowledgements

We gratefully acknowledge the open-source projects and repositories that inspired or supported parts of this work.

Notes:

  • We do not provide environments or integrations for Apple NeuralHash or imagededup; experiments involving those ecosystems are outside the scope of this repository’s environment setup.

License note: Components under HDProof (the hamproof crate) depend on arkworks crates and follow their upstream licenses. If no unified license is specified at the repo root, please use this code for academic research purposes only.
