
🎨 AIGC Blind Evaluation System (AIGC Blind Eval)

[English] | [中文]



This is a Gradio-based system for comparing and evaluating AI-generated results, designed to give algorithm engineers and business personnel a fair and convenient blind-testing workflow. The system supports both image and video media, and achieves true blind testing by randomly shuffling the order in which the options are shown.

🌟 Key Features

  • Double-Blind Evaluation: The system automatically shuffles which algorithm (A or B) appears as Option 1 and which as Option 2, so evaluators cannot tell which result came from which algorithm.
  • Multimedia Support: Native support for comparing and evaluating images (PNG, JPG, WebP) and videos (MP4, MOV, MKV).
  • Version Management: Simple folder-based version management; multiple versions can coexist.
  • Real-time Dashboard: Provides a real-time visualization board for evaluation progress, win rates, number of participants, and other statistical data.
  • Persistent Storage: Uses SQLite database to store evaluation details, ensuring no data loss.
  • Multi-user Access: Supports simple username login.

📂 Folder Structure

AIGC_Blind_Eval/
├── core/                  # Core logic
│   ├── config.py          # Config loading logic
│   ├── db.py              # Database operations (SQLite)
│   └── evaluator.py       # Evaluation engine (Scanning, dispatching, results calculation)
├── data/                  # Root directory for image evaluation data
│   ├── input/             # Source files or Prompt reference images
│   │   └── v1/            # Version name (Folder name is the version number)
│   ├── algo_a/            # Algorithm A results
│   │   └── v1/
│   └── algo_b/            # Algorithm B results
│       └── v1/
├── data_video/            # Root directory for video evaluation data (Same structure as above)
├── main.py                # Entry point (Gradio app)
├── config.json            # Configuration file
├── eval_results.db        # Evaluation database (Auto-generated)
├── pyproject.toml         # Project dependency management
└── README.md              # Project documentation

🚀 Quick Start

1. Prepare Environment

The project recommends using uv for virtual environment and dependency management:

# Install dependencies
uv sync

2. Prepare Evaluation Data

The system automatically matches data by version name (folder) and file name.

  1. Create a version folder (e.g., test_v1) under data/input.
  2. Create version folders with the same name under data/algo_a and data/algo_b.
  3. Place images/videos in the corresponding directories. Note: the same sample must use the same filename in A, B, and Input (the main filenames must match; extensions may differ). A minimal matching sketch follows this list.
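
For orientation, here is a minimal sketch of how samples could be paired across the three folders by filename stem. It only illustrates the matching rule described above; the actual logic lives in core/evaluator.py and may differ in detail.

from pathlib import Path

# Illustrative only: pair samples for one version across input/algo_a/algo_b
# by filename stem, so "001.jpg" in algo_a still matches "001.png" in algo_b.
def collect_samples(base_dir: str, version: str) -> dict:
    roots = {name: Path(base_dir) / name / version
             for name in ("input", "algo_a", "algo_b")}
    samples = {}
    for role, root in roots.items():
        for path in sorted(root.glob("*")):
            if path.is_file():
                samples.setdefault(path.stem, {})[role] = path
    # Keep only samples present in both algorithm folders.
    return {stem: files for stem, files in samples.items()
            if "algo_a" in files and "algo_b" in files}

# Example: collect_samples("./data", "test_v1")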

3. Modify Configuration (Optional)

Edit config.json to customize data storage paths:

{
    "image": {
        "base_data_dir": "./data",
        "input_dir_name": "input",
        "algo_a_dir_name": "algo_a",
        "algo_b_dir_name": "algo_b"
    },
    "video": {
        "base_data_dir": "./data_video",
        "input_dir_name": "input",
        "algo_a_dir_name": "algo_a",
        "algo_b_dir_name": "algo_b"
    },
    "db_path": "eval_results.db"
}
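
For reference, a minimal sketch of reading these values with the standard json module; the actual loader in core/config.py may add validation or defaults on top of this.

import json
from pathlib import Path

# Illustrative only: load config.json and resolve the image data directories.
cfg = json.loads(Path("config.json").read_text(encoding="utf-8"))
image_root = Path(cfg["image"]["base_data_dir"])            # ./data
algo_a_dir = image_root / cfg["image"]["algo_a_dir_name"]   # ./data/algo_a
algo_b_dir = image_root / cfg["image"]["algo_b_dir_name"]   # ./data/algo_b
db_path = cfg["db_path"]                                    # eval_results.db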

4. Start Service

uv run python main.py

The default port is 7860. Access http://localhost:7860 after startup to begin evaluation.
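
If you need to change the port or expose the service on your network, Gradio's launch() accepts server_name and server_port. The snippet below is only a hypothetical illustration of such an entry point, not the actual contents of main.py.

import gradio as gr

# Hypothetical minimal entry point; the real main.py builds the full evaluation UI.
with gr.Blocks() as demo:
    gr.Markdown("AIGC Blind Eval")

if __name__ == "__main__":
    # 0.0.0.0 makes the app reachable from other machines; 7860 is the Gradio default.
    demo.launch(server_name="0.0.0.0", server_port=7860)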


🛡️ Daemon and Autostart (systemd)

To ensure the service runs continuously in the background, automatically restarts on process crashes, and starts on boot, it is recommended to use systemd.

1. Deployment Steps

  1. Modify Template: Adjust WorkingDirectory, ExecStart, User, and Group in scripts/aigc-eval.service to match your deployment (an illustrative unit is shown after these steps).
  2. Install Service:
    # Copy service file
    sudo cp scripts/aigc-eval.service /etc/systemd/system/
    
    # Reload configuration
    sudo systemctl daemon-reload
    
    # Start and enable on boot
    sudo systemctl enable --now aigc-eval.service
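
For orientation only, a unit of the following shape is typical; the paths and account names below are placeholders, and the template shipped in scripts/aigc-eval.service is authoritative.

# Illustrative example – adapt WorkingDirectory, ExecStart, User, and Group.
[Unit]
Description=AIGC Blind Eval (Gradio)
After=network.target

[Service]
WorkingDirectory=/opt/AIGC_Blind_Eval
ExecStart=/usr/local/bin/uv run python main.py
User=aigc
Group=aigc
Restart=always

[Install]
WantedBy=multi-user.target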

2. Common Management Commands

  • Check Status: sudo systemctl status aigc-eval.service
  • Stop Service: sudo systemctl stop aigc-eval.service
  • Restart Service: sudo systemctl restart aigc-eval.service
  • Check Real-time Logs: journalctl -u aigc-eval.service -f

🛠️ Usage

Evaluation Workflow

  1. Login: Enter "Username" on the homepage.
  2. Select Version: In the "Image Evaluation" or "Video Evaluation" tab, select a version from the "Pending" table.
  3. Blind Comparison:
    • Input (usually the reference image or source video) is displayed in the middle.
    • Option 1 and Option 2 (the system has randomly assigned A or B to each; see the sketch after this list) are displayed on the left and right sides.
    • Click "Option 1 is Better", "About the Same", or "Option 2 is Better" to make a choice.
  4. Completion: After all samples in the current version have been evaluated, the system automatically returns to the dashboard.
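
The blind assignment in step 3 can be pictured with the sketch below; it is illustrative only, and the actual dispatch logic lives in core/evaluator.py.

import random

# Illustrative only: blind a sample's two results into "Option 1" / "Option 2".
def blind_pair(algo_a_path: str, algo_b_path: str) -> dict:
    options = [("algo_a", algo_a_path), ("algo_b", algo_b_path)]
    random.shuffle(options)  # the evaluator never sees which side is which
    return {
        "option_1": options[0][1],
        "option_2": options[1][1],
        # Kept server-side so a vote for "Option 1/2" can be resolved back to A/B.
        "mapping": {f"option_{i + 1}": name for i, (name, _) in enumerate(options)},
    }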

Viewing Results

On the dashboard table, you can see the following in real time:

  • Competitor Win Rate: Win percentage of Algorithm A.
  • Self-developed Win Rate: Win percentage of Algorithm B.
  • Passed: Displays "Pass" if "Self-developed Win Rate" >= "Competitor Win Rate".
  • Your Progress: Records the number of samples completed by each user.

📝 Developer Notes

  • Database: To clear all evaluation data, simply delete eval_results.db; it is auto-generated again on the next run. The snippet after this list shows how to inspect the database contents before deleting or altering it.
  • File Matching: evaluator.py matches files by main filename (prefix), so if A and B use different extensions (e.g., .jpg vs .png), the pair is still loaded correctly as long as the main filenames are identical.
  • Extensibility: To add new metrics, modify the table structure in core/db.py.
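
Before deleting or extending the database, it can help to see what it currently contains. The snippet below only relies on SQLite's built-in sqlite_master catalog, so it works regardless of the exact schema defined in core/db.py.

import sqlite3

# List every table in eval_results.db together with its row count.
conn = sqlite3.connect("eval_results.db")
cur = conn.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
for (table,) in cur.fetchall():
    count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")
conn.close()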
