This is a Gradio-based comparison and evaluation system for AI-generated content, designed to give algorithm engineers and business stakeholders a fair and convenient blind-testing workflow. The system supports both image and video media, and achieves true blind testing by randomly shuffling the order of the options.
- Double-Blind Evaluation: The system automatically shuffles the display order of the A/B algorithms (Option 1 vs Option 2) to keep the evaluation objective (see the sketch after this list).
- Multimedia Support: Native support for comparing and evaluating images (PNG, JPG, WebP) and videos (MP4, MOV, MKV).
- Version Management: Simple version management based on folder structure, supporting coexistence of multiple versions.
- Real-time Dashboard: Provides a real-time visualization board for evaluation progress, win rates, number of participants, and other statistical data.
- Persistent Storage: Uses SQLite database to store evaluation details, ensuring no data loss.
- Multi-user Access: Supports simple username login.
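
For illustration, the shuffling behind the double-blind feature might look like the sketch below; `assign_blind_pair` and its field names are hypothetical, not the actual `core/evaluator.py` API:

```python
import random

def assign_blind_pair(path_a: str, path_b: str) -> dict:
    """Randomly map the outputs of algorithms A/B to display slots
    Option 1 / Option 2, keeping the hidden mapping so that a vote for
    "Option 1" can later be attributed to the right algorithm."""
    if random.random() < 0.5:
        return {"option_1": path_a, "option_2": path_b,
                "option_1_is": "algo_a", "option_2_is": "algo_b"}
    return {"option_1": path_b, "option_2": path_a,
            "option_1_is": "algo_b", "option_2_is": "algo_a"}
```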
```
AIGC_Blind_Eval/
├── core/                 # Core logic
│   ├── config.py         # Config loading logic
│   ├── db.py             # Database operations (SQLite)
│   └── evaluator.py      # Evaluation engine (scanning, dispatching, result calculation)
├── data/                 # Root directory for image evaluation data
│   ├── input/            # Source files or prompt reference images
│   │   └── v1/           # Version name (folder name is the version number)
│   ├── algo_a/           # Algorithm A results
│   │   └── v1/
│   └── algo_b/           # Algorithm B results
│       └── v1/
├── data_video/           # Root directory for video evaluation data (same structure as above)
├── main.py               # Entry point (Gradio app)
├── config.json           # Configuration file
├── eval_results.db       # Evaluation database (auto-generated)
├── pyproject.toml        # Project dependency management
└── README.md             # Project documentation
```
The project recommends using `uv` for virtual environment and dependency management:
```bash
# Install dependencies
uv sync
```

The system automatically matches data by version name (folder) and file name.
- Create a version folder (e.g., `test_v1`) under `data/input`.
- Create version folders with the same name under `data/algo_a` and `data/algo_b`.
- Place images/videos in the corresponding directories. Note: the filenames of the same sample in A, B, and Input must be consistent (cases where the main filenames match but the extensions differ are supported), as sketched below.
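
As a rough illustration of this matching rule (not the actual `core/evaluator.py` code), pairing files across the three directories by filename stem could look like:

```python
from pathlib import Path

def match_samples(input_dir: Path, algo_a_dir: Path, algo_b_dir: Path) -> list[dict]:
    """Pair files from input/, algo_a/, algo_b/ of one version by stem.

    Extensions may differ (e.g. 001.jpg vs 001.png); only the stem
    (the main filename) has to match."""
    def by_stem(d: Path) -> dict[str, Path]:
        return {p.stem: p for p in d.iterdir() if p.is_file()}

    inputs = by_stem(input_dir)
    a_files = by_stem(algo_a_dir)
    b_files = by_stem(algo_b_dir)
    common = sorted(inputs.keys() & a_files.keys() & b_files.keys())
    return [{"input": inputs[s], "algo_a": a_files[s], "algo_b": b_files[s]}
            for s in common]
```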
Edit `config.json` to customize the data storage paths:
```json
{
  "image": {
    "base_data_dir": "./data",
    "input_dir_name": "input",
    "algo_a_dir_name": "algo_a",
    "algo_b_dir_name": "algo_b"
  },
  "video": {
    "base_data_dir": "./data_video",
    "input_dir_name": "input",
    "algo_a_dir_name": "algo_a",
    "algo_b_dir_name": "algo_b"
  },
  "db_path": "eval_results.db"
}
```

Start the application with:

```bash
uv run python main.py
```

The default port is 7860. Access http://localhost:7860 after startup to begin evaluating.
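
On startup the app needs these paths resolved. `core/config.py` is only described above as config-loading logic, so the following is just a plausible sketch of reading `config.json`, not the actual implementation:

```python
import json
from pathlib import Path

def load_config(path: str = "config.json") -> dict:
    """Read config.json and resolve the per-media data directories."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    # Resolve input/algo_a/algo_b directories for both media types.
    for media in ("image", "video"):
        base = Path(cfg[media]["base_data_dir"])
        for key in ("input", "algo_a", "algo_b"):
            cfg[media][f"{key}_dir"] = base / cfg[media][f"{key}_dir_name"]
    return cfg
```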
To ensure the service runs continuously in the background, automatically restarts on process crashes, and starts on boot, it is recommended to use systemd.
- Modify Template: Edit `WorkingDirectory`, `ExecStart`, `User`, and `Group` in `scripts/aigc-eval.service` to match your environment (an illustrative template follows this list).
- Install Service:
```bash
# Copy the service file
sudo cp scripts/aigc-eval.service /etc/systemd/system/
# Reload the systemd configuration
sudo systemctl daemon-reload
# Start the service and enable it on boot
sudo systemctl enable --now aigc-eval.service
```
- Check Status: `sudo systemctl status aigc-eval.service`
- Stop Service: `sudo systemctl stop aigc-eval.service`
- Restart Service: `sudo systemctl restart aigc-eval.service`
- Check Real-time Logs: `journalctl -u aigc-eval.service -f`
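
For reference, a minimal unit file covering the fields named above might look like the following; the user, group, and paths are placeholders, and the shipped `scripts/aigc-eval.service` may differ:

```ini
[Unit]
Description=AIGC blind evaluation service (Gradio)
After=network.target

[Service]
# Adjust these four fields to your environment.
User=youruser
Group=yourgroup
WorkingDirectory=/opt/AIGC_Blind_Eval
ExecStart=/usr/local/bin/uv run python main.py
# Restart automatically if the process crashes.
Restart=on-failure

[Install]
WantedBy=multi-user.target
```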
- Login: Enter "Username" on the homepage.
- Select Version: In the "Image Evaluation" or "Video Evaluation" tab, select a version from the "Pending" table.
- Blind Comparison:
  - Input (usually the reference image or source video) is displayed in the middle.
  - Option 1 and Option 2 (the system has randomly assigned A or B to each) are displayed on the left and right.
  - Click "Option 1 is Better", "About the Same", or "Option 2 is Better" to record your choice (a sketch of how a vote might be recorded follows this list).
- Completion: After all samples in the current version are evaluated, the app automatically returns to the dashboard.
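
When one of these buttons is clicked, the displayed option has to be attributed back to the underlying algorithm through the hidden mapping before the vote is stored. A minimal sketch, reusing the hypothetical `assign_blind_pair()` mapping from earlier and an assumed `votes` table (the real schema lives in `core/db.py`):

```python
import sqlite3

def record_vote(db_path: str, version: str, sample: str,
                assignment: dict, choice: str) -> None:
    """Persist one vote. `assignment` is the hidden mapping returned by
    assign_blind_pair(); `choice` is "option_1", "option_2", or "tie"."""
    # Attribute the displayed option back to the underlying algorithm.
    winner = "tie" if choice == "tie" else assignment[f"{choice}_is"]
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS votes "
                 "(version TEXT, sample TEXT, winner TEXT)")
    conn.execute("INSERT INTO votes (version, sample, winner) VALUES (?, ?, ?)",
                 (version, sample, winner))
    conn.commit()
    conn.close()
```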
The dashboard table shows the following statistics in real time (a sketch of how they might be computed follows the list):
- Competitor Win Rate: Win percentage of Algorithm A.
- Self-developed Win Rate: Win percentage of Algorithm B.
- Passed: Displays "Pass" if "Self-developed Win Rate" >= "Competitor Win Rate".
- Your Progress: Records the number of samples completed by each user.
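
As with the other sketches, these figures could be derived from the stored votes roughly like this, assuming the same hypothetical `votes` table (the actual queries live in `core/db.py` / `core/evaluator.py`):

```python
import sqlite3

def version_stats(db_path: str, version: str) -> dict:
    """Aggregate win rates for one version from the assumed `votes` table."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT winner, COUNT(*) FROM votes WHERE version = ? GROUP BY winner",
        (version,),
    ).fetchall()
    conn.close()
    counts = dict(rows)
    total = sum(counts.values()) or 1  # avoid division by zero
    a_rate = counts.get("algo_a", 0) / total  # Competitor Win Rate
    b_rate = counts.get("algo_b", 0) / total  # Self-developed Win Rate
    return {"competitor_win_rate": a_rate,
            "self_developed_win_rate": b_rate,
            "passed": b_rate >= a_rate}  # the dashboard's "Pass" rule
```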
- Database: To clear all evaluation data, simply delete `eval_results.db`.
- File Matching: `evaluator.py` matches files by prefix, so if the extensions of A and B differ (e.g., `.jpg` vs `.png`), a pair still loads correctly as long as the main filenames are the same.
- Extensibility: To add new metrics, modify the table structure in `core/db.py` (see the sketch after this list).
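
For instance, assuming the hypothetical `votes` table from the sketches above, adding a per-vote metric could be a one-line migration (the actual table in `core/db.py` may be named and shaped differently):

```python
import sqlite3

# Hypothetical migration: add a numeric metric column to the assumed `votes` table.
conn = sqlite3.connect("eval_results.db")
conn.execute("ALTER TABLE votes ADD COLUMN quality_score REAL")
conn.commit()
conn.close()
```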