TMMBench is a collection of vision-language model benchmarks for Traditional Chinese and Taiwan-specific topics.
The benchmark comprises 290 questions across 9 categories:
- STEM (25 questions)
  - Taiwanese college entrance exam: Mathematics (6 questions)
  - Taiwanese college entrance exam: Chemistry (6 questions)
  - Taiwanese college entrance exam: Biology (6 questions)
  - Taiwanese college entrance exam: Physics (6 questions)
- Humanities and Social Sciences (20 questions)
  - Taiwanese college entrance exam: Geography (7 questions)
  - Taiwanese college entrance exam: History (6 questions)
  - Taiwanese college entrance exam: Civics and Society (7 questions)
- Tables (35 questions)
- Infographics (35 questions)
- Diagram (35 questions)
- Daily Life in Taiwan (35 questions)
  - Road signs (6 questions)
  - Promotional advertisements (17 questions)
  - News (12 questions)
- Celebrity (35 questions)
- Attractions and Landmarks (35 questions)
- UI understanding (35 questions)
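
As a quick sanity check on the breakdown above, the short sketch below sums the nine top-level category counts and reports each category's share of the benchmark. The dictionary and the `shares` helper are illustrative only and are not part of the TMMBench code.

```python
# Top-level category sizes, copied from the list above.
CATEGORY_COUNTS = {
    "STEM": 25,
    "Humanities and Social Sciences": 20,
    "Tables": 35,
    "Infographics": 35,
    "Diagram": 35,
    "Daily Life in Taiwan": 35,
    "Celebrity": 35,
    "Attractions and Landmarks": 35,
    "UI understanding": 35,
}


def shares(counts):
    """Return each category's fraction of the total question count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(CATEGORY_COUNTS.values())
    print(f"{total} questions in {len(CATEGORY_COUNTS)} categories")
    for name, frac in shares(CATEGORY_COUNTS).items():
        print(f"{name}: {frac:.1%}")
```

Because seven of the nine categories have 35 questions each, an accuracy averaged over all questions gives those categories more weight than STEM or Humanities and Social Sciences.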
- Install Dependencies

      pip install -r requirements
- Download Benchmark Data

  Download the evaluation data from the MediaTek-Research/TMMBench Hugging Face repository and set it up as follows:
  - Create an eval_data folder in the project root if it doesn't exist.
  - Place Question_multiplechoice.tsv in the eval_data folder.
- Set Up OpenAI API Key

      export OPENAI_API_KEY="your_openai_key"
- Execute the Evaluation Script

  The run_eval.py script handles both response generation and judgment via GPT-4o:

      CUDA_VISIBLE_DEVICES=0 python run_eval.py --model_name {model_path} --log_dir ./logs
- Additional Options

  You can customize the evaluation with these parameters:

      python run_eval.py --model_name MODEL_PATH --backend hf --log_dir logs/YOUR_MODEL/

  - --model_name: Path to your model (default: Llama-Breeze2-3B-Instruct)
  - --backend: Inference backend (options: hf, vllm, openai_api)
  - --log_dir: Directory to store results
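
The setup steps above can be sketched as a small Python helper that fetches the data file, checks the prerequisites, and assembles the `run_eval.py` command line. This is a minimal sketch, not part of the repository: `download_data`, `preflight`, and `build_command` are hypothetical names, and `download_data` assumes the Hugging Face repository is a dataset repo with `Question_multiplechoice.tsv` at its root. Only the file name, folder name, environment variable, and flags are taken from the steps above.

```python
import os
import shlex
import sys
from pathlib import Path

# Folder and file name come from the "Download Benchmark Data" step.
DATA_DIR = Path("eval_data")
DATA_FILE = "Question_multiplechoice.tsv"
# Backend names as listed for --backend.
BACKENDS = ("hf", "vllm", "openai_api")


def download_data(project_root="."):
    """Fetch the TSV into eval_data/ (hypothetical helper).

    Assumption: the repo is a *dataset* repo and the TSV sits at its root.
    Adjust repo_type or filename if the actual layout differs.
    """
    # Third-party dependency, imported lazily: pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    target = Path(project_root) / DATA_DIR
    target.mkdir(parents=True, exist_ok=True)
    return hf_hub_download(
        repo_id="MediaTek-Research/TMMBench",
        filename=DATA_FILE,
        repo_type="dataset",
        local_dir=target,
    )


def preflight(project_root=".", env=None):
    """Return a list of problems; an empty list means the setup looks ready."""
    env = os.environ if env is None else env
    problems = []
    if not (Path(project_root) / DATA_DIR / DATA_FILE).is_file():
        problems.append(f"missing {DATA_DIR / DATA_FILE}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


def build_command(model_name, backend, log_dir="./logs"):
    """Assemble the run_eval.py invocation using the documented flags."""
    if backend not in BACKENDS:
        raise ValueError(f"backend must be one of {BACKENDS}, got {backend!r}")
    return [
        sys.executable, "run_eval.py",
        "--model_name", model_name,
        "--backend", backend,
        "--log_dir", log_dir,
    ]


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("not ready:", issue)
    if not issues:
        # Print the command instead of running it, so nothing starts by accident.
        print(shlex.join(build_command("MODEL_PATH", "hf", "logs/YOUR_MODEL/")))
```

To launch the evaluation from Python, the list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)` from the project root. Building the command as a list rather than a string avoids quoting problems when the model path contains spaces.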
@software{tmmbench,
author = {Chia-Sheng Liu and Yi-Chang Chen and Yu-Ting Hsu and Ru-Heng Huang and Meng-Hsi Chen and Da-Shan Shiu},
title = {TMMBench - Taiwan Multi-modal Model Benchmark},
month = {April},
year = 2025,
url = {https://github.com/mtkresearch/TMMBench}
}