TMMBench - Taiwan Multi-modal Model Benchmark

TMMBench is a collection of vision-language model benchmarks focused on Traditional Chinese and Taiwan-specific topics.

The benchmark comprises 9 categories with 290 questions in total (a quick way to verify these counts against the question file is sketched after the list):

  1. STEM (25 questions)

    • Taiwanese college entrance exam: Mathematics (6 questions)
    • Taiwanese college entrance exam: Chemistry (6 questions)
    • Taiwanese college entrance exam: Biology (6 questions)
    • Taiwanese college entrance exam: Physics (6 questions)
  2. Humanities and Social Sciences (20 questions)

    • Taiwanese college entrance exam: Geography (7 questions)
    • Taiwanese college entrance exam: History (6 questions)
    • Taiwanese college entrance exam: Civics and Society (7 questions)
  3. Tables (35 questions)

  4. Infographics (35 questions)

  5. Diagram (35 questions)

  6. Daily Life in Taiwan (35 questions)

    • Road signs (6 questions)
    • Promotional advertisements (17 questions)
    • News (12 questions)
  7. Celebrity (35 questions)

  8. Attractions and Landmarks (35 questions)

  9. UI understanding (35 questions)
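The per-category counts above can be spot-checked against the question file once it has been downloaded (see Prerequisites below). The following is a minimal sketch that assumes the TSV exposes a category column; the actual column name in Question_multiplechoice.tsv may differ, so inspect df.columns first.

    import pandas as pd

    # Load the benchmark questions (path follows the setup under Prerequisites).
    df = pd.read_csv("eval_data/Question_multiplechoice.tsv", sep="\t")

    # "category" is an assumed column name; check df.columns for the real one.
    print(df["category"].value_counts())
    print(f"Total questions: {len(df)}")  # expected: 290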

How to Use the Benchmark

Prerequisites

  1. Install Dependencies

    pip install -r requirements.txt
  2. Download Benchmark Data

    Download the evaluation data from the MediaTek-Research/TMMBench Hugging Face repository and set it up as follows (a scripted download sketch follows this list):

    • Create an eval_data folder in the project root if it doesn't exist
    • Place Question_multiplechoice.tsv in the eval_data folder
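If you prefer to script the download instead of fetching the file manually, one option is the huggingface_hub client. This is a sketch rather than part of the repository's tooling; it assumes MediaTek-Research/TMMBench is hosted as a dataset repository and that the file name matches the one above.

    import os
    from huggingface_hub import hf_hub_download

    os.makedirs("eval_data", exist_ok=True)

    # repo_type="dataset" is an assumption; switch to repo_type="model"
    # if the file is published under a model repository instead.
    path = hf_hub_download(
        repo_id="MediaTek-Research/TMMBench",
        filename="Question_multiplechoice.tsv",
        repo_type="dataset",
        local_dir="eval_data",
    )
    print(f"Benchmark data saved to {path}")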

Running the Benchmark

  1. Set Up OpenAI API Key

    export OPENAI_API_KEY="your_openai_key"
  2. Execute the Evaluation Script

    The run_eval.py script handles both response generation and judgment via GPT-4o:

    CUDA_VISIBLE_DEVICES=0 python run_eval.py --model_name {model_path} --log_dir ./logs
  3. Additional Options

    You can customize the evaluation with these parameters (a programmatic sweep sketch follows this list):

    python run_eval.py --model_name MODEL_PATH --backend hf --log_dir logs/YOUR_MODEL/
    • --model_name: Path to your model (default: Llama-Breeze2-3B-Instruct)
    • --backend: Inference backend (options: hf, vllm, openai_api)
    • --log_dir: Directory to store results
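To sweep several models or backends, the same command can be driven from Python. The sketch below uses only the flags documented above; the model names and log paths are placeholders, not defaults shipped with the repository.

    import os
    import subprocess

    # Placeholder model/backend combinations; substitute your own checkpoints.
    runs = [
        ("Llama-Breeze2-3B-Instruct", "hf"),
        ("path/to/another-model", "vllm"),
    ]

    for model, backend in runs:
        log_dir = f"logs/{os.path.basename(model)}"
        subprocess.run(
            ["python", "run_eval.py",
             "--model_name", model,
             "--backend", backend,
             "--log_dir", log_dir],
            env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"},
            check=True,
        )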

Citation

@software{tmmbench,
  author = {Chia-Sheng Liu and Yi-Chang Chen and Yu-Ting Hsu and Ru-Heng Huang and Meng-Hsi Chen and Da-Shan Shiu},
  title = {TMMBench - Taiwan Multi-modal Model Benchmark},
  month = apr,
  year = 2025,
  url = {https://github.com/mtkresearch/TMMBench}
}
