This repository provides installation and usage scripts for TimeOmni-1.
🙋 Please let us know if you find a mistake or have any suggestions!
🌟 If you find this resource helpful, please consider starring this repository and citing our research:
🚩 News (Apr. 2026): We have released new post-trained versions based on Qwen3.5 on Hugging Face: TimeOmni-1-9B and TimeOmni-1-4B. These new versions further scale up model performance (inference code coming soon).
🚩 News (Feb. 2026): Please find the open-source model on Hugging Face: TimeOmni-1-7B; see also our online demo: https://huggingface.co/spaces/anton-hugging/TimeOmni-1
🚩 News (Jan. 2026): TimeOmni-1 has been accepted to ICLR 2026! 🎉
Table. Model Size Scaling Comparison
* Note: All metrics below are computed only on valid responses. A "-" indicates a success rate (SR) below 10%; in such cases the metric is omitted for lack of statistical support. For ACC, higher is better; for MAE, lower is better. Bold marks the best value in each ACC/MAE column.
| Model | Task1 ID (ACC↑/SR) | Task1 OOD (ACC↑/SR) | Task2 ID (ACC↑/SR) | Task2 OOD (ACC↑/SR) | Task3 ID (MAE↓/SR) | Task3 OOD (MAE↓/SR) | Task4 ID (ACC↑/SR) | Task4 OOD (ACC↑/SR) |
|---|---|---|---|---|---|---|---|---|
| 7B (Qwen2.5-Instruct) | | | | | | | | |
| Qwen2.5-Instruct-7B | 48.5/100.0 | 42.8/100.0 | 21.6/99.8 | 26.3/100.0 | 23.28/53.1 | 146.12/55.5 | 25.5/100.0 | 24.9/100.0 |
| TimeOmni-1-7B | 90.7/97.5 | 87.7/98.3 | 69.3/99.8 | 64.0/99.8 | 14.30/93.8 | 145.53/82.3 | 47.9/100.0 | 58.9/100.0 |
| 4B (Qwen3.5) | | | | | | | | |
| Qwen-3.5-4B | 0.0/16.5 | 5.9/17.0 | 28.3/12.4 | 35.4/12.0 | -/2.2 | -/9.0 | -/8.5 | -/9.2 |
| TimeOmni-1-4B | 91.5/99.5 | 91.2/98.4 | **71.1**/100.0 | 66.1/99.9 | 13.68/97.6 | 170.41/86.1 | 58.5/100.0 | 72.0/100.0 |
| 9B (Qwen3.5) | | | | | | | | |
| Qwen-3.5-9B | 91.2/51.0 | **93.5**/46.1 | 43.3/12.1 | 36.3/12.8 | 17.56/14.1 | -/0.8 | **64.2**/28.2 | 72.0/32.2 |
| TimeOmni-1-9B | **93.5**/100.0 | 92.8/99.8 | 70.9/100.0 | **66.2**/100.0 | **13.54**/97.8 | **140.06**/95.6 | 59.6/100.0 | **75.6**/99.6 |
```
conda create -n timeomni python=3.10
conda activate timeomni
pip install -r requirements.txt
```

Download the model:

```
python install/download_hf_model.py
```

Default model path: `~/.cache/huggingface/hub`.

Download the testbed:

```
python install/download_testbed.py
```

This creates:

```
data/timeomni1_id_test.json
data/timeomni1_ood_test.json
```
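The downloaded splits can be inspected with a few lines of Python. This is an illustrative sketch that assumes the files are plain JSON arrays of examples (suggested by the `.json` extension); adjust if the actual on-disk format differs (e.g. JSONL).

```python
import json

def load_testbed(path: str) -> list:
    """Load one test split, assumed to be a JSON array of examples."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# e.g. examples = load_testbed("data/timeomni1_id_test.json")
```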
Default system prompt:

```
Output Format:
<think>Your step-by-step reasoning process that justifies your answer</think>
<answer>Your final answer(Note: Only output a single uppercase letter of the correct option)</answer>
```
Run:

```
python inference/inference.py \
    --model_dir "Local Model Path /models--anton-hugging--TimeOmni-1-7B/snapshots/<hash>" \
    --question "Your Question" \
    --system_prompt "Output Format:\n<think>Your step-by-step reasoning process that justifies your answer</think>\n<answer>Your final answer(Note: Only output a single uppercase letter of the correct option)</answer>"
```

Run the evaluation:

```
bash eval/run-timeomini_test.sh
```

Optional env overrides:

```
MODEL_DIR=anton-hugging/TimeOmni-1-7B \
ANS_ID_PATH=answer/timeomni1_test/your_id_outputs.json \
RES_ID_PATH=answer/timeomni1_test/your_id_results.json \
ANS_OOD_PATH=answer/timeomni1_test/your_ood_outputs.json \
RES_OOD_PATH=answer/timeomni1_test/your_ood_results.json \
bash eval/run-timeomini_test.sh
```

We report Success Rate (SR), defined as the proportion of model outputs that yield a valid and extractable answer. All other metrics are computed on valid cases only.
- Tasks 1, 2, 4: the model outputs a single uppercase letter (A/B/C/D). Metric: Accuracy (ACC).
- Task 3: the model outputs a sequence (e.g., `[2, 20, 21, ..., 83]`). Metric: Mean Absolute Error (MAE).
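The metric definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the repository's evaluation code; in particular, how Task 3 aggregates element-wise errors across examples is an assumption here.

```python
def success_rate(parsed: list) -> float:
    """SR: percentage of outputs with a valid, extractable answer (None = invalid)."""
    return 100.0 * sum(p is not None for p in parsed) / len(parsed)

def accuracy(preds: list, golds: list) -> float:
    """ACC over valid cases only (Tasks 1, 2, 4)."""
    valid = [(p, g) for p, g in zip(preds, golds) if p is not None]
    return 100.0 * sum(p == g for p, g in valid) / len(valid)

def mae(pred_seqs: list, gold_seqs: list) -> float:
    """MAE over valid cases only (Task 3).

    Assumption: per-example mean absolute element-wise error,
    then averaged across valid examples.
    """
    errs = []
    for p, g in zip(pred_seqs, gold_seqs):
        if p is None or len(p) != len(g):
            continue  # unparseable output: excluded here, penalized via SR
        errs.append(sum(abs(a - b) for a, b in zip(p, g)) / len(g))
    return sum(errs) / len(errs)
```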
@inproceedings{guan2026timeomni,
title={TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models},
author={Tong Guan and Zijie Meng and Dianqi Li and Shiyu Wang and Chao-Han Huck Yang and Qingsong Wen and Zuozhu Liu and Sabato Marco Siniscalchi and Ming Jin and Shirui Pan},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=kOIclg7muL}
}