A minimal but serious, reproducible codebase for the Tencent Ads Algorithm Competition 2025 (all‑modality generative recommendation). This repo focuses on clarity, reproducibility, and easy onboarding without heavy refactors to the current code.
- Reproducible baseline with clear training/validation entry (main.py)
- Data pipeline and metrics utilities (dataset.py, valid_hr_ndcg.py)
- Model and optimizer modules (model.py, muon.py)
- Simple shell runner for platform integration (run.sh)
- Reproduces key components from Zhang et al. (2025): Ranking‑Driven Enhance and Gradient‑Guided Adaptive Weighter (see Paper Reference).
Key progress (from competition reproduction):
- HR@10: 0.036 → 0.109
- NDCG@10: 0.017 → 0.056
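For readers new to these metrics, the following is a minimal sketch of HR@K and NDCG@K, assuming one ground-truth item per user and a ranked list of candidate item IDs. It is not taken from valid_hr_ndcg.py, which may differ in detail; the function names here are illustrative.

```python
import math
from typing import Sequence, Tuple


def hr_at_k(ranked: Sequence[int], target: int, k: int = 10) -> float:
    """Hit rate: 1.0 if the target item is among the top-k ranked items."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked: Sequence[int], target: int, k: int = 10) -> float:
    """NDCG with a single relevant item.

    The ideal DCG is 1 (target at position 0), so NDCG reduces to
    1 / log2(rank + 2) for a 0-based rank inside the top-k, else 0.
    """
    top = list(ranked[:k])
    if target not in top:
        return 0.0
    return 1.0 / math.log2(top.index(target) + 2)


def evaluate(
    rankings: Sequence[Sequence[int]], targets: Sequence[int], k: int = 10
) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over all users."""
    n = len(targets)
    hr = sum(hr_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg
```

Both metrics lie in [0, 1]. HR@K only checks whether the target appears in the top K, while NDCG@K also rewards placing it closer to the top.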
- Python: 3.10+
- PyTorch: 2.2+
- CUDA: 12+ (optional; falls back to CPU)
Install minimal dependencies:
pip install -r requirements.txt
Run training (local):
python main.py --help
python main.py \
--batch_size 128 \
--lr 1e-3 \
--maxlen 101 \
--num_epochs 6
Or via the provided runner (often used by platforms):
bash run.sh
Two modes are supported in code:
- Platform mode: set the environment variables TRAIN_DATA_PATH, TRAIN_LOG_PATH, TRAIN_TF_EVENTS_PATH, TRAIN_CKPT_PATH (the code detects these automatically)
- Local mode: a default local path is used in main.py inside setup_paths().
  - Update the local_data_dir path in main.py to your dataset location, or run in platform mode by setting the env vars above.
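The mode detection described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual setup_paths() in main.py: the function name resolve_paths, the local_root default, and the local subdirectory names are all hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple

# The four variables named in this README.
ENV_VARS = (
    "TRAIN_DATA_PATH",
    "TRAIN_LOG_PATH",
    "TRAIN_TF_EVENTS_PATH",
    "TRAIN_CKPT_PATH",
)


def resolve_paths(
    local_root: str = "./local_run",
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Dict[str, Path]]:
    """Return ("platform" | "local", {variable name: path})."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in ENV_VARS):
        # Platform mode: every path comes from the environment.
        return "platform", {name: Path(env[name]) for name in ENV_VARS}
    # Local mode: derive paths from one root (subfolder names are assumptions).
    root = Path(local_root)
    return "local", {
        "TRAIN_DATA_PATH": root / "data",
        "TRAIN_LOG_PATH": root / "logs",
        "TRAIN_TF_EVENTS_PATH": root / "tf_events",
        "TRAIN_CKPT_PATH": root / "ckpt",
    }
```

Requiring all four variables before switching to platform mode is a design choice of this sketch; it avoids a half-configured run where some outputs land in platform storage and others locally.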
Required dataset files (see dataset.py):
- indexer.pkl, item_feat_dict.json, seq.jsonl, seq_offsets.pkl
- creative_emb/emb_*/... shards for multimodal embeddings
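The pairing of seq.jsonl with seq_offsets.pkl suggests random access by byte offset. The sketch below illustrates that pattern under the assumption that the offsets file holds one byte offset per line of the JSONL file; the actual layout is defined in dataset.py and may differ. The helper names are hypothetical.

```python
import json
import pickle
from typing import Any, List


def build_offsets(jsonl_path: str) -> List[int]:
    """Record the starting byte offset of every line in a JSONL file."""
    offsets = []
    with open(jsonl_path, "rb") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def save_offsets(offsets: List[int], pkl_path: str) -> None:
    """Persist the offsets so later runs can skip the full scan."""
    with open(pkl_path, "wb") as f:
        pickle.dump(offsets, f)


def load_offsets(pkl_path: str) -> List[int]:
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def read_record(jsonl_path: str, offsets: List[int], index: int) -> Any:
    """Seek straight to record `index` and parse only that line."""
    with open(jsonl_path, "rb") as f:
        f.seek(offsets[index])
        return json.loads(f.readline())
```

With offsets available, a dataset's item lookup can fetch one user sequence without loading the whole file into memory, which matters when the sequence file is large.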
- main.py: training entry with AMP, schedulers, logging (TensorBoard)
- infer.py: inference utilities
- dataset.py: data loading, multimodal embedding readers, batching
- model.py: baseline model
- muon.py: Muon optimizer implementation
- valid_hr_ndcg.py: evaluation (HR@K, NDCG@K)
- run.sh: platform integration script
- Lint/formatting: not enforced yet to avoid intrusive changes; feel free to use ruff/black locally.
- Tests: a smoke test ensures modules import; add targeted unit tests under tests/ as functionality stabilizes.
Run tests:
pytest -q
- Fix seeds: see set_seed() in main.py
- Logs: TensorBoard events and JSON logs are written under the log paths returned by setup_paths()
- Checkpoints: see TRAIN_CKPT_PATH or local path in main.py
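For orientation, a typical seeding helper is sketched below. It is not the set_seed() from main.py, which may set additional flags; the name seed_everything is hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the common random number generators used in training."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)  # seed every visible GPU
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Seeding makes random number streams repeatable, but some GPU kernels remain nondeterministic, so small run-to-run differences in metrics can still occur.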
- This repo reproduces two techniques from the following work:
- Zhang, Luankang; Song, Kenan; Lee, Yi; Guo, Wei; Wang, Hao; Li, Yawen; Guo, Huifeng; Liu, Yong; Lian, Defu; Chen, Enhong (2025). "Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model". DOI: 10.1145/3726302.3730017.
- Implementations: “Ranking‑Driven Enhance” and “Gradient‑Guided Adaptive Weighter”.
This project is licensed under the Apache License 2.0. See LICENSE.
If you use this repository, please cite: “OmniGenRec (Tencent Advertising Competition 2025)”.