
mx-Liu123/OmniGenRec-TAAC2025

OmniGenRec-TAAC2025

A minimal, reproducible codebase for the Tencent Ads Algorithm Competition 2025 (all-modality generative recommendation). The repository prioritizes clarity, reproducibility, and easy onboarding, and deliberately avoids heavy refactoring of the existing code.

Highlights

  • Reproducible baseline with a clear training/validation entry point (main.py)
  • Data pipeline and metrics utilities (dataset.py, valid_hr_ndcg.py)
  • Model and optimizer modules (model.py, muon.py)
  • Simple shell runner for platform integration (run.sh)
  • Reproduces key components from Zhang et al. (2025): Ranking‑Driven Enhance and Gradient‑Guided Adaptive Weighter (see Paper Reference).

Key progress during the competition reproduction (before → after):

  • HR@10: 0.036 → 0.109
  • NDCG@10: 0.017 → 0.056
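For reference, when each user has a single held-out target item, HR@K is 1 if the target appears among the top K recommendations (else 0), and NDCG@K discounts a hit at 0-based rank r by 1/log2(r + 2). The sketch below assumes that single-target setting; the function names are hypothetical, and valid_hr_ndcg.py may differ (for example in how candidates are ranked or batched).

```python
import math
from typing import Sequence


def hr_ndcg_at_k(ranked_items: Sequence[int], target: int, k: int = 10) -> tuple[float, float]:
    """HR@k and NDCG@k for one user with a single ground-truth item."""
    top_k = list(ranked_items[:k])
    if target not in top_k:
        return 0.0, 0.0
    rank = top_k.index(target)  # 0-based position of the hit
    # With one relevant item the ideal DCG is 1, so NDCG equals the discounted gain.
    return 1.0, 1.0 / math.log2(rank + 2)


def mean_hr_ndcg(
    all_ranked: Sequence[Sequence[int]], targets: Sequence[int], k: int = 10
) -> tuple[float, float]:
    """Average the per-user metrics over all evaluated users."""
    if not targets:
        raise ValueError("need at least one user")
    scores = [hr_ndcg_at_k(r, t, k) for r, t in zip(all_ranked, targets)]
    hr = sum(s[0] for s in scores) / len(scores)
    ndcg = sum(s[1] for s in scores) / len(scores)
    return hr, ndcg
```

For example, if the target is the third recommended item, HR@10 = 1 and NDCG@10 = 1/log2(4) = 0.5.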

Quickstart

  • Python: 3.10+
  • PyTorch: 2.2+
  • CUDA: 12+ (optional; falls back to CPU)

Install minimal dependencies:

pip install -r requirements.txt

Run training (local):

python main.py --help
python main.py \
  --batch_size 128 \
  --lr 1e-3 \
  --maxlen 101 \
  --num_epochs 6

Or use the provided runner script, which is intended for platform integration:

bash run.sh

Data Paths

Two modes are supported in code:

  • Platform mode: set environment variables (the code detects these automatically)
    • TRAIN_DATA_PATH, TRAIN_LOG_PATH, TRAIN_TF_EVENTS_PATH, TRAIN_CKPT_PATH
  • Local mode: setup_paths() in main.py falls back to a default local data path.
    • Update local_data_dir in main.py to point at your dataset, or switch to platform mode by setting the env vars above.
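The mode selection above can be sketched as follows. This is a hypothetical resolve_paths(), not the repository's setup_paths(): it assumes platform mode applies only when all four variables are set, and the local log, event, and checkpoint directories are invented for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Environment variables named in this README for platform mode.
PLATFORM_VARS = ("TRAIN_DATA_PATH", "TRAIN_LOG_PATH", "TRAIN_TF_EVENTS_PATH", "TRAIN_CKPT_PATH")


def resolve_paths(local_data_dir: str, env: Optional[Mapping[str, str]] = None) -> dict[str, Path]:
    """Return platform paths if every variable is set, otherwise local defaults."""
    env = os.environ if env is None else env
    if all(env.get(name) for name in PLATFORM_VARS):
        return {name: Path(env[name]) for name in PLATFORM_VARS}
    data_dir = Path(local_data_dir)
    base = data_dir.parent  # hypothetical: outputs live next to the data directory
    return {
        "TRAIN_DATA_PATH": data_dir,
        "TRAIN_LOG_PATH": base / "logs",
        "TRAIN_TF_EVENTS_PATH": base / "tf_events",
        "TRAIN_CKPT_PATH": base / "checkpoints",
    }
```

Passing an explicit env mapping keeps the function easy to test without touching the real environment.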

Required dataset files (see dataset.py):

  • indexer.pkl, item_feat_dict.json, seq.jsonl, seq_offsets.pkl
  • creative_emb/emb_*/... shards for multimodal embeddings
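A quick pre-flight check for the files above, plus random access into seq.jsonl, might look like the sketch below. The helper names are hypothetical, and the reader assumes (not confirmed here) that seq_offsets.pkl stores the byte offset where each record starts in seq.jsonl, so one sequence can be read with a seek instead of scanning the whole file. Check dataset.py for the actual format.

```python
import json
import os
import pickle

# Files and directories listed in this README as required.
REQUIRED = ("indexer.pkl", "item_feat_dict.json", "seq.jsonl", "seq_offsets.pkl", "creative_emb")


def missing_files(data_dir: str) -> list[str]:
    """Names from REQUIRED that do not exist under data_dir."""
    return [name for name in REQUIRED if not os.path.exists(os.path.join(data_dir, name))]


class SeqReader:
    """Random access to seq.jsonl records via byte offsets (assumed format)."""

    def __init__(self, data_dir: str):
        with open(os.path.join(data_dir, "seq_offsets.pkl"), "rb") as f:
            self.offsets = pickle.load(f)  # assumption: indexable sequence of ints
        self._file = open(os.path.join(data_dir, "seq.jsonl"), "rb")

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int):
        self._file.seek(self.offsets[index])  # jump straight to the record
        return json.loads(self._file.readline())

    def close(self) -> None:
        self._file.close()
```

Keeping the file handle open avoids reopening seq.jsonl for every sample; with multi-process data loading, each worker should open its own handle.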

Repository Structure

  • main.py — training entry with AMP, schedulers, logging (TensorBoard)
  • infer.py — inference utilities
  • dataset.py — data loading, multimodal embedding readers, batching
  • model.py — baseline model
  • muon.py — Muon optimizer implementation
  • valid_hr_ndcg.py — evaluation (HR@K, NDCG@K)
  • run.sh — platform integration script
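Muon's core step replaces each 2D update (the momentum-accumulated gradient of a weight matrix) with an approximately orthogonal matrix: its polar factor U Vᵀ, where the update equals U S Vᵀ, computed by Newton-Schulz iteration instead of an SVD. The pure-Python sketch below shows only that orthogonalization, using the classic cubic iteration X ← 1.5X - 0.5(XXᵀ)X. Widely used Muon implementations use a tuned quintic polynomial in low precision instead, and this sketch does not claim to match muon.py.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T."""
    tall = len(g) > len(g[0])
    x = transpose(g) if tall else [row[:] for row in g]  # work with rows <= cols
    norm = math.sqrt(sum(v * v for row in x for v in row))
    if norm == 0.0:
        return [[0.0] * len(g[0]) for _ in g]
    # Frobenius scaling makes every singular value <= 1, inside the
    # convergence region (0, sqrt(3)) of the cubic iteration.
    x = [[v / norm for v in row] for row in x]
    for _ in range(steps):
        # Each step maps a singular value s to 1.5*s - 0.5*s**3, pushing it toward 1
        # while leaving the singular vectors unchanged.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(x, xxt_x)]
    return transpose(x) if tall else x
```

Small singular values grow by roughly 1.5x per step, so badly conditioned updates need more iterations; tuned polynomials exist to speed up exactly that phase.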

Development

  • Lint/formatting: not enforced yet to avoid intrusive changes; feel free to use ruff/black locally.
  • Tests: a smoke test ensures modules import; add targeted unit tests under tests/ as functionality stabilizes.

Run tests:

pytest -q

Results & Reproducibility

  • Fix seeds: see set_seed() in main.py
  • Logs: TensorBoard events and JSON logs are written under the log paths returned by setup_paths()
  • Checkpoints: saved under TRAIN_CKPT_PATH in platform mode, or under the local path set in main.py otherwise
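set_seed() in main.py is the reference; a typical seeding helper for a PyTorch project looks like the sketch below. The name seed_everything is hypothetical, and the real set_seed() may seed a different set of libraries. The cuDNN flags reduce but do not eliminate GPU nondeterminism; torch.use_deterministic_algorithms(True) is the stricter option.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python, NumPy and PyTorch RNGs if those libraries are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # CPU (and current CUDA device) generator
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False  # disable autotuning that can vary per run
```

Call it once at startup, before building the model, optimizer, and data loaders, so weight initialization and shuffling are covered.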

Paper Reference

  • This repo reproduces two techniques from the following work:
    • Zhang, Luankang; Song, Kenan; Lee, Yi; Guo, Wei; Wang, Hao; Li, Yawen; Guo, Huifeng; Liu, Yong; Lian, Defu; Chen, Enhong (2025). "Killing Two Birds with One Stone: Unifying Retrieval and Ranking with a Single Generative Recommendation Model". DOI: 10.1145/3726302.3730017.
  • Implementations: “Ranking‑Driven Enhance” and “Gradient‑Guided Adaptive Weighter”.

License

This project is licensed under the Apache License 2.0. See LICENSE.

Citation

If you use this repository, please cite: “OmniGenRec (Tencent Advertising Competition 2025)”.
