I am Zheng Cai, nickname zigzagcai, an AI Infra Engineer and Lifelong Learner.
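I have a general interest in (M)LLM pre-/post-training and love to share my thoughts through blog posts on Zhihu: "Thoughts prompted by InternLM-7B failing to converge when trained on the A800 platform" and "Mamba-1 training with support for variable-length sequences".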
🥑 For now, I have a personal interest in Agentic RL and Inference-Time Scaling, and believe they will bring a new paradigm shift.
🍓 For AI, I believe that more is different and that intelligence emerges from complexity, and I like the ideas behind The Bitter Lesson.
🍒 For Infra, I love to build practical distributed systems that orchestrate computation, communication, and caching to scale up and scale out better, and I believe in the ideas behind The Hardware Lottery.
So, what I try to do is build a bridge between various accelerators and large models, with the hope of achieving efficient system-model co-design in the new AI paradigm (Self-Evolving Agentic AI Systems).
I love the general idea of open source (code, knowledge, and more), love to learn from the open source community, and try my best to contribute back.
Selected ideas I have shared or developed:
- CPU memory optimization when using the PyTorch DataLoader over very large-scale datasets: pytorch/pytorch#13246 (comment)
- Analysis of numerical stability between Ring and Tree All-Reduce: NVIDIA/nccl#1055
- The first to implement variable-length training with Mamba State Space Models: state-spaces/mamba#244
- Avoid deadlock when training with ColossalAI over very large-scale GPU clusters: hpcaitech/ColossalAI#5625
- Making DeepSeek V3 671B trainable with FSDP+EP by hacking two lines of PyTorch FSDP code: https://github.com/zigzagcai/DeepSeekV3
- Support for the nogil feature in NumPy 1.18.5 in the experimental CPython ecosystem: https://github.com/colesbury/numpy/commit/0d6ef2770268711ee6417792ba0da35fcb264bf5



