Commit be32727

Merge pull request #43 from AmberLJC/claude/issue-42-20251212-2352

Add DS SERVE paper and NeurIPS 2025 ML Systems reference

2 parents 1865643 + 4fe69bc

File tree: 1 file changed (+16, −0 lines)


README.md — 16 additions, 0 deletions
```diff
@@ -17,6 +17,8 @@ A curated list of Large Language Model systems related academic papers, articles
 - [Multi-Modal Serving Systems](#multi-modal-serving-systems)
 - [LLM for Systems](#llm-for-systems)
 - [Industrial LLM Technical Report](#industrial-llm-technical-report)
+- [ML Conferences](#ml-conferences)
+  - [NeurIPS 2025](#neurips-2025)
 - [LLM Frameworks](#llm-frameworks)
 - [Training](#training-1)
 - [Post-Training](#post-training)
@@ -246,6 +248,7 @@ A curated list of Large Language Model systems related academic papers, articles
 - [RAGO](https://arxiv.org/abs/2503.14649v2): Systematic Performance Optimization for Retrieval-Augmented Generation Serving | ISCA'25
 - [Circinus](https://arxiv.org/abs/2504.16397): Efficient Query Planner for Compound ML Serving | UIUC
 - [Patchwork: A Unified Framework for RAG Serving](https://arxiv.org/abs/2505.07833)
+- [DS SERVE](https://berkeley-large-rag.github.io/RAG-DS-Serve/): A Framework for Efficient and Scalable Neural Retrieval | UCB
 - [KVFlow](https://arxiv.org/abs/2507.07400): Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows
 - [DroidSpeak](https://arxiv.org/abs/2411.02820): KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
 - [Murakkab](https://arxiv.org/abs/2508.18298): Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
@@ -349,6 +352,19 @@ A curated list of Large Language Model systems related academic papers, articles
 - [Kimi-K2: Open Agentic Intelligence](https://arxiv.org/abs/2507.20534) – (Jul 2025)
 - [GPT-oss-120b & GPT-oss-20b](https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf) – (Aug 2025)
 
+## ML Conferences
+### NeurIPS 2025
+
+A curated collection of **[NeurIPS 2025 papers](neurips25-mlsys/)** focused on efficient systems for generative AI models. The collection includes papers on:
+- [Architecture & Efficient Mechanisms](neurips25-mlsys/architecture.md) - Efficient attention, KV-cache systems, speculative decoding
+- [Model Compression & Quantization](neurips25-mlsys/compression.md) - Quantization, pruning, KV cache compression
+- [Inference & Serving](neurips25-mlsys/inference.md) - LLM serving, scheduling, distributed inference
+- [Multi-Modal & Diffusion](neurips25-mlsys/multi-modality.md) - VLM efficiency, diffusion optimization
+- [Reinforcement Learning](neurips25-mlsys/rl.md) - RL training infrastructure, policy optimization
+- [Training Systems](neurips25-mlsys/training.md) - Distributed training, memory efficiency
+
+See the **[full NeurIPS 2025 collection](neurips25-mlsys/)** for detailed categorization and paper summaries.
+
 ## LLM Frameworks
 ### Training
 - [DeepSpeed](https://github.com/microsoft/DeepSpeed): a deep learning optimization library that makes distributed training and inference easy, efficient, and effective | Microsoft
```
