🎯
#pragma unroll
@xlite-dev, @vipshop, LeetCUDA.
- Guangzhou, China
- UTC +08:00
- https://github.com/xlite-dev
Pinned
- xlite-dev/LeetCUDA (Public): 📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉
- xlite-dev/lite.ai.toolkit (Public): 🛠 A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc. 🎉
- xlite-dev/Awesome-LLM-Inference (Public): 📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
- vllm-project/vllm (Public): A high-throughput and memory-efficient inference and serving engine for LLMs
- xlite-dev/ffpa-attn (Public): 🤖 FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ 🎉 vs SDPA EA.
-
vipshop/cache-dit (Public): 🤗 A Unified and Training-free Cache Acceleration Toolbox for DiTs: Cache Acceleration with One-line Code ~ ♥️