- [Decode] accuracy/perf improvement: Tune attention perf to align with IPEX attention functions #162
- [Decode] causal accuracy fix: fix is_causal impact on decode kernel #181
- [Decode] sliding window: Add sliding window support for paged decode kernel #168
- [Decode] FP8 KV cache: Support FP8 KV cache in paged_decode kernel #166
- [Decode] MLA support: Support arbitrary KV cache strides in paged_decode for MLA #165
- [Decode][ChunkPrefill] block size 16/32 support: Add block_size 16/32 support for chunk prefill and fix paged decode #171
- [Decode][ChunkPrefill] block size support for multiples of 32/64
- [Decode][ChunkPrefill] FP8 query support: [fmha] support fp8 query #153