
Commit 42f4d00

tdoublep and amd-xiaoyu12 authored and committed
[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (vllm-project#23665)
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Xiao Yu <[email protected]>
1 parent 091d2fb commit 42f4d00

File tree: 1 file changed (+2, -3 lines)


docs/usage/v1_guide.md

Lines changed: 2 additions & 3 deletions
@@ -111,11 +111,10 @@ Models that use Mamba-2 and Mamba-1 layers (e.g., `Mamba2ForCausalLM`, `MambaFor
 
 Models that combine Mamba-2 and Mamba-1 layers with standard attention layers are also supported (e.g., `BambaForCausalLM`,
 `Zamba2ForCausalLM`, `NemotronHForCausalLM`, `FalconH1ForCausalLM` and `GraniteMoeHybridForCausalLM`, `JambaForCausalLM`). Please note that
-these models currently require disabling prefix caching and using the FlashInfer attention backend in V1.
+these models currently require disabling prefix caching in V1.
 
 Hybrid models with mechanisms different to Mamba are also supported (e.g, `MiniMaxText01ForCausalLM`, `MiniMaxM1ForCausalLM`).
-Please note that these models currently require disabling prefix caching, enforcing eager mode, and using the FlashInfer
-attention backend in V1.
+Please note that these models currently require disabling prefix caching and enforcing eager mode in V1.
 
 #### Encoder-Decoder Models
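For context, the remaining constraints in the updated docs map onto two standard vLLM engine arguments. The following is a minimal sketch (not part of the commit) of launching one of the Mamba-attention hybrid models with the offline `LLM` API; the checkpoint name is illustrative, and the keyword arguments are the usual engine args assumed to behave the same under V1:

```python
# Sketch only: running a hybrid (Mamba-2 + attention) model under the
# constraints described in the updated docs. The checkpoint name is
# illustrative; enable_prefix_caching / enforce_eager are standard vLLM
# engine arguments.
from vllm import LLM, SamplingParams

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # illustrative hybrid checkpoint
    enable_prefix_caching=False,       # required for the hybrid models listed above
    enforce_eager=True,                # only needed for the non-Mamba hybrids (MiniMax*)
)

outputs = llm.generate(
    ["The key advantage of hybrid Mamba-attention models is"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

For the OpenAI-compatible server, the corresponding flags should be `--no-enable-prefix-caching` and `--enforce-eager`, though exact flag names may vary between releases.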
