@@ -34,23 +34,25 @@ th:not(:first-child) {
}
</style>

- | Feature | [CP][chunked-prefill] | [APC](automatic_prefix_caching.md) | [LoRA](lora.md) | [SD](spec_decode.md) | CUDA graph | <abbr title="Pooling Models">pooling</abbr> | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | <abbr title="Logprobs">logP</abbr> | <abbr title="Prompt Logprobs">prmpt logP</abbr> | <abbr title="Async Output Processing">async output</abbr> | multi-step | <abbr title="Multimodal Inputs">mm</abbr> | best-of | beam-search |
+ | Feature | [CP][chunked-prefill] | [APC](automatic_prefix_caching.md) | [LoRA](lora.md) | [SD](spec_decode.md) | CUDA graph | [pooling](../models/pooling_models.md) | <abbr title="Encoder-Decoder Models">enc-dec</abbr> | <abbr title="Logprobs">logP</abbr> | <abbr title="Prompt Logprobs">prmpt logP</abbr> | <abbr title="Async Output Processing">async output</abbr> | multi-step | <abbr title="Multimodal Inputs">mm</abbr> | best-of | beam-search |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CP][chunked-prefill] | ✅ | | | | | | | | | | | | | | |
| [APC](automatic_prefix_caching.md) | ✅ | ✅ | | | | | | | | | | | | | |
| [LoRA](lora.md) | ✅ | ✅ | ✅ | | | | | | | | | | | | |
| [SD](spec_decode.md) | ✅ | ✅ | ❌ | ✅ | | | | | | | | | | |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | | | | | | | | | |
- | <abbr title="Pooling Models">pooling</abbr> | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | | | | | | | | |
+ | [pooling](../models/pooling_models.md) | ✅ \* | ✅ \* | ✅ | ❌ | ✅ | ✅ | | | | | | | | |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ❌ | [❌](gh-issue:7366) | ❌ | [❌](gh-issue:7366) | ✅ | ✅ | ✅ | | | | | | | |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | | | | | | |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | | | | | |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | | | | |
| multi-step | ❌ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | | | |
- | <abbr title="Multimodal Inputs">mm</abbr> | ✅ | [🟠](gh-pr:8348) | [🟠](gh-pr:4194) | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | |
+ | [mm](multimodal_inputs.md) | ✅ | ✅ | [🟠](gh-pr:4194) | ❔ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ✅ | | |
| best-of | ✅ | ✅ | ✅ | [❌](gh-issue:6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](gh-issue:7968) | ✅ | ✅ | |
| beam-search | ✅ | ✅ | ✅ | [❌](gh-issue:6137) | ✅ | ❌ | ✅ | ✅ | ✅ | ❔ | [❌](gh-issue:7968) | ❔ | ✅ | ✅ |

+ \* Chunked prefill and prefix caching are only applicable to last-token pooling.
+
[](){ #feature-x-hardware }

## Feature x Hardware
@@ -62,9 +64,9 @@ th:not(:first-child) {
| [LoRA](lora.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [SD](spec_decode.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| CUDA graph | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
- | <abbr title="Pooling Models">pooling</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❔ | ❌ |
+ | [pooling](../models/pooling_models.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| <abbr title="Encoder-Decoder Models">enc-dec</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
- | <abbr title="Multimodal Inputs">mm</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
+ | [mm](multimodal_inputs.md) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| <abbr title="Logprobs">logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| <abbr title="Prompt Logprobs">prmpt logP</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| <abbr title="Async Output Processing">async output</abbr> | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |