Commit eff3e5f
authored
[FEAT] Refactor spec decode to support efficient padded speculation (#3528)
### What this PR does / why we need it?
1. Refactor the file `mtp_proposer.py`, splits torchair related codes
into `mtp_torchair_proposer.py`
2. According to vllm-project/vllm#24539,
implements padded speculative decoding as described in
vllm-project/vllm#21984.
### Does this PR introduce _any_ user-facing change?
User can use `disable_padded_drafter_batch` to disable/enable padded
speculation, default is `False`.
offline example:
```
speculative_config={"method": "deepseek_mtp", "num_speculative_tokens": 1, "disable_padded_drafter_batch": False}
```
### How was this patch tested?
- [x] egaer with pad/unpad:
- [x] aclgraph with pad/unpad
- [x] torchair with pad/unpad
performance test of deepseek-r1 with tp16、dp1
aclgraph with pad ITL: 168ms
aclgraph with unpad ITL: 169ms
original: 178ms
- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@83f478b
---------
Signed-off-by: xuyexiong <[email protected]>1 parent 10772d9 commit eff3e5f
File tree
7 files changed
+1207
-444
lines changed- tests/e2e/singlecard/spec_decode_v1
- vllm_ascend
- spec_decode
- torchair
- worker
7 files changed
+1207
-444
lines changedLines changed: 71 additions & 8 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
| 4 | + | |
3 | 5 | | |
4 | 6 | | |
5 | 7 | | |
6 | 8 | | |
7 | 9 | | |
8 | 10 | | |
| 11 | + | |
| 12 | + | |
9 | 13 | | |
10 | 14 | | |
11 | 15 | | |
| |||
17 | 21 | | |
18 | 22 | | |
19 | 23 | | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
25 | | - | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
26 | 30 | | |
27 | 31 | | |
28 | 32 | | |
| |||
37 | 41 | | |
38 | 42 | | |
39 | 43 | | |
40 | | - | |
| 44 | + | |
41 | 45 | | |
42 | 46 | | |
43 | 47 | | |
| |||
54 | 58 | | |
55 | 59 | | |
56 | 60 | | |
| 61 | + | |
57 | 62 | | |
58 | | - | |
| 63 | + | |
59 | 64 | | |
60 | 65 | | |
61 | 66 | | |
| |||
82 | 87 | | |
83 | 88 | | |
84 | 89 | | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
85 | 104 | | |
86 | 105 | | |
87 | 106 | | |
| |||
110 | 129 | | |
111 | 130 | | |
112 | 131 | | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
19 | 19 | | |
20 | 20 | | |
21 | 21 | | |
| 22 | + | |
22 | 23 | | |
23 | 24 | | |
24 | | - | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
25 | 30 | | |
26 | 31 | | |
27 | 32 | | |
28 | 33 | | |
29 | 34 | | |
| 35 | + | |
| 36 | + | |
30 | 37 | | |
31 | 38 | | |
32 | 39 | | |
| |||
0 commit comments