@@ -45,14 +45,14 @@ Each pooling model in vLLM supports one or more of these tasks according to
 [Pooler.get_supported_tasks][vllm.model_executor.layers.pooler.Pooler.get_supported_tasks],
 enabling the corresponding APIs:
 
-| Task       | APIs                |
-| ---------- | ------------------- |
-| `encode`   | `encode`            |
-| `embed`    | `embed`, `score` \* |
-| `classify` | `classify`          |
-| `score`    | `score`             |
+| Task       | APIs                                  |
+| ---------- | ------------------------------------- |
+| `encode`   | `LLM.reward(...)`                     |
+| `embed`    | `LLM.embed(...)`, `LLM.score(...)` \* |
+| `classify` | `LLM.classify(...)`                   |
+| `score`    | `LLM.score(...)`                      |
 
-\* The `score` API falls back to `embed` task if the model does not support `score` task.
+\* The `LLM.score(...)` API falls back to the `embed` task if the model does not support the `score` task.
 
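The fallback described in the footnote above can be sketched in plain Python. This is an illustrative model only, not vLLM's implementation; the use of cosine similarity over two `embed` outputs is an assumption made for the sake of the example:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def score(supported_tasks: set[str], embed, native_score):
    """Use the native "score" task when available, else fall back to "embed"."""
    if "score" in supported_tasks:
        return native_score()     # model scores the text pair directly
    query_vec, doc_vec = embed()  # fallback: embed both texts, then compare
    return cosine_similarity(query_vec, doc_vec)

# Fallback path: only "embed" is supported by this (toy) model.
s = score({"embed"}, embed=lambda: ([1.0, 0.0], [1.0, 0.0]), native_score=None)
print(s)  # identical vectors -> similarity 1.0
```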
 ### Pooler Configuration
 
@@ -66,11 +66,11 @@ you can override some of its attributes via the `--override-pooler-config` option
 If the model has been converted via `--convert` (see above),
 the pooler assigned to each task has the following attributes by default:
 
-| Task       | Pooling Type | Normalization | Softmax |
-| ---------- | ------------ | ------------- | ------- |
-| `encode`   | `ALL`        | ❌            | ❌      |
-| `embed`    | `LAST`       | ✅︎           | ❌      |
-| `classify` | `LAST`       | ❌            | ✅︎     |
+| Task       | Pooling Type | Normalization | Softmax |
+| ---------- | ------------ | ------------- | ------- |
+| `reward`   | `ALL`        | ❌            | ❌      |
+| `embed`    | `LAST`       | ✅︎           | ❌      |
+| `classify` | `LAST`       | ❌            | ✅︎     |
 
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
 its Sentence Transformers configuration file (`modules.json`) takes priority over the model's defaults.
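The pooling types, normalization, and softmax columns in the table above can be illustrated on toy per-token hidden states. A minimal sketch in plain Python, not vLLM internals; the numeric values are invented for illustration:

```python
import math

# Toy per-token hidden states, shape [num_tokens][hidden_size].
hidden_states = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

# ALL pooling (reward task): keep every token's hidden state.
pooled_all = hidden_states

# LAST pooling (embed/classify tasks): keep only the final token's state.
pooled_last = hidden_states[-1]

# Normalization (enabled for embed): L2-normalize the pooled vector.
norm = math.sqrt(sum(x * x for x in pooled_last))
embedding = [x / norm for x in pooled_last]

# Softmax (enabled for classify): turn the pooled vector into probabilities.
exps = [math.exp(x) for x in pooled_last]
probs = [e / sum(exps) for e in exps]
```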
@@ -83,21 +83,6 @@ which takes priority over both the model's and Sentence Transformers's defaults.
 The [LLM][vllm.LLM] class provides various methods for offline inference.
 See [configuration][configuration] for a list of options when initializing the model.
 
-### `LLM.encode`
-
-The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
-It returns the extracted hidden states directly, which is useful for reward models.
-
-```python
-from vllm import LLM
-
-llm = LLM(model="Qwen/Qwen2.5-Math-RM-72B", runner="pooling")
-(output,) = llm.encode("Hello, my name is")
-
-data = output.outputs.data
-print(f"Data: {data!r}")
-```
-
 ### `LLM.embed`
 
 The [embed][vllm.LLM.embed] method outputs an embedding vector for each prompt.
@@ -106,7 +91,7 @@ It is primarily designed for embedding models.
 ```python
 from vllm import LLM
 
-llm = LLM(model="intfloat/e5-mistral-7b-instruct", runner="pooling")
+llm = LLM(model="intfloat/e5-small", runner="pooling")
 (output,) = llm.embed("Hello, my name is")
 
 embeds = output.outputs.embedding
@@ -154,6 +139,46 @@ print(f"Score: {score}")
 A code example can be found here: <gh-file:examples/offline_inference/basic/score.py>
 
+### `LLM.reward`
+
+The [reward][vllm.LLM.reward] method is available to all reward models in vLLM.
+It returns the extracted hidden states directly.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="internlm/internlm2-1_8b-reward", runner="pooling", trust_remote_code=True)
+(output,) = llm.reward("Hello, my name is")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
+A code example can be found here: <gh-file:examples/offline_inference/basic/reward.py>
+
+### `LLM.encode`
+
+The [encode][vllm.LLM.encode] method is available to all pooling models in vLLM.
+It returns the extracted hidden states directly.
+
+!!! note
+    Please use one of the more specific methods or set the task directly when using `LLM.encode`:
+
+    - For embeddings, use `LLM.embed(...)` or `pooling_task="embed"`.
+    - For classification logits, use `LLM.classify(...)` or `pooling_task="classify"`.
+    - For rewards, use `LLM.reward(...)` or `pooling_task="reward"`.
+    - For similarity scores, use `LLM.score(...)`.
+
+```python
+from vllm import LLM
+
+llm = LLM(model="intfloat/e5-small", runner="pooling")
+(output,) = llm.encode("Hello, my name is", pooling_task="embed")
+
+data = output.outputs.data
+print(f"Data: {data!r}")
+```
+
 ## Online Serving
 
 Our [OpenAI-Compatible Server](../serving/openai_compatible_server.md) provides endpoints that correspond to the offline APIs: