Commit 9d26cee

Add online serving usage with custom logits processor for DeepSeek-OCR (#101)
Signed-off-by: Isotr0py <[email protected]>
1 parent 9843216 commit 9d26cee

DeepSeek/DeepSeek-OCR.md

Lines changed: 57 additions & 0 deletions
@@ -13,6 +13,7 @@ uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

## Running DeepSeek-OCR
### Offline OCR tasks
In this guide, we demonstrate how to set up DeepSeek-OCR for offline OCR batch processing tasks.

@@ -64,6 +65,62 @@ for output in model_outputs:
    print(output.outputs[0].text)
```

### Online OCR serving
In this guide, we demonstrate how to set up DeepSeek-OCR for online OCR serving with an OpenAI-compatible API server.

```bash
vllm serve deepseek-ai/DeepSeek-OCR --logits_processors vllm.model_executor.models.deepseek_ocr.NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0
```

```python3
import time
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600,
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                },
            },
            {
                "type": "text",
                "text": "Free OCR.",
            },
        ],
    }
]

start = time.time()
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # args used to control the custom logits processor
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # whitelist: <td>, </td>
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response costs: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
```

## Configuration Tips
- **It's important to use the custom logits processor** along with the model for optimal OCR and markdown generation performance.
- Unlike multi-turn chat use cases, we do not expect OCR tasks to benefit significantly from prefix caching or image reuse; it is therefore recommended to turn these features off to avoid unnecessary hashing and caching.
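
For completeness, the same processor can also be wired into offline batch inference. The snippet below is a minimal sketch, assuming vLLM's `logits_processors` constructor argument on `LLM` and `SamplingParams(extra_args=...)` for the per-request settings (argument names may differ across vLLM versions); the offline example earlier in this document remains the authoritative reference.

```python3
# Minimal offline sketch (assumed API; mirrors the serve flags shown above).
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor

llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    # register the custom n-gram logits processor
    logits_processors=[NGramPerReqLogitsProcessor],
    # OCR workloads are not expected to benefit from these caches
    enable_prefix_caching=False,
    mm_processor_cache_gb=0,
)

# Per-request arguments mirroring "vllm_xargs" in the online example.
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=2048,
    skip_special_tokens=False,
    extra_args={
        "ngram_size": 30,
        "window_size": 90,
        # whitelist: <td>, </td>
        "whitelist_token_ids": [128821, 128822],
    },
)
```

Image prompts are then passed to `llm.generate(..., sampling_params)` as in the offline example above.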
