DeepSeek/DeepSeek-OCR.md (57 additions, 0 deletions)

```bash
uv pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
```

## Running DeepSeek-OCR
### Offline OCR tasks
In this guide, we demonstrate how to set up DeepSeek-OCR for offline OCR batch processing tasks.


```python3
for output in model_outputs:
print(output.outputs[0].text)
```

### Online OCR serving
In this guide, we demonstrate how to set up DeepSeek-OCR for online OCR serving with an OpenAI-compatible API server.

```bash
vllm serve deepseek-ai/DeepSeek-OCR --logits_processors vllm.model_executor.models.deepseek_ocr.NGramPerReqLogitsProcessor --no-enable-prefix-caching --mm-processor-cache-gb 0
```

```python3
import time
from openai import OpenAI

client = OpenAI(
api_key="EMPTY",
base_url="http://localhost:8000/v1",
timeout=3600
)

messages = [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
}
},
{
"type": "text",
"text": "Free OCR."
}
]
}
]

start = time.time()
response = client.chat.completions.create(
model="deepseek-ai/DeepSeek-OCR",
messages=messages,
max_tokens=2048,
temperature=0.0,
extra_body={
"skip_special_tokens": False,
# args used to control custom logits processor
"vllm_xargs": {
"ngram_size": 30,
"window_size": 90,
# whitelist: <td>, </td>
"whitelist_token_ids": [128821, 128822],
},
},
)
print(f"Response time: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
```

## Configuration Tips
- **It's important to use the custom logits processor** (`NGramPerReqLogitsProcessor`) along with the model for optimal OCR and markdown generation performance.
- Unlike multi-turn chat use cases, OCR tasks are not expected to benefit significantly from prefix caching or image reuse; it is therefore recommended to turn these features off (as done above with `--no-enable-prefix-caching` and `--mm-processor-cache-gb 0`) to avoid unnecessary hashing and caching overhead.
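
To build intuition for the `ngram_size`, `window_size`, and `whitelist_token_ids` knobs passed via `vllm_xargs` above, here is a minimal, hypothetical sketch of n-gram repetition blocking in plain Python. This is an illustration of the general technique only, not vLLM's actual `NGramPerReqLogitsProcessor` implementation: a candidate next token is banned if it would complete an n-gram that already occurred within the recent window, unless it is whitelisted.

```python3
# Hypothetical sketch of n-gram repetition blocking; NOT vLLM's implementation.
def banned_next_tokens(tokens, ngram_size=3, window_size=10, whitelist=frozenset()):
    """Return token IDs that would complete an n-gram already seen in the window."""
    if len(tokens) < ngram_size - 1:
        return set()
    window = tokens[-window_size:]
    # The last (ngram_size - 1) tokens form the prefix of the candidate n-gram.
    prefix = tuple(tokens[-(ngram_size - 1):])
    banned = set()
    for i in range(len(window) - ngram_size + 1):
        if tuple(window[i:i + ngram_size - 1]) == prefix:
            nxt = window[i + ngram_size - 1]
            if nxt not in whitelist:
                banned.add(nxt)
    return banned

history = [5, 7, 9, 5, 7]  # the bigram (5, 7) was previously followed by 9
print(banned_next_tokens(history, ngram_size=3, window_size=10))  # {9}
```

This also shows why the whitelist matters: table markup such as `<td>`/`</td>` (token IDs 128821 and 128822 in the example request) legitimately repeats many times in OCR output, so exempting those tokens keeps the anti-repetition logic from corrupting generated tables.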