---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- mindspore
- mindnlp
- deepseek
- vision-language
- ocr
- custom_code
license: mit
---
<div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek AI" />
</div>
<hr>
<div align="center">
  <a href="https://www.deepseek.com/" target="_blank">
    <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" />
  </a>
  <a href="https://huggingface.co/lvyufeng/DeepSeek-OCR" target="_blank">
    <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" />
  </a>
</div>

<p align="center">
  <a href="https://github.com/mindspore-lab/mindnlp/tree/master/examples/transformers/inference/deepseek-ocr"><b>🌟 Github</b></a> |
  <a href="https://huggingface.co/lvyufeng/DeepSeek-OCR"><b>📥 Model Download</b></a> |
  <a href="https://github.com/deepseek-ai/DeepSeek-OCR/blob/main/DeepSeek_OCR_paper.pdf"><b>📄 Paper Link</b></a> |
  <a href=""><b>📄 Arxiv Paper Link</b></a>
</p>
<h2>
<p align="center">
  <a href="">DeepSeek-OCR: Contexts Optical Compression</a>
</p>
</h2>
<p align="center">
<a href="">Explore the boundaries of visual-text compression.</a>
</p>
## Usage

Inference with Hugging Face Transformers (via MindNLP and MindSpore) on NVIDIA GPUs. Requirements tested on Python 3.12.9 + CUDA 11.8:
```
mindspore==2.7.0
mindnlp==0.5.0rc4
transformers==4.57.1
tokenizers
einops
addict
easydict
```
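A quick way to confirm the pinned packages are installed — a minimal sketch using only the standard library; the package list just mirrors the requirements above:

```python
# Print the installed version of each required package (sketch; adjust the list as needed).
import importlib.metadata as metadata

for pkg in ("mindspore", "mindnlp", "transformers", "tokenizers", "einops", "addict", "easydict"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg} is not installed")
```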
Run inference:

```python
import os
import mindnlp
import torch
from transformers import AutoModel, AutoTokenizer

model_name = 'lvyufeng/DeepSeek-OCR-Community-Latest'

# Load the tokenizer and model with the repository's custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, _attn_implementation='sdpa', trust_remote_code=True,
                                  use_safetensors=True, device_map='auto')
model = model.eval()

# prompt = "<image>\nFree OCR. "
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_file = 'your_image.jpg'
output_path = 'your/output/dir'

# Signature: infer(self, tokenizer, prompt='', image_file='', output_path=' ', base_size=1024,
#                  image_size=640, crop_mode=True, test_compress=False, save_results=False)

# Resolution presets:
# Tiny:   base_size = 512,  image_size = 512,  crop_mode = False
# Small:  base_size = 640,  image_size = 640,  crop_mode = False
# Base:   base_size = 1024, image_size = 1024, crop_mode = False
# Large:  base_size = 1280, image_size = 1280, crop_mode = False
# Gundam: base_size = 1024, image_size = 640,  crop_mode = True

# Run OCR with the "Gundam" preset; results are saved to output_path because save_results=True.
res = model.infer(tokenizer, prompt=prompt, image_file=image_file, output_path=output_path,
                  base_size=1024, image_size=640, crop_mode=True, save_results=True, test_compress=True)
```
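The resolution presets listed in the comments above can be wrapped in a small helper so a single call selects the mode. This is only a sketch: `MODES` and `run_ocr` are illustrative names, not part of the repository, and it assumes the `model`, `tokenizer`, and `infer` signature from the snippet above.

```python
# Hypothetical preset table and wrapper around model.infer (not part of the repo).
MODES = {
    "tiny":   dict(base_size=512,  image_size=512,  crop_mode=False),
    "small":  dict(base_size=640,  image_size=640,  crop_mode=False),
    "base":   dict(base_size=1024, image_size=1024, crop_mode=False),
    "large":  dict(base_size=1280, image_size=1280, crop_mode=False),
    "gundam": dict(base_size=1024, image_size=640,  crop_mode=True),
}

def run_ocr(image_file, mode="gundam",
            prompt="<image>\n<|grounding|>Convert the document to markdown. ",
            output_path="your/output/dir"):
    """Run OCR with one of the resolution presets; assumes `model` and `tokenizer` are already loaded."""
    return model.infer(tokenizer, prompt=prompt, image_file=image_file,
                       output_path=output_path, save_results=True, **MODES[mode])

# Example: res = run_ocr("your_image.jpg", mode="base")
```

The `gundam` entry matches the `base_size=1024, image_size=640, crop_mode=True` call shown above.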
## Acknowledgement

We would like to thank [Vary](https://github.com/Ucas-HaoranWei/Vary/), [GOT-OCR2.0](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/), [MinerU](https://github.com/opendatalab/MinerU), [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR), [OneChart](https://github.com/LingyvKong/OneChart), and [Slow Perception](https://github.com/Ucas-HaoranWei/Slow-Perception) for their valuable models and ideas.

We also appreciate the benchmarks [Fox](https://github.com/ucaslcl/Fox) and [OmniDocBench](https://github.com/opendatalab/OmniDocBench).