Commit de24e74

Docs: How to use MinerU to parse pdf documents (infiniflow#10763)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
1 parent 83e80e3 commit de24e74

3 files changed: +49 -2 lines changed

docs/faq.mdx

Lines changed: 24 additions & 0 deletions
@@ -510,3 +510,27 @@ See [here](./guides/agent/best_practices/accelerate_agent_question_answering.md)
 
 ---
 
+### How to use MinerU to parse PDF documents?
+
+MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
+
+1. Before deploying ragflow-server, update your **docker/.env** file:
+   - Enable `HF_ENDPOINT=https://hf-mirror.com`
+   - Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
+
+2. Start the ragflow-server and run the following commands inside the container:
+
+   ```bash
+   mkdir uv_tools
+   cd uv_tools
+   uv venv .venv
+   source .venv/bin/activate
+   uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
+   ```
+
+3. Restart the ragflow-server.
+4. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and select **MinerU** in **PDF parser**.
+5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
+
+
+
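
After the restart in step 3, it may help to confirm that the path configured in `MINERU_EXECUTABLE` really points at an executable file inside the container. The following is a minimal sketch of such a check. The helper name `check_mineru_executable` is hypothetical and is not part of RAGFlow or MinerU; the fallback path is the one given in step 1.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of RAGFlow or MinerU): reports whether the
# given path exists, is not a directory, and is executable.
check_mineru_executable() {
  local path="$1"
  if [ -x "$path" ] && [ ! -d "$path" ]; then
    echo "ok: $path is executable"
    return 0
  fi
  echo "missing: $path not found or not executable" >&2
  return 1
}

# Use MINERU_EXECUTABLE if it is set; otherwise fall back to the path from step 1.
check_mineru_executable "${MINERU_EXECUTABLE:-/ragflow/uv_tools/.venv/bin/mineru}" || true
```

If the check reports the path as missing, the virtual environment was probably created in a different directory than the one referenced in **docker/.env**.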

docs/guides/dataset/select_pdf_parser.md

Lines changed: 23 additions & 0 deletions
@@ -35,8 +35,31 @@ RAGFlow isn't one-size-fits-all. It is built for flexibility and supports deeper
 
 - DeepDoc: (Default) The default visual model performing OCR, TSR, and DLR tasks on PDFs, which can be time-consuming.
 - Naive: Skip OCR, TSR, and DLR tasks if *all* your PDFs are plain text.
+- MinerU: An experimental feature.
 - A third-party visual model provided by a specific model provider.
 
+:::danger IMPORTANT
+MinerU PDF document parsing is available starting from v0.21.1. To use this feature, follow these steps:
+
+1. Before deploying ragflow-server, update your **docker/.env** file:
+   - Enable `HF_ENDPOINT=https://hf-mirror.com`
+   - Add a MinerU entry: `MINERU_EXECUTABLE=/ragflow/uv_tools/.venv/bin/mineru`
+
+2. Start the ragflow-server and run the following commands inside the container:
+
+   ```bash
+   mkdir uv_tools
+   cd uv_tools
+   uv venv .venv
+   source .venv/bin/activate
+   uv pip install -U "mineru[core]" -i https://mirrors.aliyun.com/pypi/simple
+   ```
+
+3. Restart the ragflow-server.
+4. In the web UI, navigate to the **Configuration** page of your dataset. Click **Built-in** in the **Ingestion pipeline** section, select a chunking method that supports PDF parsing from the **Built-in** dropdown, and select **MinerU** in **PDF parser**.
+5. If you use a custom ingestion pipeline instead, you must also complete the first three steps before selecting **MinerU** in the **Parsing method** section of the **Parser** component.
+:::
+
 :::caution WARNING
 Third-party visual models are marked **Experimental**, because we have not fully tested these models for the aforementioned data extraction tasks.
 :::
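
Step 1 of the MinerU procedure depends on two entries being active (not commented out) in **docker/.env**. As a rough sketch, a check like the one below could be run before deploying. The helper `env_has_key` and the `ENV_FILE` variable are hypothetical names introduced for illustration, and the script assumes it is run from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if FILE contains an uncommented line that
# starts with "KEY=" (leading whitespace allowed).
env_has_key() {
  local file="$1" key="$2"
  grep -Eq "^[[:space:]]*${key}=" "$file"
}

# ENV_FILE is an assumed location, relative to the repository root.
ENV_FILE="${ENV_FILE:-docker/.env}"

# Report on the two entries named in step 1.
for key in HF_ENDPOINT MINERU_EXECUTABLE; do
  if [ -f "$ENV_FILE" ] && env_has_key "$ENV_FILE" "$key"; then
    echo "$key is set in $ENV_FILE"
  else
    echo "$key is missing or commented out in $ENV_FILE"
  fi
done
```

A line such as `# HF_ENDPOINT=https://hf-mirror.com` is treated as not set, which corresponds to the "Enable" wording in step 1.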

docs/release_notes.md

Lines changed: 2 additions & 2 deletions
@@ -28,12 +28,12 @@ Released on October 23, 2025.
 
 ### New features
 
-- Experimental: Adds support for PDF document parsing using MinerU.
+- Experimental: Adds support for PDF document parsing using MinerU. See [here](./faq.mdx#how-to-use-mineru-to-parse-pdf-documents).
 
 ### Improvements
 
 - Enhances UI/UX for the dataset and personal center pages.
-- Upgrades RAGFlow's document engine, Infinity, to v0.6.1.
+- Upgrades RAGFlow's document engine, [Infinity](https://github.com/infiniflow/infinity), to v0.6.1.
 
 ### Fixed issues
 