
Commit 9214891

Merge branch 'main' into docs/add-claude-md-and-agent-docs

2 parents 7b71ace + 949efb3

File tree

14 files changed: +241 −66 lines changed

deploy/paddleocr_vl_docker/hps/README.md

Lines changed: 24 additions & 2 deletions

@@ -15,8 +15,8 @@
 | Component | Description |
 |----------------|----------------------------------------|
 | FastAPI gateway | Unified access entry, simplified client calls, concurrency control |
-| Triton server | Model management, dynamic batching, inference scheduling |
-| vLLM server | Continuous batching, VLM inference |
+| Triton server | Layout detection model (PP-DocLayoutV3) and pipeline orchestration logic; handles model management, dynamic batching, and inference scheduling |
+| vLLM server | VLM (PaddleOCR-VL-1.5), continuous-batching inference |

 **Triton models:**

@@ -158,6 +158,28 @@ UVICORN_WORKERS=2

 Triton automatically batches requests to improve inference device utilization. The maximum batch size is controlled by the `max_batch_size` parameter in the model configuration file (default: 8); the file is `config.pbtxt` under each model directory in the model repository (e.g., `model_repo/layout-parsing/config.pbtxt`).

+### Triton Instance Count
+
+The number of parallel inference instances for each Triton model is configured via `instance_group` in `config.pbtxt` (default: 1). Increasing the instance count improves parallelism but consumes more device resources.
+
+```
+# model_repo/layout-parsing/config.pbtxt
+instance_group [
+  {
+    count: 1  # number of instances; increase for higher parallelism
+    kind: KIND_GPU
+    gpus: [ 0 ]
+  }
+]
+```
+
+There is a trade-off between instance count and dynamic batching:
+
+- **Single instance (`count: 1`)**: dynamic batching merges multiple requests into one batch for parallel execution, but requests in the same batch must wait for the slowest one before returning together, which can raise latency for some requests. A single instance also processes only one batch at a time, so later requests must queue until the current batch finishes. Suitable when device memory is limited or request processing times are fairly uniform
+- **Multiple instances (`count: 2+`)**: multiple instances can each process different batches simultaneously, handling more requests at once, reducing queuing time, and improving per-request latency. Note that batches within one instance still follow dynamic-batching behavior (requests in a batch start and finish together). Each additional instance consumes an extra copy of the layout detection model's GPU memory and increases load on the VLM inference service as well as CPU and system-memory usage, so set it according to your inference device's resources
+
+Non-inference models (e.g., `restructure-pages`) run on CPU; their instance count can be raised based on the number of CPU cores.
+
 ## Troubleshooting and Resolution

 ### Service Fails to Start
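Taken together with the `max_batch_size` option described in the hunk above, a combined `config.pbtxt` could look like the sketch below. The `dynamic_batching { }` stanza and the `count: 2` value are illustrative assumptions, not values taken from this repository:

```
# model_repo/layout-parsing/config.pbtxt (illustrative)
max_batch_size: 8
dynamic_batching { }   # enable dynamic batching with default settings
instance_group [
  { count: 2  kind: KIND_GPU  gpus: [ 0 ] }
]
```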

deploy/paddleocr_vl_docker/hps/README_en.md

Lines changed: 24 additions & 2 deletions

@@ -15,8 +15,8 @@ Client → FastAPI Gateway → Triton Server → vLLM Server
 | Component | Description |
 |-----------------|-----------------------------------------------------------------------|
 | FastAPI Gateway | Unified access point, simplified client calls, concurrency control |
-| Triton Server | Model management, dynamic batching, inference scheduling |
-| vLLM Server | Continuous batching, VLM inference |
+| Triton Server | Layout detection model (PP-DocLayoutV3) and pipeline orchestration; model management, dynamic batching, inference scheduling |
+| vLLM Server | VLM (PaddleOCR-VL-1.5), continuous batching inference |

 **Triton Models:**

@@ -158,6 +158,28 @@ Each Uvicorn worker is an independent process with its own event loop:

 Triton automatically batches requests to improve inference device utilization. The maximum batch size is controlled by the `max_batch_size` parameter in the model configuration file (default: 8), located at `config.pbtxt` under each model directory in the model repository (e.g., `model_repo/layout-parsing/config.pbtxt`).

+### Triton Instance Count
+
+The number of parallel inference instances for each Triton model is configured via the `instance_group` section in `config.pbtxt` (default: 1). Increasing the instance count improves parallelism but consumes more device resources.
+
+```
+# model_repo/layout-parsing/config.pbtxt
+instance_group [
+  {
+    count: 1  # Number of instances; increase for higher parallelism
+    kind: KIND_GPU
+    gpus: [ 0 ]
+  }
+]
+```
+
+There is a trade-off between instance count and dynamic batching:
+
+- **Single instance (`count: 1`)**: Dynamic batching combines multiple requests into one batch for parallel execution, but all requests in the same batch must wait for the slowest one to finish before results are returned, which may increase latency for faster requests. Additionally, a single instance can only process one batch at a time; subsequent requests must queue until the current batch completes. Best suited for scenarios with limited GPU memory or uniform request processing times
+- **Multiple instances (`count: 2+`)**: Multiple instances can process different batches simultaneously, allowing more requests to be handled concurrently. This reduces queuing time and improves latency for individual requests. Note that within each instance, dynamic batching behavior still applies (requests in the same batch start and finish together). Each additional instance consumes an extra copy of the layout detection model's GPU memory, increases the load on the VLM inference service, and uses more CPU and system memory. Adjust based on the available resources of your inference device
+
+Non-inference models (e.g., `restructure-pages`) run on CPU and can have their instance count increased based on available CPU cores.
+
 ## Troubleshooting and Resolution

 ### Service Fails to Start
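The single- versus multi-instance trade-off described in the hunk above can be sketched with a toy round-robin model. All numbers are illustrative assumptions; real Triton scheduling, batch formation, and per-batch service times will differ:

```python
import math

def makespan(n_requests, batch_size, n_instances, svc_time):
    """Time until the last request finishes, assuming every full batch
    takes svc_time (a batch waits for its slowest member) and batches
    are spread round-robin across instances."""
    n_batches = math.ceil(n_requests / batch_size)
    # Each instance processes its share of batches sequentially.
    rounds = math.ceil(n_batches / n_instances)
    return rounds * svc_time

# 16 requests, batches of 8, 1 s per batch:
assert makespan(16, 8, 1, 1.0) == 2.0  # single instance: 2 batches back-to-back
assert makespan(16, 8, 2, 1.0) == 1.0  # two instances: batches run in parallel
```

The model ignores the extra GPU memory and VLM-side load that each added instance brings, which is exactly the cost side of the trade-off the README warns about.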

docs/version3.x/installation.en.md

Lines changed: 1 addition & 1 deletion

@@ -71,7 +71,7 @@ python -c "import paddle; print(paddle.__version__)"
 If the installation is successful, the following content will be output:

 ```bash
-3.0.0
+3.2.0
 ```

 ## 1.3 Installation of PaddlePaddle Wheel Package for Windows with NVIDIA 50 Series GPUs
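Since the hunk above bumps the expected version string from 3.0.0 to 3.2.0, a script that depends on the newer release can gate on it with a plain tuple comparison. This is a sketch; `at_least` is a hypothetical helper, not part of PaddlePaddle:

```python
def at_least(version: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, not lexically."""
    return tuple(map(int, version.split("."))) >= tuple(map(int, minimum.split(".")))

# In practice, version would come from paddle.__version__
assert at_least("3.2.0", "3.0.0")
assert not at_least("3.0.0", "3.2.0")
assert at_least("3.10.0", "3.2.0")  # a lexical string compare would get this wrong
```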

docs/version3.x/installation.md

Lines changed: 1 addition & 1 deletion

@@ -71,7 +71,7 @@ python -c "import paddle; print(paddle.__version__)"
 If the installation is successful, the following will be output:

 ```bash
-3.0.0
+3.2.0
 ```

 ## 1.3 Installing the PaddlePaddle Wheel Package on Windows with NVIDIA 50 Series GPUs

docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md

Lines changed: 1 addition & 1 deletion

@@ -1625,7 +1625,7 @@ The parameters supported by this command are as follows:
 - MLX-VLM: [Refer to this document](./PaddleOCR-VL-Apple-Silicon.en.md)
 - llama.cpp:
   1. Install llama.cpp by referring to the `Quick start` section in the [llama.cpp github](https://github.com/ggml-org/llama.cpp).
-  2. Download the model files in gguf format: [megemini/PaddleOCR-VL-1.5-GGUF](https://modelscope.cn/models/megemini/PaddleOCR-VL-1.5-GGUF/files) or [megemini/PaddleOCR-VL-GGUF](https://modelscope.cn/models/megemini/PaddleOCR-VL-GGUF/files).
+  2. Download the model files in gguf format: [PaddlePaddle/PaddleOCR-VL-1.5-GGUF](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF).
   3. Execute the following command to start the inference service. For an introduction to the parameters, please refer to [LLaMA.cpp HTTP Server](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md):

 ```shell

docs/version3.x/pipeline_usage/PaddleOCR-VL.md

Lines changed: 1 addition & 1 deletion

@@ -1604,7 +1604,7 @@ paddleocr genai_server --model_name PaddleOCR-VL-1.5-0.9B --backend vllm --port
 - MLX-VLM: [refer to this document](./PaddleOCR-VL-Apple-Silicon.md)
 - llama.cpp:
   1. Install llama.cpp following the `Quick start` section of the [llama.cpp github](https://github.com/ggml-org/llama.cpp).
-  2. Download the model files in gguf format: [megemini/PaddleOCR-VL-1.5-GGUF](https://modelscope.cn/models/megemini/PaddleOCR-VL-1.5-GGUF/files) or [megemini/PaddleOCR-VL-GGUF](https://modelscope.cn/models/megemini/PaddleOCR-VL-GGUF/files)
+  2. Download the model files in gguf format: [PaddlePaddle/PaddleOCR-VL-1.5-GGUF](https://huggingface.co/PaddlePaddle/PaddleOCR-VL-1.5-GGUF)
   3. Run the following command to start the inference service; for parameter descriptions, see [LLaMA.cpp HTTP Server](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md):

 ```shell

skills/paddleocr-doc-parsing/SKILL.md

Lines changed: 41 additions & 16 deletions

@@ -4,6 +4,18 @@ description: >
   Advanced document parsing with PaddleOCR. Returns complete document
   structure including text, tables, formulas, charts, and layout information. The AI agent extracts
   relevant content based on user needs.
+metadata:
+  openclaw:
+    requires:
+      env:
+        - PADDLEOCR_DOC_PARSING_API_URL
+        - PADDLEOCR_ACCESS_TOKEN
+        - PADDLEOCR_DOC_PARSING_TIMEOUT
+      bins:
+        - python
+    primaryEnv: PADDLEOCR_ACCESS_TOKEN
+  emoji: "📄"
+  homepage: https://github.com/PaddlePaddle/PaddleOCR/tree/main/skills/paddleocr-doc-parsing
 ---

 # PaddleOCR Document Parsing Skill

@@ -43,30 +55,32 @@ If the script execution fails (API not configured, network error, etc.):

 1. **Execute document parsing**:
    ```bash
-   python scripts/vl_caller.py --file-url "URL provided by user"
+   python scripts/vl_caller.py --file-url "URL provided by user" --pretty
    ```
    Or for local files:
    ```bash
-   python scripts/vl_caller.py --file-path "file path"
+   python scripts/vl_caller.py --file-path "file path" --pretty
    ```

    **Optional: explicitly set file type**:
    ```bash
-   python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0
+   python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty
    ```
    - `--file-type 0`: PDF
    - `--file-type 1`: image
    - If omitted, the service can infer file type from input.

-   **Save result to file** (recommended):
-   ```bash
-   python scripts/vl_caller.py --file-url "URL" --output result.json --pretty
-   ```
-   - The script will display: `Result saved to: /absolute/path/to/result.json`
-   - This message appears on stderr, the JSON is saved to the file
-   - **Tell the user the file path** shown in the message
-
-2. **The script returns COMPLETE JSON** with all document content:
+   **Default behavior: save raw JSON to a temp file**:
+   - If `--output` is omitted, the script saves automatically under the system temp directory
+   - Default path pattern: `<system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json`
+   - If `--output` is provided, it overrides the default temp-file destination
+   - If `--stdout` is provided, JSON is printed to stdout and no file is saved
+   - In save mode, the script prints the absolute saved path on stderr: `Result saved to: /absolute/path/...`
+   - In default/custom save mode, read and parse the saved JSON file before responding
+   - In save mode, always tell the user the saved file path and that full raw JSON is available there
+   - Use `--stdout` only when you explicitly want to skip file persistence
+
+2. **The output JSON contains COMPLETE content** with all document data:
    - Headers, footers, page numbers
    - Main text content
    - Tables with structure

@@ -80,7 +94,7 @@ If the script execution fails (API not configured, network error, etc.):
   - Supported file types depend on the model and endpoint configuration.
   - Always follow the file type constraints documented by your endpoint API.

-3. **Extract what the user needs** from stable contract fields based on their request:
+3. **Extract what the user needs** from the output JSON using these fields:
   - Top-level `text`
   - `result[n].markdown`
   - `result[n].prunedResult`

@@ -89,15 +103,16 @@ If the script execution fails (API not configured, network error, etc.):

 **CRITICAL**: You must display the COMPLETE extracted content to the user based on their needs.

-- The script returns ALL document content in a structured format
+- The output JSON contains ALL document content in a structured format
+- In save mode, the raw provider result can be inspected in the saved JSON file
 - **Display the full content requested by the user**, do NOT truncate or summarize
 - If user asks for "all text", show the entire `text` field
 - If user asks for "tables", show ALL tables in the document
 - If user asks for "main content", filter out headers/footers but show ALL body text

 **What this means**:
 - **DO**: Display complete text, all tables, all formulas as requested
-- **DO**: Present content using stable contract fields: top-level `text`, `result[n].markdown`, and `result[n].prunedResult`
+- **DO**: Present content using these fields: top-level `text`, `result[n].markdown`, and `result[n].prunedResult`
 - **DON'T**: Truncate with "..." unless content is excessively long (>10,000 chars)
 - **DON'T**: Summarize or provide excerpts when user asks for full content
 - **DON'T**: Say "Here's a preview" when user expects complete output

@@ -126,7 +141,7 @@ Agent: "I found a document with multiple sections. Here's the beginning:

 ### Understanding the JSON Response

-The script returns a JSON envelope wrapping the raw API result:
+The output JSON uses an envelope wrapping the raw API result:

 ```json
 {

@@ -143,6 +158,8 @@ The script returns a JSON envelope wrapping the raw API result:
 - `result[n].prunedResult` - structured parsing output for each page (layout/content/confidence and related metadata)
 - `result[n].markdown` - full rendered page output in markdown/HTML

+> Raw result location (default): the temp-file path printed by the script on stderr
+
 ### Usage Examples

 **Example 1: Extract Full Document Text**

@@ -174,6 +191,14 @@ python scripts/vl_caller.py \
   --pretty
 ```

+**Example 4: Print JSON Without Saving**
+```bash
+python scripts/vl_caller.py \
+  --file-url "URL" \
+  --stdout \
+  --pretty
+```
+
 Then return:
 - Full `text` when user asks for full document content
 - `result[n].prunedResult` and `result[n].markdown` when user needs complete structured page data
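The envelope fields the skill document above keeps referring to (`ok`, top-level `text`, `result[n].markdown`, `result[n].prunedResult`) can be consumed as in this sketch. The sample payload is fabricated for illustration and is not real API output:

```python
import json

# Fabricated envelope in the shape the skill doc describes
sample = """
{
  "ok": true,
  "text": "Full document text",
  "result": [
    {"markdown": "# Page 1", "prunedResult": {"layout": []}}
  ]
}
"""

envelope = json.loads(sample)
if envelope["ok"]:
    full_text = envelope["text"]  # top-level text, shown when the user asks for everything
    # per-page (markdown, prunedResult) pairs for structured output
    pages = [(p["markdown"], p["prunedResult"]) for p in envelope["result"]]
```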

skills/paddleocr-doc-parsing/references/output_schema.md

Lines changed: 9 additions & 4 deletions

@@ -2,6 +2,8 @@

 This document defines the output envelope returned by `vl_caller.py`.

+By default, `vl_caller.py` saves the JSON envelope to a unique file under the system temp directory and prints the absolute saved path to `stderr`. Use `--output` when you need a custom destination, or `--stdout` when you want to skip file saving and print JSON directly.
+
 ## Output Envelope

 `vl_caller.py` wraps provider response in a stable structure:

@@ -84,12 +86,15 @@ Raw fields may vary by model version and endpoint.
 ## Command Examples

 ```bash
-# Parse document from URL
+# Parse document from URL (result auto-saves to the system temp directory)
 python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --pretty

-# Parse local file
+# Parse local file (result auto-saves to the system temp directory)
 python scripts/paddleocr-doc-parsing/vl_caller.py --file-path "doc.pdf" --pretty

-# Save result to file
-python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --output result.json
+# Save result to a custom file path
+python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --output "./result.json" --pretty
+
+# Print JSON to stdout without saving a file
+python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --stdout --pretty
 ```
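Since the save-mode path arrives on `stderr` as `Result saved to: ...`, a wrapper around `vl_caller.py` could recover it as sketched below. The helper name is ours, not part of the skill:

```python
import re

def saved_path_from_stderr(stderr_text: str):
    """Extract the saved-file path from the caller's stderr message, if present."""
    m = re.search(r"Result saved to: (.+)", stderr_text)
    return m.group(1).strip() if m else None

# Sample stderr line in the documented format
path = saved_path_from_stderr(
    "Result saved to: /tmp/paddleocr/doc-parsing/results/result_20240101_000000_ab12cd34.json"
)
```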

skills/paddleocr-doc-parsing/scripts/smoke_test.py

Lines changed: 3 additions & 0 deletions

@@ -160,6 +160,9 @@ def main():
     print(
         ' python skills/paddleocr-doc-parsing/scripts/vl_caller.py --file-path "doc.pdf"'
     )
+    print(
+        " Results are auto-saved to the system temp directory; the caller prints the saved path."
+    )

     return 0
skills/paddleocr-doc-parsing/scripts/vl_caller.py

Lines changed: 46 additions & 14 deletions

@@ -28,6 +28,9 @@
 import io
 import json
 import sys
+import tempfile
+import uuid
+from datetime import datetime
 from pathlib import Path

 # Fix Windows console encoding

@@ -41,21 +44,42 @@
 from lib import parse_document


+def get_default_output_path():
+    """Build a unique result path under the OS temp directory."""
+    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+    short_id = uuid.uuid4().hex[:8]
+    return (
+        Path(tempfile.gettempdir())
+        / "paddleocr"
+        / "doc-parsing"
+        / "results"
+        / f"result_{timestamp}_{short_id}.json"
+    )
+
+
+def resolve_output_path(output_arg):
+    if output_arg:
+        return Path(output_arg).expanduser().resolve()
+    return get_default_output_path().resolve()
+
+
 def main():
     parser = argparse.ArgumentParser(
         description="PaddleOCR Document Parsing - with layout analysis",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
 Examples:
-  # Parse document from URL
+  # Parse document from URL (result is auto-saved to the system temp directory)
   python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "https://example.com/document.pdf"

-  # Parse local file
+  # Parse local file (result is auto-saved to the system temp directory)
   python scripts/paddleocr-doc-parsing/vl_caller.py --file-path "./invoice.pdf"

-  # Save result to file
-  python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --output result.json --pretty
+  # Save result to a custom file path
+  python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --output "./result.json" --pretty

+  # Print JSON to stdout without saving a file
+  python scripts/paddleocr-doc-parsing/vl_caller.py --file-url "URL" --stdout --pretty
 Configuration:
   Run: python scripts/paddleocr-doc-parsing/configure.py
   Or set in .env: PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TOKEN

@@ -79,8 +103,17 @@ def main():
     parser.add_argument(
         "--pretty", action="store_true", help="Pretty-print JSON output"
     )
-    parser.add_argument(
-        "--output", "-o", metavar="FILE", help="Save result to JSON file"
+    output_group = parser.add_mutually_exclusive_group()
+    output_group.add_argument(
+        "--output",
+        "-o",
+        metavar="FILE",
+        help="Save result to JSON file (default: auto-save to system temp directory)",
+    )
+    output_group.add_argument(
+        "--stdout",
+        action="store_true",
+        help="Print JSON to stdout instead of saving to a file",
     )

     args = parser.parse_args()

@@ -99,20 +132,19 @@ def main():
     indent = 2 if args.pretty else None
     json_output = json.dumps(result, indent=indent, ensure_ascii=False)

-    # Save to file or print
-    if args.output:
+    if args.stdout:
+        print(json_output)
+    else:
+        output_path = resolve_output_path(args.output)
+
+        # Save to file
         try:
-            output_path = Path(args.output).resolve()
             output_path.parent.mkdir(parents=True, exist_ok=True)
             output_path.write_text(json_output, encoding="utf-8")
             print(f"Result saved to: {output_path}", file=sys.stderr)
         except (PermissionError, OSError) as e:
-            print(f"Error: Cannot write to {args.output}: {e}", file=sys.stderr)
+            print(f"Error: Cannot write to {output_path}: {e}", file=sys.stderr)
             sys.exit(5)
-        else:
-            print(json_output)
-            if result["ok"]:
-                print("\nTip: Use --output result.json to save the result", file=sys.stderr)

     # Exit code based on result
     sys.exit(0 if result["ok"] else 1)
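Lifted out of the diff above, the new default-path helper runs standalone; this copy mirrors the added code so its behavior (unique, timestamped `.json` path under the OS temp directory) can be checked in isolation:

```python
import tempfile
import uuid
from datetime import datetime
from pathlib import Path

def get_default_output_path():
    """Build a unique result path under the OS temp directory."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
    short_id = uuid.uuid4().hex[:8]
    return (
        Path(tempfile.gettempdir())
        / "paddleocr"
        / "doc-parsing"
        / "results"
        / f"result_{timestamp}_{short_id}.json"
    )

p = get_default_output_path()
# Path follows the documented pattern:
# <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
```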
