This project provides a powerful and flexible PDF analysis microservice built with **Clean Architecture** principles. The service enables OCR, segmentation, and classification of different parts of PDF pages, identifying elements such as texts, titles, pictures, tables, formulas, and more. Additionally, it determines the correct reading order of these identified elements and can convert PDFs to various formats including Markdown and HTML with **automatic translation support** powered by Ollama.

-### ✨ Key Features
+The service offers both a **user-friendly Gradio web interface** for interactive use and a **comprehensive REST API** for programmatic access and integration.

-- 🔍 **Advanced PDF Layout Analysis** - Segment and classify PDF content with high accuracy
-- 🖼️ **Visual & Fast Models** - Choose between VGT (Vision Grid Transformer) for accuracy or LightGBM for speed
-- 📝 **Multi-format Output** - Export to JSON, Markdown, HTML, and visualize PDF segmentations
-- 🌍 **Automatic Translation** - Translate documents to multiple languages using Ollama models
-- 🌐 **OCR Support** - 150+ language support with Tesseract OCR
-- 📊 **Table & Formula Extraction** - Extract tables as HTML and formulas as LaTeX
-- 🏗️ **Clean Architecture** - Modular, testable, and maintainable codebase
-- 🐳 **Docker-Ready** - Easy deployment with GPU support
-- ⚡ **RESTful API** - Comprehensive API with 10+ endpoints
+<img src="https://raw.githubusercontent.com/huridocs/pdf-document-layout-analysis/main/images/ui.png" alt="Gradio Web UI" width="800"/>
+<p><em>Gradio Web Interface - Easy-to-use UI for PDF analysis, conversion, and translation</em></p>
+</div>

 ---

 ## 🚀 Quick Start

 ### 1. Start the Service

-**Standard PDF Analysis (recommended for most users):**
-```bash
-make start
-```
-
-**With Translation Features (includes Ollama container):**
 ```bash
-make start_translation
+just start
 ```

-The service will be available at `http://localhost:5060`
+The service provides two interfaces:
+- **🎨 Web UI (Gradio)**: `http://localhost:7860` - User-friendly interface for all features
+- **🔌 REST API**: `http://localhost:5060` - Programmatic access for integrations

 **See all available commands:**
 ```bash
-make help
+just --list
 ```

 **Check service status:**
@@ -90,7 +58,18 @@ make help
 curl http://localhost:5060/info
 ```

-### 2. Basic PDF Analysis
+### 2. Using the Web UI
+
+Simply open your browser and navigate to `http://localhost:7860` to access the intuitive web interface. The UI provides:
+
+- 📄 **PDF Analysis** - Upload and analyze PDFs with visual results
+- 🔄 **Format Conversion** - Convert to Markdown or HTML
+- 🌍 **Translation** - Translate documents to multiple languages
+- 👁️ **Visualization** - View segmentation overlays on your PDFs
+- 🔍 **OCR Processing** - Apply OCR to scanned documents
+- 📑 **TOC Extraction** - Extract table of contents
+
+### 3. Using the REST API

 **Analyze a PDF document (VGT model - high accuracy):**
 ```bash
@@ -102,18 +81,61 @@ curl -X POST -F 'file=@/path/to/your/document.pdf' http://localhost:5060
 curl -X POST -F 'file=@/path/to/your/document.pdf' -F "fast=true" http://localhost:5060
 ```

-### 3. Stop the Service
+### 4. Stop the Service

 ```bash
-make stop
+just stop
 ```

-> 💡 **Tip**: Replace `/path/to/your/document.pdf` with the actual path to your PDF file. The service will return a JSON response with segmented content and metadata.
+> 💡 **Tip**: The Web UI at `http://localhost:7860` is the easiest way to get started. For automation and integration, use the REST API at `http://localhost:5060`.

+---
+
+## ✨ Key Features
+
+- 🎨 **User-Friendly Web UI** - Intuitive Gradio interface for easy PDF processing
+- 🔍 **Advanced PDF Layout Analysis** - Segment and classify PDF content with high accuracy
+- 🖼️ **Visual & Fast Models** - Choose between VGT (Vision Grid Transformer) for accuracy or LightGBM for speed
+- 📝 **Multi-format Output** - Export to JSON, Markdown, HTML, and visualize PDF segmentations
+- 🌍 **Automatic Translation** - Translate documents to multiple languages using Ollama models
+- 🌐 **OCR Support** - 150+ language support with Tesseract OCR
+- 📊 **Table & Formula Extraction** - Extract tables as HTML and formulas as LaTeX
+- 🏗️ **Clean Architecture** - Modular, testable, and maintainable codebase
+- 🐳 **Docker-Ready** - Easy deployment with GPU support
+- ⚡ **RESTful API** - Comprehensive API with 10+ endpoints
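
The REST API calls that appear in the Quick Start text of this change (`GET /info` for a status check, and a `POST` to the base URL with a `file` form field and an optional `fast=true` field) could be wrapped in a small shell helper for scripting. The following is only a sketch and is not part of the commit: the function names `wait_for_service` and `analyze_pdf` and the `API_URL` variable are hypothetical, and only the URL, the `/info` path, and the form field names are taken from the README.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the commit). Only the base URL, the /info
# endpoint, and the "file" / "fast" form fields are taken from the README.

API_URL="${API_URL:-http://localhost:5060}"  # assumed override variable

# Poll GET /info until the service answers; give up after N attempts (default 30).
wait_for_service() {
  local attempts="${1:-30}" i
  for ((i = 0; i < attempts; i++)); do
    if curl --fail --silent --output /dev/null "$API_URL/info"; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# POST a PDF for analysis. Pass "true" as the second argument to add the
# fast=true form field shown in the README's second curl example.
analyze_pdf() {
  local pdf="$1" fast="${2:-false}"
  local args=(-X POST -F "file=@${pdf}")
  if [ "$fast" = "true" ]; then
    args+=(-F "fast=true")
  fi
  curl --fail --silent --show-error "${args[@]}" "$API_URL"
}
```

After sourcing the file, something like `wait_for_service && analyze_pdf /path/to/your/document.pdf true > result.json` would wait for the API to come up and then save the response of a `fast=true` request. Keeping the form fields in an array avoids quoting problems when the PDF path contains spaces.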