opusaha/docx2html-service

DOCX to HTML Converter Microservice

A Flask-based microservice that converts Word documents (.docx, .doc) to HTML. It is designed for integration with React applications or any other frontend framework.

Features

  • ✅ Convert .docx and .doc files to HTML
  • ✅ Handle headings, paragraphs, and tables
  • ✅ Automatic file cleanup after conversion
  • ✅ CORS enabled for frontend integration
  • ✅ File size validation (16MB max)
  • ✅ Error handling and validation
  • ✅ Clean, responsive HTML output with CSS styling
  • ✅ Beautiful web interface with drag & drop
  • ✅ Real-time conversion with progress indicators
  • ✅ Copy to clipboard functionality
  • ✅ Conversion statistics and metrics
  • NEW: RESTful API endpoints for integration
  • NEW: No authentication required
  • NEW: Support for both file upload and base64 data
  • NEW: Comprehensive API documentation
  • NEW: Test client and examples

Installation

  1. Clone or navigate to the project directory:

    cd docx2html-service

  2. Activate the virtual environment (if it does not exist yet, create it first with python -m venv venv):

    # Windows
    venv\Scripts\activate

    # Linux/Mac
    source venv/bin/activate

  3. Install dependencies:

    pip install -r requirements.txt

Usage

Start the Service

python main.py

The service will start on http://localhost:8000

Access the Home Page

Once the service is running, open your browser and go to:

http://localhost:8000/

This opens the web interface, where you can:

  • Upload Word documents (.docx, .doc)
  • Convert them to HTML instantly
  • View the converted HTML
  • Copy the HTML to clipboard
  • See conversion statistics

API Endpoints

1. Web Interface

GET /

Serves the web interface for file upload and conversion.

2. Convert Document (Web)

POST /convert
Content-Type: multipart/form-data

Request body: a form field named file containing the Word document (.docx or .doc)

3. API Status Check

GET /api/status

Check API status and get service information.
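
A minimal sketch of calling the status endpoint from Python with only the standard library. The base URL is an assumption taken from the "Start the Service" section, and the helper names status_url and fetch_status are hypothetical. The fields of the status response are not documented here, so the sketch simply returns whatever JSON the service sends.

```python
import json
import urllib.request

# Assumption: the service listens on the address given under "Start the Service".
BASE_URL = "http://localhost:8000"


def status_url(base_url=BASE_URL):
    """Build the status endpoint URL from a base URL (hypothetical helper)."""
    return base_url.rstrip("/") + "/api/status"


def fetch_status(base_url=BASE_URL, timeout=5.0):
    """Request GET /api/status and return the decoded JSON body.

    The response fields are not documented in this README, so the parsed
    JSON is returned unchanged.
    """
    with urllib.request.urlopen(status_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the service running, print(fetch_status()) should show the service information.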

4. Convert Document (API)

POST /api/convert

Two methods supported:

Method 1: Multipart Form Data

Content-Type: multipart/form-data
Body: file field with document

Method 2: Base64 JSON

Content-Type: application/json
Body: {"file_data": "base64_string", "filename": "document.docx"}

Response:

{
  "success": true,
  "html": "<!DOCTYPE html>...",
  "filename": "document.docx",
  "message": "Document converted successfully",
  "conversion_time": 1.23,
  "file_size_bytes": 1024000,
  "file_size_mb": 1.0
}
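
The Base64 JSON method can be sketched in Python as follows. This uses only the standard library. The URL assumes the port from "Start the Service", and build_base64_payload and convert_bytes are hypothetical helper names. The request body follows the file_data and filename fields shown above.

```python
import base64
import json
import urllib.request

# Assumption: host and port as stated under "Start the Service".
API_URL = "http://localhost:8000/api/convert"


def build_base64_payload(data, filename):
    """Return the JSON-serializable body described under Method 2."""
    return {
        "file_data": base64.b64encode(data).decode("ascii"),
        "filename": filename,
    }


def convert_bytes(data, filename, url=API_URL, timeout=60.0):
    """POST the document as Base64 JSON and return the parsed response.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so the
    caller is expected to handle that exception.
    """
    body = json.dumps(build_base64_payload(data, filename)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical call reads the file in binary mode, passes the bytes to convert_bytes, and then checks the success field before using html.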

API Usage

No Authentication Required

The API endpoints are designed to be easily integrated into other applications without requiring authentication tokens.

Testing the API

  1. Python Test Client: Use python test_api.py <document.docx> to test the API
  2. Web Test Interface: Open test_api.html in your browser to test interactively
  3. cURL Examples: See the comprehensive examples in API_DOCUMENTATION.md

Integration Examples

See API_DOCUMENTATION.md for complete examples in:

  • Python
  • Node.js/JavaScript
  • PHP
  • cURL

React Integration Example

import React, { useState } from 'react';

function DocumentConverter() {
  const [file, setFile] = useState(null);
  const [html, setHtml] = useState('');
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState('');

  const handleFileChange = (e) => {
    setFile(e.target.files[0]);
    setError('');
  };

  const convertDocument = async () => {
    if (!file) {
      setError('Please select a file');
      return;
    }

    setLoading(true);
    setError('');

    const formData = new FormData();
    formData.append('file', file);

    try {
      const response = await fetch('http://localhost:8000/convert', {
        method: 'POST',
        body: formData,
      });

      const data = await response.json();

      if (data.success) {
        setHtml(data.html);
      } else {
        setError(data.error || 'Conversion failed');
      }
    } catch (err) {
      setError('Network error: ' + err.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <h1>Word to HTML Converter</h1>
      
      <input 
        type="file" 
        accept=".docx,.doc" 
        onChange={handleFileChange} 
      />
      
      <button onClick={convertDocument} disabled={loading}>
        {loading ? 'Converting...' : 'Convert to HTML'}
      </button>

      {error && <p style={{color: 'red'}}>{error}</p>}

      {html && (
        <div>
          <h2>Converted HTML:</h2>
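          {/* Assumes the HTML comes from a trusted service; sanitize it before rendering untrusted documents. */}
          <div dangerouslySetInnerHTML={{ __html: html }} />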
        </div>
      )}
    </div>
  );
}

export default DocumentConverter;

Error Handling

The service handles various error scenarios:

  • 400: Invalid file type or no file uploaded
  • 404: Endpoint not found
  • 413: File too large (over 16MB)
  • 500: Internal server error during conversion
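
On the client side, these status codes can be turned into readable messages. The following is a sketch for a client built on urllib. ERROR_HINTS, describe_http_error, and call_safely are hypothetical names, and the hint wording is illustrative rather than the service's actual error text.

```python
import urllib.error

# Hints mirror the status codes listed above; the wording is illustrative.
ERROR_HINTS = {
    400: "Bad request: invalid file type or no file uploaded",
    404: "Endpoint not found: check the URL path",
    413: "File too large: the limit is 16MB",
    500: "Internal server error during conversion",
}


def describe_http_error(status):
    """Map an HTTP status code to a short, human-readable hint."""
    return ERROR_HINTS.get(status, "Unexpected HTTP status %d" % status)


def call_safely(send):
    """Run send(), a callable that performs the request, and return (ok, result).

    HTTPError is caught before URLError because it is a subclass of it.
    """
    try:
        return True, send()
    except urllib.error.HTTPError as exc:
        return False, describe_http_error(exc.code)
    except urllib.error.URLError as exc:
        return False, "Network error: %s" % exc.reason
```

Wrapping the request in a callable keeps the error mapping in one place, regardless of which endpoint is called.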

File Support

  • Supported formats: .docx, .doc
  • Maximum file size: 16MB
  • Output: Clean HTML with embedded CSS styling
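
A client can check these limits before uploading and avoid a round trip that would end in a 400 or 413. This is a minimal sketch. validate_upload is a hypothetical helper, and it assumes that "16MB" means 16 * 1024 * 1024 bytes, which the README does not state explicitly.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".docx", ".doc"}
# Assumption: the 16MB limit is 16 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 16 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("Unsupported file type: expected .docx or .doc")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("File too large: maximum size is 16MB")
    return problems
```

The server still performs its own validation, so this check only improves feedback to the user.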

Security Features

  • File extension validation
  • Secure filename handling
  • Automatic file cleanup after processing
  • CORS configuration for controlled access
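
The filename handling and cleanup steps above might look roughly like the following. This is a standard-library sketch, not the service's actual code. Flask applications often use werkzeug.utils.secure_filename for this purpose, but this README does not say whether the service does. sanitize_filename and process_upload are hypothetical names, and convert stands in for whatever conversion function the service uses.

```python
import os
import re
import tempfile


def sanitize_filename(filename):
    """Strip directory components and replace unsafe characters (illustrative)."""
    # Treat both slash styles as separators and keep only the last component.
    name = filename.replace("\\", "/").split("/")[-1]
    # Replace anything outside a conservative whitelist, then drop leading dots.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name).lstrip(".")
    return name or "upload"


def process_upload(data, filename, convert):
    """Write the upload to a temporary directory, convert it, then clean up.

    The temporary directory and its contents are removed when the with
    block exits, including when convert() raises.
    """
    safe_name = sanitize_filename(filename)
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, safe_name)
        with open(path, "wb") as fh:
            fh.write(data)
        return convert(path)
```

Using a temporary directory ties cleanup to scope, so no uploaded file is left behind after a failed conversion.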

Development

To run in development mode with auto-reload:

python main.py

Because debug mode is enabled by default, the service reloads automatically when you change the code.

Production Deployment

For production deployment, consider:

  1. Using a production WSGI server like Gunicorn
  2. Setting up proper logging
  3. Implementing rate limiting
  4. Adding authentication if needed
  5. Using environment variables for configuration
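
As an illustration of item 5, configuration could be read from environment variables along these lines. The variable names (DOCX2HTML_HOST and so on) and the load_config helper are hypothetical and are not defined by this project. The defaults follow the values documented in this README, except that debug defaults to off here as a production-oriented choice.

```python
import os


def load_config(environ=os.environ):
    """Read settings from environment variables (hypothetical names)."""
    debug_value = environ.get("DOCX2HTML_DEBUG", "false").lower()
    return {
        "host": environ.get("DOCX2HTML_HOST", "127.0.0.1"),
        "port": int(environ.get("DOCX2HTML_PORT", "8000")),
        # Debug stays off unless explicitly enabled.
        "debug": debug_value in {"1", "true", "yes"},
        # Assumption: the 16MB limit is counted in units of 1024 * 1024 bytes.
        "max_upload_bytes": int(environ.get("DOCX2HTML_MAX_UPLOAD_MB", "16")) * 1024 * 1024,
    }
```

The host, port, and debug values could then be passed to app.run(), which accepts host, port, and debug arguments in Flask.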

Troubleshooting

Common Issues

  1. Import errors: Make sure all dependencies are installed
  2. Port conflicts: Change the port in main.py if 8000 is busy
  3. File permissions: Ensure the uploads directory is writable
  4. CORS issues: Check if your React app is running on the correct port

Debug Mode

The service runs in debug mode by default. For production, set debug=False in the app.run() call.

License

This project is open source and available under the MIT License.
