An open-source file conversion web app built with Next.js, Python,
and AWS for the HTTP API, Lambda functions, and S3 object storage.
Converts .docx files to .pdf.
- Website
  - Next.js App Router
  - Amazon Web Services for backend functionality
  - Support for HTTP API, S3 file storage, and Lambda functions
  - Edge runtime-ready
- AWS Infrastructure
  - Amazon S3 for object storage and static site hosting
  - API Gateway hosts the HTTP API
  - AWS Lambda for processing JSON and filtering required data
  - Amazon EC2 for provisioning VM instances
- A static site with a document upload form is hosted on S3. We use API Gateway to create an API that invokes a Lambda function when the site sends it a GET request after the user clicks Upload File on the form.
- The API returns a presigned URL for the uploads-bucket. The site then automatically sends a PUT request with the .docx file data to that URL, which writes the file into the same bucket.
- Another Lambda function is configured to listen for PUT Object events in the S3 uploads-bucket. It parses the event record for the file name and sends a POST request to the Python Flask app that performs the document conversion.
- An EC2 instance is deployed with an Ubuntu OS image. A Python script is set up to run as a background process.
- The Python microservice converts documents using the pandoc package and is exposed as an API using Flask, listening for POST requests on a specified port.
- It downloads the specified file from the uploads-bucket, saves it under its ID, converts it, and uploads the converted file to the output-bucket on S3. The static site returns the download link for the converted file from the output-bucket.
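A minimal sketch of the final step, assuming the microservice holds a boto3 S3 client and hands back a time-limited presigned GET link (this README does not say whether the link is presigned or public, nor how it reaches the static site). The helper name `publish_result`, the key layout, and the one-hour expiry are hypothetical.

```python
from pathlib import Path
from typing import Any


def publish_result(s3_client: Any, pdf_path: str, output_bucket: str,
                   expires_in: int = 3600) -> str:
    """Upload a converted PDF and return a download link for it.

    Hypothetical helper: `s3_client` is assumed to be a boto3 S3 client,
    e.g. boto3.client("s3"); the object key is just the local file name.
    """
    key = Path(pdf_path).name  # e.g. "<file-id>.pdf"
    # boto3 managed upload: upload_file(Filename, Bucket, Key)
    s3_client.upload_file(pdf_path, output_bucket, key)
    # Time-limited GET link that the static site can show to the user
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": output_bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

With boto3 installed this would be called as `publish_result(boto3.client("s3"), "outputs/<file-id>.pdf", "output-bucket")`. Passing the client in, rather than creating it inside, keeps the function easy to test with a stand-in object.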
The frontend of the app is hosted as a static site in a separate S3 bucket.
Note
To learn more about the S3 static site and how to deploy it, see frontend/README.md.
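The real frontend is written in Next.js; the Python sketch below only mirrors the same two HTTP calls the upload form makes: a GET to the HTTP API for a presigned URL, then a PUT of the raw .docx bytes to that URL. The JSON field name `uploadURL` is a hypothetical assumption about the API's response shape.

```python
import json
import urllib.request

DOCX_TYPE = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document")


def build_put_request(presigned_url: str, data: bytes,
                      content_type: str = DOCX_TYPE) -> urllib.request.Request:
    """Prepare the PUT that uploads file bytes directly to S3."""
    return urllib.request.Request(
        presigned_url,
        data=data,
        method="PUT",
        # If the URL was signed with a Content-Type, this header must match it
        headers={"Content-Type": content_type},
    )


def upload_docx(api_url: str, path: str) -> int:
    """GET a presigned URL from the HTTP API, then PUT the file to S3."""
    with urllib.request.urlopen(api_url) as resp:  # GET request
        presigned_url = json.load(resp)["uploadURL"]  # hypothetical field name
    with open(path, "rb") as f:
        req = build_put_request(presigned_url, f.read())
    with urllib.request.urlopen(req) as resp:
        return resp.status  # HTTP status code returned by S3
```

Because the file goes straight from the browser to S3, it never passes through Lambda or API Gateway, which keeps large uploads out of their request size limits.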
The HTTP API is hosted on AWS using API Gateway and a Lambda function that runs the getPresignedURL.js app. The source code for the Lambda function is in lambda/presignedURL.js.
Note
To learn more about the getPresignedURL.js app and how to deploy it, see lambda/README.md.
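The actual handler is JavaScript. As a rough Python equivalent, assuming boto3 (bundled with the AWS Lambda Python runtimes), a random UUID as the file ID, and a 300-second expiry, a presigned-URL handler could look like this. The key scheme, the expiry, and the `uploadURL`/`key` response fields are hypothetical, and the extra `s3_client` parameter exists only to allow testing.

```python
import json
import uuid
from typing import Any, Optional

UPLOAD_BUCKET = "uploads-bucket"  # bucket label used in this README; use your real name
DOCX_TYPE = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document")
_client: Optional[Any] = None


def _s3() -> Any:
    """Create the boto3 S3 client lazily and reuse it across invocations."""
    global _client
    if _client is None:
        import boto3
        _client = boto3.client("s3")
    return _client


def handler(event: dict, context: Any, s3_client: Any = None) -> dict:
    """Return a presigned PUT URL for a new .docx upload (API Gateway proxy response)."""
    client = s3_client or _s3()
    key = f"{uuid.uuid4()}.docx"  # hypothetical naming scheme: random file ID
    url = client.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key, "ContentType": DOCX_TYPE},
        ExpiresIn=300,
    )
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"uploadURL": url, "key": key}),
    }
```

Signing `ContentType` into the URL means the browser's PUT must send the same `Content-Type` header, which restricts what the URL can be used to upload.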
- Create an EC2 t2.micro instance with an Ubuntu Linux AMI and note the VM's public IPv4 address.
- Assign an IAM role to the EC2 instance with the AmazonS3FullAccess policy attached.
- Run the Flask development server within the VM:
Before installing, check the Python version with python3 -V. The python3.10-venv package below assumes Python 3.10; adjust it if a different version is reported.
sudo apt update && sudo apt upgrade
sudo apt install pandoc texlive python3.10-venv
python3 -m venv venv
source venv/bin/activate
pip install pypandoc boto3 flask
mkdir inputs outputs
touch app.py
Copy the contents of app.py into the new file using any code editor (nano, vim, etc.).
sudo su
source venv/bin/activate
nohup python3 app.py > log.txt 2>&1 &
- The Flask app should now be able to handle requests 24/7. It runs as a background process under nohup, so it keeps running as long as the VM is up, even after you exit the remote shell.
- sudo su opens a new root shell that does not inherit the activated virtual environment, so venv is activated again before starting the app; otherwise python3 would not find the packages installed with pip.
- Both stdout and stderr are redirected to log.txt in the same directory.
- The trailing & runs the process in the background, and the shell prints its process ID; record it so you can run kill <PID> if the process needs to be stopped.
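app.py performs the conversion with pypandoc. The stdlib-only sketch below instead calls the pandoc CLI directly (pypandoc is a wrapper around it) to show how a file ID could map onto the inputs/ and outputs/ directories created above. The `<file-id>.docx`/`<file-id>.pdf` naming and the helper names are assumptions.

```python
import subprocess
from pathlib import Path

INPUT_DIR = Path("inputs")    # created earlier with `mkdir inputs outputs`
OUTPUT_DIR = Path("outputs")


def conversion_paths(file_id: str) -> tuple[Path, Path]:
    """Map a file ID to its downloaded .docx path and converted .pdf path."""
    return INPUT_DIR / f"{file_id}.docx", OUTPUT_DIR / f"{file_id}.pdf"


def pandoc_command(src: Path, dst: Path) -> list[str]:
    # pandoc infers both formats from the file extensions; PDF output is
    # rendered through a LaTeX engine, which is why texlive is installed.
    return ["pandoc", str(src), "-o", str(dst)]


def convert(file_id: str) -> Path:
    """Run pandoc on inputs/<id>.docx and return the outputs/<id>.pdf path."""
    src, dst = conversion_paths(file_id)
    subprocess.run(pandoc_command(src, dst), check=True)  # raises if pandoc fails
    return dst
```

Building the argument list separately from running it keeps the command inspectable and avoids shell quoting issues with odd file names.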
The Flask app should now be running on: http://{ec2-instance-public-ipv4-address}:5000
Use this address as the API endpoint URL in the trigger_converter.py Lambda function so that it sends newly uploaded .docx files from S3 to the Flask microservice for conversion.
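A minimal sketch of what the trigger Lambda does, assuming it decodes the S3 event and POSTs JSON to Flask using only the standard library. The `/convert` path, the `bucket`/`key` JSON fields, and the example address (a documentation-range IP) are hypothetical; the real trigger_converter.py may differ.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with http://<ec2-public-ipv4>:5000/<route>
CONVERTER_URL = "http://203.0.113.10:5000/convert"


def object_keys(event: dict) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (a space becomes '+'), so decode them
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def build_convert_request(bucket: str, key: str) -> urllib.request.Request:
    """Prepare the POST telling the Flask service which object to convert."""
    body = json.dumps({"bucket": bucket, "key": key}).encode()
    return urllib.request.Request(
        CONVERTER_URL, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


def lambda_handler(event, context):
    statuses = []
    for bucket, key in object_keys(event):
        req = build_convert_request(bucket, key)
        with urllib.request.urlopen(req, timeout=30) as resp:
            statuses.append(resp.status)
    return {"statusCode": 200, "body": json.dumps(statuses)}
```

If the handler waits for the conversion to finish, the Lambda function's configured timeout (3 seconds by default) has to be raised above the expected conversion time.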
Warning
This command only starts the web app. For full functionality, you also need to configure the instance's security group on AWS to allow inbound TCP connections to port 5000 of the EC2 instance from any IPv4 address (0.0.0.0/0).
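The same inbound rule can be added programmatically. This sketch assumes a boto3 EC2 client and builds the `IpPermissions` structure that `authorize_security_group_ingress` accepts; the helper names and the rule description text are hypothetical.

```python
from typing import Any


def port_ingress_rule(port: int = 5000, cidr: str = "0.0.0.0/0") -> list[dict]:
    """IpPermissions entry that opens a single TCP port to a CIDR range."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": f"Flask converter on port {port}"}],
    }]


def open_port(ec2_client: Any, group_id: str, port: int = 5000,
              cidr: str = "0.0.0.0/0") -> None:
    """Add the rule to a security group; ec2_client is e.g. boto3.client("ec2")."""
    ec2_client.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=port_ingress_rule(port, cidr),
    )
```

Passing a narrower `cidr` (for example, only the addresses the Lambda traffic comes from, if known) reduces how widely the Flask development server is exposed.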
Note
Follow the same steps for the PNG and CSV converter microservices, each in its own directory and exposed on a different port.
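With three services on one instance, the trigger Lambda has to choose an endpoint per upload. One way to do that is to route by file extension. The port numbers 5001 and 5002 and the `/convert` path below are hypothetical; only port 5000 for the .docx service comes from this README.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical port assignment: 5000 is the .docx service from this README
CONVERTER_PORTS = {".docx": 5000, ".png": 5001, ".csv": 5002}


def converter_url(host: str, key: str) -> Optional[str]:
    """Pick the microservice endpoint for an uploaded object by its extension."""
    port = CONVERTER_PORTS.get(PurePosixPath(key).suffix.lower())
    if port is None:
        return None  # unsupported type: skip it rather than guess a service
    return f"http://{host}:{port}/convert"  # '/convert' route is assumed
```

Lower-casing the suffix lets `scan.PNG` and `scan.png` reach the same service, and returning `None` gives the caller an explicit way to reject unsupported uploads.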
Tip
If the web app demo videos below don't load, please watch them on YouTube.
DOCX to PDF Conversion
PNG to PDF Conversion
S3 uploads-bucket for .docx files
S3 output-bucket for .pdf files
Flask App process running in EC2
This project was created by MLSA KIIT for the Cloud Computing Domain's Project Wing:
- Sourasish Basu (@SourasishBasu) - MLSA KIIT
| Version | Date | Comments |
|---|---|---|
| 1.0 | Jan 24th, 2024 | Initial release |
Website/API
- File Validation and Sanitization on server side
- Better PDF conversion engine to retain original formatting in higher quality
- Better Error Handling
AWS Infrastructure
- Actual implementation in production
- Conversion feature between multiple file types
- Implementing image compression using methods such as Huffman Encoding

