Scalable, fault-tolerant document processing pipeline built on AWS (S3, SQS, Rekognition) and Java, demonstrating concurrent job handling and Dockerized deployment.

AWS-Based-Distributed-Image-Processing-Pipeline-

This project is a cloud-native, distributed system designed to automate the ingestion and analysis of high-volume image and document uploads. The primary goal was to create a pipeline that is both resilient to failure and highly performant.

Core Architecture: The system uses AWS S3 as the ingest trigger, which places asynchronous jobs onto an AWS SQS queue. A Dockerized Java application acts as the consumer service, processing messages concurrently. This service leverages AWS AI services (Rekognition and Textract) to perform complex data extraction and classification before storing the final, enriched data.
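The flow described above (S3 upload event → SQS message → consumer → AI analysis → enriched store) can be sketched as follows. This is an illustrative, self-contained sketch rather than code from this repository: a `BlockingQueue` stands in for the SQS queue, a stub `analyze` method stands in for the Rekognition/Textract call, and all class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.*;

// Illustrative pipeline sketch: an in-memory BlockingQueue stands in for SQS,
// and analyze() stands in for a Rekognition/Textract call.
public class PipelineSketch {
    // "SQS queue" carrying S3 object keys published by the ingest trigger
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Store for the final, enriched data (stand-in for the output bucket/table)
    private final ConcurrentMap<String, String> results = new ConcurrentHashMap<>();

    // Ingest side: an S3 upload event enqueues an asynchronous job
    public void onUpload(String s3Key) {
        queue.add(s3Key);
    }

    // Stand-in for the AWS AI call (Rekognition label detection / Textract OCR)
    private String analyze(String s3Key) {
        return "labels-for:" + s3Key;
    }

    // Consumer side: drain queued jobs and store the enriched results
    public void drain() {
        String key;
        while ((key = queue.poll()) != null) {
            results.put(key, analyze(key));
        }
    }

    public Map<String, String> results() {
        return results;
    }
}
```

Because ingestion only ever touches the queue, uploads never block on analysis; that decoupling is what the asynchronous model buys.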

Key Technical Highlights:

Concurrency Model: Utilized a message-driven consumer architecture in Java to manage parallelism and ensure that no processing job is lost due to transient errors.

Performance: The asynchronous queuing model successfully decoupled ingestion from processing, leading to a 20% improvement in end-to-end latency for document analysis.

DevOps: The entire processor service is containerized using Docker for consistent deployment across different AWS environments.
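As a concrete illustration of the concurrency highlight above, the sketch below shows one way a message-driven consumer can combine a fixed worker pool with requeue-on-failure, mimicking SQS's redelivery semantics so a transient error never loses a job. It is a hypothetical, self-contained model, not the repository's implementation: the queue is in-memory and the "transient error" is simulated by failing every job exactly once.

```java
import java.util.Set;
import java.util.concurrent.*;

// Concurrent consumer sketch: a fixed worker pool drains a queue and
// requeues messages on transient failure (mimicking SQS redelivery).
public class RetryingConsumer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    // Tracks jobs that have already failed once (simulation bookkeeping)
    private final Set<String> failedOnce = ConcurrentHashMap.newKeySet();

    public void submit(String jobId) {
        queue.add(jobId);
    }

    // Simulated processing: every job fails on its first attempt,
    // standing in for a transient AWS/network error.
    private void process(String jobId) {
        if (failedOnce.add(jobId)) {
            throw new RuntimeException("transient error for " + jobId);
        }
        processed.add(jobId);
    }

    public void run(int workers, int totalJobs) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch done = new CountDownLatch(totalJobs);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                while (done.getCount() > 0) {
                    String jobId;
                    try {
                        jobId = queue.poll(50, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        return;
                    }
                    if (jobId == null) continue;
                    try {
                        process(jobId);
                        done.countDown();   // success: "delete" the message
                    } catch (RuntimeException e) {
                        queue.add(jobId);   // failure: requeue, like SQS redelivery
                    }
                }
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
    }

    public Set<String> processed() {
        return processed;
    }
}
```

In the real pipeline, "requeue" happens for free: a consumer that crashes or throws simply never deletes the SQS message, so it reappears after the visibility timeout.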
