I am a Data Engineer and Machine Learning practitioner passionate about building scalable data pipelines, optimizing analytics solutions, and deploying AI models. My expertise spans cloud platforms (AWS, GCP, Snowflake), big data processing, and deep learning, which I use to help businesses make data-driven decisions and improve operational efficiency.
Master of Science in Data Science, Indiana University
2+ years of experience in Data Engineering, AI/ML, and Analytics
Enthusiast and practitioner in Generative AI, Large Language Models (LLMs), and Deep Learning
- Data Engineer at Logan Data Inc: Built an end-to-end data pipeline in Snowflake to inform a franchise restaurant's dynamic menu decisions. Integrated weather and sales data from S3 to improve inventory planning and staffing, and reduced costs by $900/month through dynamic warehouse scaling.
- AI Data Engineer at CrowdDoing: Developed scalable ETL pipelines with AWS S3, Airflow, and Docker, and improved data retrieval efficiency with a Neo4j knowledge graph. Integrated LLMs using DSPy and LangChain, boosting context accuracy by 40%, and built a retrieval-augmented generation (RAG) system with OpenAI APIs for question answering and contextual search.
- Data Analytics Engineer at Indiana University: Analyzed 20 years of Arts and Cultural Production Satellite Account (ACPSA) data with PostgreSQL, quantifying the arts sector's 4.5% contribution to GDP. Automated ETL workflows with SSIS and developed interactive dashboards (Power BI, D3.js) on ArtsAnalytics.org, providing real-time insights across 35 subsectors.
- Associate Data Engineer at Accenture: Optimized HR and Payroll pipelines by designing an AWS Redshift data model that cut costs by 25% and streamlined the Oracle data migration. Improved data quality by 50% through automation and built Tableau dashboards with role-based access control (RBAC) for recruitment analytics.
- Programming & Database Systems: Python, SQL, R, MySQL, PostgreSQL, Amazon RDS, MongoDB, DynamoDB, Cassandra
- Big Data Technologies: Spark, PySpark, Kafka, Amazon Kinesis, Athena, EMR, Hadoop, Apache NiFi
- Data Engineering & Cloud Services: Data Warehousing, ETL, Data Modeling, Snowflake, SSIS, SSRS, AWS Lambda, Glue, Redshift
- CI/CD & Infrastructure Automation: GitHub, Jenkins, AWS CodeBuild, Docker, Kubernetes, Terraform, DBT, Airflow
- Visualization Tools: Tableau, Power BI, Streamlit, D3.js, Looker, AWS QuickSight, Matplotlib, Seaborn, Plotly
- Machine Learning: NumPy, Pandas, Scikit-learn, PyTorch, TensorFlow, Statistical Modeling, Amazon SageMaker
Reddit ETL Pipeline: Built a scalable ETL pipeline using Apache Airflow, Docker, AWS (S3, Redshift), and DBT, improving processing efficiency by 25%.
Airline Data Ingestion Pipeline: Designed an event-driven data pipeline using AWS S3, Glue, Redshift, and Step Functions to automate flight data processing.
Food Delivery Analysis (Real-Time): Developed a real-time data pipeline with Kinesis, PySpark, and AWS EMR, with QuickSight dashboards for live monitoring of delivery operations.
Amazon Mobile Sales ETL: Created an ETL pipeline in Snowflake using Snowpark, applying dimensional modeling to optimize analytical queries.
PySpark Sales Data Analysis: Performed ETL and exploratory data analysis with PySpark, uncovering customer spending patterns and sales trends.
Weather Data Analysis: Automated weather data ingestion and processing with AWS Glue, Redshift, and Amazon MWAA (Managed Workflows for Apache Airflow), with CI/CD pipelines for automated deployment.
Apple Data Analysis: Developed ETL workflows on Databricks to analyze customer transactions and purchasing patterns in support of business intelligence.
Blood Cell Cancer Prediction (Databricks): Implementing a CNN-based model for classification and segmentation of blood cell images to aid early cancer detection.
Chessman Classification (AWS): Deployed a CNN model for chess piece recognition using AWS Lambda, ECR, and API Gateway for real-time inference.
Fraud Detection: Built a model to detect fraudulent transactions, covering feature engineering, model training, and evaluation metrics for flagging anomalies.
GPT Model from Scratch: Implemented a GPT-style language model from scratch, covering attention mechanisms, transformer blocks, and fine-tuning for NLP tasks.
Feel free to reach out or connect with me on LinkedIn or by email at [email protected].
Thanks for visiting my GitHub profile!