Naga-Manohar-Y/README.md

👋 Hi there, I'm Manohar!

I am a Data Engineer and Machine Learning practitioner passionate about building scalable data pipelines, optimizing analytics solutions, and deploying AI models. My expertise spans cloud computing (AWS, GCP, Snowflake), big data processing, and deep learning, enabling businesses to make data-driven decisions and enhance operational efficiency.

🌱 About Me

🎓 Master of Science in Data Science – Indiana University

💼 2+ years of experience in Data Engineering, AI/ML, and Analytics

🌟 Enthusiast and practitioner of Generative AI, Large Language Models (LLMs), and Deep Learning

💼 Professional Experience

  • Data Engineer – Logan Data Inc: Built an end-to-end data pipeline in Snowflake to optimize a franchise restaurant’s dynamic menu decisions. Integrated weather and sales data from S3, enhancing inventory planning and staffing, and reduced costs by $900/month with dynamic warehouse scaling.
  • AI Data Engineer – CrowdDoing: Developed scalable ETL pipelines with AWS S3, Airflow, and Docker, improving data retrieval efficiency with a Neo4j knowledge graph. Integrated LLMs using DSPy and LangChain, boosting context accuracy by 40%, and built a RAG system with OpenAI APIs for advanced Q&A and contextual search.
  • Data Analytics Engineer – Indiana University: Analyzed 20 years of ACPSA data with PostgreSQL, revealing the arts sector’s 4.5% GDP contribution. Automated ETL workflows (SSIS) and developed interactive dashboards (Power BI, D3.js) on ArtsAnalytics.org, providing real-time insights across 35 subsectors.
  • Associate Data Engineer – Accenture: Optimized HR & Payroll pipelines, designing an AWS Redshift model that cut costs by 25% and streamlined Oracle data migration. Enhanced data quality by 50% through automation and built RBAC-enabled Tableau dashboards for recruitment analytics.

🔧 Skills

  • Programming & Database Systems: Python, SQL, R, MySQL, PostgreSQL, Amazon RDS, MongoDB, DynamoDB, Cassandra
  • Big Data Technologies: Spark, PySpark, Kafka, Amazon Kinesis, Athena, EMR, Hadoop, Apache NiFi
  • Data Engineering & Cloud Services: Data Warehousing, ETL, Data Modeling, Snowflake, SSIS, SSRS, AWS Lambda, Glue, Redshift
  • CI/CD & Infrastructure Automation: GitHub, Jenkins, AWS CodeBuild, Docker, Kubernetes, Terraform, dbt, Airflow
  • Visualization Tools: Tableau, Power BI, Streamlit, D3.js, Looker, AWS QuickSight, Matplotlib, Seaborn, Plotly
  • Machine Learning: NumPy, Pandas, Scikit-learn, PyTorch, TensorFlow, Statistical Modeling, Amazon SageMaker

📂 Projects

🚀 Data Engineering Projects

Reddit ETL Pipeline – Built a scalable ETL pipeline using Apache Airflow, Docker, AWS (S3, Redshift), and dbt, improving processing efficiency by 25%.

Airline Data Ingestion Pipeline – Designed an event-driven data pipeline using AWS S3, Glue, Redshift, and Step Functions to automate flight data processing.

Food Delivery Analysis (Real-Time) – Developed a real-time data pipeline with Kinesis, PySpark, AWS EMR, and QuickSight dashboards for live monitoring of delivery operations.

Amazon Mobile Sales ETL – Created an ETL pipeline in Snowflake using Snowpark, implementing dimensional modeling to optimize analytical queries.

PySpark Sales Data Analysis – Performed ETL and exploratory data analysis using PySpark, uncovering customer spending patterns and sales trends.

Weather Data Analysis – Automated weather data ingestion and processing with AWS Glue, Redshift, and MWAA, integrating CI/CD pipelines for seamless deployment.

Apple Data Analysis – Developed ETL workflows on Databricks to analyze customer transactions and purchasing patterns, enhancing business intelligence strategies.

🤖 Machine Learning Projects

Blood Cell Cancer Prediction (Databricks) – Implementing a CNN-based model for classification and segmentation of blood cell images to aid in early cancer detection.

Chessman Classification (AWS) – Deployed a CNN model for chess piece recognition using AWS Lambda, ECR, and API Gateway for real-time inference.

Fraud Detection – Built a fraudulent transaction detection model, leveraging feature engineering, model training, and evaluation metrics to detect anomalies.

GPT Model from Scratch – Implemented a GPT model inspired by LLM architectures, covering attention mechanisms, transformers, and fine-tuning for NLP tasks.

📫 Connect with Me

Feel free to reach out or connect with me on LinkedIn or through email at [email protected].

Thanks for visiting my GitHub profile! 🌟

Pinned

  1. Airline_Data_Ingestion (Public)

    An end-to-end, event-driven data pipeline for airline data that uses various AWS services to process flight data and store it in Redshift with an efficient data model.

    Python 1

  2. Reddit-Pipeline-DE (Public)

    Explore the world of Data Engineering through a sophisticated ETL pipeline leveraging Reddit's API, AWS S3, Redshift, dbt transformations, and Airflow orchestration in Docker. Visualize insights on…

    Python

  3. Food-Delivery-Analysis-in-Real-Time (Public)

    This project builds a real-time food delivery analytics pipeline using AWS Kinesis, PySpark, Redshift, and QuickSight, with automated deployments via CodeBuild.

    Python

  4. CRNY_Survey_Data_Analysis (Public)

    Explore how guaranteed income impacts New York's diverse artist community, revealing insights into financial resilience and societal support through an interactive Power BI dashboard.

    Jupyter Notebook

  5. LLM_From_Scratch (Public)

    This project is an implementation of a GPT model built entirely from scratch, inspired by LLM from Scratch by Sebastian. The goal is to understand the inner workings of transformers, attention mech…

    Jupyter Notebook 1

  6. Classifying_Chessmen (Public)

    A deep learning project that classifies chess pieces from images using CNN models. The pipeline includes image preprocessing, data augmentation, model training (with TensorFlow/Keras), and evaluati…

    Jupyter Notebook 1