sherozshaikh/README.md

Hey there, I'm Sheroz 👋

Machine Learning Engineer & Data Scientist
Building production ML systems, from LLM-powered automation to healthcare AI

LinkedIn Email GitHub


🚀 About Me

  • 🎓 M.S. Data Science @ Worcester Polytechnic Institute (WPI) | GPA: 3.9/4.0
  • 🏆 Best Data Science Project award winner (1st place out of 20+ teams) for a healthcare ML project
  • 🏥 5+ years building production ML systems across healthcare, fintech, and IoT
  • 🤖 Passionate about LLMs, semantic search, and ML pipeline automation
  • 📦 Open-source contributor: published 4 Python packages on PyPI
  • 📍 Boston, MA

🏗️ What I've Built

  • LLM-Powered Ticket Routing: Claude API-based system automating 40% of classification workflows, saving ~$700/month in operational costs
  • ICD-10 Medical Coding System: production LLM serving 10+ enterprise healthcare clients, processing 100K+ requests per month
  • Semantic Search Platform: vector embeddings over 940K healthcare documents, delivering ~$80K/month in operational savings
  • ML Document Classifier: production classifier automating 80% of daily document triage (900+ docs/day) with 99%+ uptime
  • Time-Series Forecasting: PyTorch pipeline predicting equipment failures 30 days in advance
  • LoRA Fine-Tuning Pipeline: end-to-end text classification with parameter-efficient fine-tuning and reproducible benchmarking
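The LoRA bullet rests on one idea: keep the pretrained weight W frozen and train only a low-rank update B·A. Below is a minimal plain-Python sketch of that idea, not code from the pipeline itself. The function names (`lora_forward`, `lora_param_count`) and the sizes are hypothetical. It follows the original LoRA formulation (A randomly initialized, B zero-initialized, update scaled by alpha / r). A real pipeline would more likely use HuggingFace PEFT on top of PyTorch.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Hypothetical LoRA forward pass: y = W x + (alpha / r) * B (A x).

    W (d x k) stays frozen; only A (r x k) and B (d x r) would be trained.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d, k, r):
    """Trainable parameters for one d x k weight: (full fine-tuning, LoRA rank r)."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    d, k, r = 4, 3, 2
    rng = random.Random(0)
    W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
    A = [[rng.gauss(0, 0.01) for _ in range(k)] for _ in range(r)]  # small random init
    B = [[0.0] * r for _ in range(d)]  # zero init: training starts exactly at W
    x = [1.0, 2.0, 3.0]
    # With B = 0 the adapted layer reproduces the frozen layer exactly.
    assert lora_forward(W, A, B, x, alpha=16, r=r) == matvec(W, x)
    print(lora_param_count(768, 768, 8))  # (589824, 12288)
```

With r = 8 on a 768 × 768 projection, LoRA trains 12,288 parameters instead of 589,824 (about 2%). That reduction is where the "parameter-efficient" savings come from.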

🧰 Tech Stack

AI & ML Frameworks

PyTorch Scikit-learn HuggingFace LangChain XGBoost

LLMs & Vector Search

Claude OpenAI FAISS Pinecone Chroma

Data Engineering & ETL

PySpark Airflow Polars SQL

Production & MLOps

FastAPI Docker AWS MLflow GitHub Actions Prometheus

Languages

Python SQL Linux


📈 Highlights

  • 🏥 Deployed a production LLM for ICD-10 medical coding serving 10+ enterprise healthcare clients
  • 🔍 Built semantic search over 940K documents, saving ~$80K/month in operational costs
  • ⚡ Automated 80% of daily document triage with an ML classifier (900+ docs/day)
  • 📊 Optimized PySpark ETL over 15M+ Medicare records: 75% fewer data scans, 58% faster queries
  • 📦 Published 4 open-source Python packages on PyPI for ML pipeline tooling
  • 🏆 1st place, WPI Best Data Science Project (Winter 2024)
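At its core, the semantic-search work is nearest-neighbor lookup over embedding vectors. Below is a brute-force cosine-similarity sketch in plain Python. It assumes the embeddings are already computed, and `top_k` is a hypothetical helper, not the platform's code. At 940K documents, an approximate index such as FAISS, Pinecone, or Chroma (all listed in the Tech Stack) would replace the linear scan.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query, docs, k=3):
    """Hypothetical helper: the k (doc_id, cosine similarity) pairs closest to `query`.

    `docs` maps a document id to a precomputed embedding vector.
    This is an exact linear scan; vector indexes approximate it at scale.
    """
    q = normalize(query)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in docs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


docs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], docs, k=2))  # "a" first, then "b"
```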



💬 Let's connect! Always happy to chat about ML engineering, LLMs, healthcare AI, or open source.

Pinned

  1. mini-rainbow-dqn

    End-to-end deep reinforcement learning platform for Atari Breakout. Implements DQN, Double DQN, Dueling Networks, and Prioritized Experience Replay with side-by-side agent comparison, live inferenc…

    Python

  2. agentic-rag-eval

    RAG system with adaptive retrieval (Qdrant dense + sparse + RRF), cross-encoder re-ranking, and optional long-term memory (Mem0), evaluated using HotpotQA EM/F1 on 5K questions. Dual-backend: runs…

    Python

  3. pageclassifier

    Gemini-powered page classifier that decides whether a document page image contains invoice line items. Designed to sit between paperflight and any downstream extractor to reduce expensive LLM calls…

    Python

  4. predictive-maintenance-platform

    End-to-end ML platform for turbofan engine remaining useful life (RUL) forecasting, failure classification, and anomaly detection using the NASA CMAPSS FD001 dataset

    Python

  5. retail-demand-allocator

    End-to-end ML platform for retail demand forecasting and marketing budget optimization using the UCI Online Retail dataset

    Jupyter Notebook

  6. spam-email-classification-lora

    Spam Email Classification using LoRA Fine-tuned Transformers: High-performance spam email classification using LoRA-adapted transformer models (ELECTRA, RoBERTa). Achieves 99.4%+ accuracy with para…

    Jupyter Notebook
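The agentic-rag-eval description mentions fusing dense and sparse retrieval with RRF (Reciprocal Rank Fusion). Below is a minimal sketch of the standard RRF formula, score(d) = Σ 1 / (k + rank), with the commonly used k = 60. It is not taken from the repo, which may rely on Qdrant's own server-side fusion instead. The function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Hypothetical helper: fuse ranked lists of doc ids with Reciprocal Rank Fusion.

    score(d) = sum over lists of 1 / (k + rank of d in that list), ranks 1-based.
    A document missing from a list simply gets no contribution from it.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["d1", "d2", "d3"]   # e.g. ids from a dense (embedding) retriever
sparse = ["d3", "d1", "d4"]  # e.g. ids from a sparse (keyword) retriever
print(reciprocal_rank_fusion([dense, sparse]))  # order: d1, d3, d2, d4
```

Documents that both retrievers rank well (d1 and d3 here) rise to the top. Because RRF uses only ranks, dense and sparse scores never have to be put on a common scale.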
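agentic-rag-eval reports HotpotQA EM/F1. To show what those metrics measure, here is a sketch modeled on the SQuAD-style answer normalization in the public HotpotQA evaluation script. That normalization lowercases, strips punctuation and the articles a/an/the, and collapses whitespace. The sketch also includes the script's rule that yes/no answers get no partial credit. Treat it as an approximation, not the repo's evaluation code. The helper names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)
YES_NO = {"yes", "no", "noanswer"}


def normalize_answer(s):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, gold):
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction, gold):
    """Token-overlap F1 between normalized prediction and gold answer."""
    pred, ref = normalize_answer(prediction), normalize_answer(gold)
    # yes/no answers must match exactly; partial overlap earns nothing.
    if (pred in YES_NO or ref in YES_NO) and pred != ref:
        return 0.0
    pred_tokens, ref_tokens = pred.split(), ref.split()
    num_same = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, golds):
    """Hypothetical helper: mean EM and mean F1 over a list of questions."""
    n = len(golds)
    em = sum(exact_match(p, g) for p, g in zip(predictions, golds)) / n
    f = sum(f1(p, g) for p, g in zip(predictions, golds)) / n
    return em, f


print(evaluate(["The Eiffel Tower", "no"], ["eiffel tower", "yes"]))  # (0.5, 0.5)
```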
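mini-rainbow-dqn compares DQN with Double DQN. The two differ only in how the bootstrap target is built. The hypothetical plain-Python helpers below illustrate that difference on batched lists of next-state Q-values. A real agent would compute these with PyTorch tensors from its online and target networks, so this is a sketch, not the repo's code.

```python
def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Hypothetical helper: Double DQN regression targets for a batch.

    The online network selects the next action and the target network
    evaluates it: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
    Terminal transitions (done=True) get no bootstrap term.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        best_action = q_on.index(max(q_on))  # selection: online net
        bootstrap = 0.0 if done else gamma * q_tg[best_action]  # evaluation: target net
        targets.append(r + bootstrap)
    return targets


def dqn_targets(rewards, dones, q_target_next, gamma=0.99):
    """Vanilla DQN targets for comparison: the target net both selects and evaluates."""
    return [
        r + (0.0 if done else gamma * max(q_tg))
        for r, done, q_tg in zip(rewards, dones, q_target_next)
    ]


rewards, dones = [1.0, 0.0], [False, True]
q_online_next = [[0.1, 0.9], [0.5, 0.2]]
q_target_next = [[2.0, 1.0], [3.0, 4.0]]
print(double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.5))
print(dqn_targets(rewards, dones, q_target_next, gamma=0.5))
```

For the first transition, vanilla DQN takes the target network's own maximum (2.0), giving a target of 2.0. Double DQN evaluates the online network's choice, giving 1.5. Decoupling action selection from evaluation is what curbs DQN's overestimation bias.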