Welcome to my Data Science Portfolio: a collection of hands-on projects covering end-to-end Machine Learning systems, business analytics case studies, exploratory data analysis (EDA), statistical analysis, dashboards, and other practical, recruiter-ready work built to solve real-world problems with data.
This repository reflects my journey from learning concepts to building industry-ready solutions in:
- Data Analytics
- Data Science
- Machine Learning
- Business Intelligence
- Statistical Analysis
- Dashboard Development
- Healthcare Analytics
- Retail Intelligence
- Predictive Modeling
- End-to-End Deployment Projects
Hi, I'm Kailash Singh Rawat, an MCA (Data Science) student and aspiring Data Analyst / Data Scientist who is passionate about solving real-world business problems with data.
I strongly believe:
Before building Machine Learning models, understanding the data deeply is the real skill that creates strong analysts.
My focus is not just on building models, but on solving business problems through:
- clean data
- strong analysis
- meaningful insights
- decision-support systems
- production-ready project thinking
This repository contains my Jupyter notebooks, analytics projects, machine learning systems, and portfolio work designed with recruiter expectations in mind.
- Python
- SQL
- MySQL
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Power BI
- Tableau
- Excel
- Scikit-learn
- Logistic Regression
- Multiple Linear Regression
- Ridge Regression
- Lasso Regression
- Random Forest
- Gradient Boosting
- K-Means Clustering
- Feature Engineering
- Model Evaluation
- Hyperparameter Tuning
- Pipeline & ColumnTransformer
- Statistics
- Exploratory Data Analysis (EDA)
- Hypothesis Testing
- Confidence Intervals
- Statistical Inference
- Time Series Forecasting
- Explainable AI (SHAP)
- NLP
- Deep Learning (Learning Phase)
- Business Analytics
- Healthcare Analytics
- Retail Intelligence
Data-Science-Portfolio/
│
├── Python/
│
├── Statistics & EDA/
│ ├── Sales & Discounts Analysis Project
│ ├── Hospital Patient Data Analysis Project
│ ├── Cardiotocographic EDA Project
│ ├── Hypothesis Testing - Bombay Hospitality Ltd
│ └── Estimation And Confidence Intervals
│
├── Machine Learning/
│ ├── Toyota Corolla Price Prediction - Multiple Linear Regression
│ └── Diabetes Prediction using Logistic Regression
│
├── NumPy & Pandas/
│
├── Resume Projects/
│ ├── Titanic Survival Prediction
│ ├── House Price Prediction
│ └── Retail Intelligence & Forecasting System
│
├── SQL Projects/
│
├── Power BI Dashboards/
│
└── README.md
- Mean, Median, Mode, Standard Deviation
- Histograms and Distribution Analysis
- Boxplots and Outlier Detection using IQR
- Categorical Analysis using Bar Charts
- Discount Behavior Analysis
- Revenue Concentration Analysis
- Business Interpretation of Sales Patterns
A small number of high-value transactions were generating a major portion of total revenue, while most transactions remained low to medium in value.
This project strengthened my understanding of how descriptive analytics directly impacts business decisions.
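As a rough illustration of the IQR outlier rule and the revenue-concentration check listed above, here is a minimal standard-library sketch. The transaction amounts and the helper names `iqr_outliers` and `top_share` are hypothetical and are not taken from the project notebook.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


def top_share(values, fraction=0.2):
    """Share of the total contributed by the largest `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical transaction amounts, invented for this example.
sales = [120, 150, 160, 180, 200, 210, 230, 250, 2400, 3100]

lower, upper, outliers = iqr_outliers(sales)
share = top_share(sales, fraction=0.2)

print(f"fences: [{lower}, {upper}], outliers: {outliers}")
print(f"top 20% of transactions carry {share:.0%} of revenue")
```

On this made-up sample the two largest transactions are flagged as outliers and carry roughly 79% of total revenue, which is the kind of concentration pattern described above.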
- Patient Data Cleaning and Preprocessing
- Missing Value Handling using Mean Imputation
- Duplicate Record Removal using PatientID
- Department-wise Revenue Analysis using GroupBy
- Merging Patient + Billing Datasets
- Row-wise and Column-wise Concatenation
- Billing Analytics and Doctor Performance Preparation
Accurate billing analytics depends heavily on duplicate removal, missing value handling, and proper dataset merging before any dashboard or machine learning model can be trusted.
This project strengthened my understanding of healthcare analytics and operational intelligence.
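The cleaning steps above (duplicate removal on `PatientID`, mean imputation, merging, and department-wise aggregation) can be sketched in plain Python. This is only a dependency-free illustration of the logic; the project itself works with pandas, and every field name other than `PatientID`, as well as all the records, are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; only the PatientID key comes from the project description.
patients = [
    {"PatientID": 1, "Department": "Cardiology", "Age": 54},
    {"PatientID": 2, "Department": "Neurology", "Age": None},
    {"PatientID": 2, "Department": "Neurology", "Age": None},  # duplicate row
    {"PatientID": 3, "Department": "Cardiology", "Age": 46},
]
billing = [
    {"PatientID": 1, "Amount": 1200.0},
    {"PatientID": 2, "Amount": 800.0},
    {"PatientID": 3, "Amount": 500.0},
]

# 1. Remove duplicates: keep the first record seen for each PatientID.
seen, unique = set(), []
for row in patients:
    if row["PatientID"] not in seen:
        seen.add(row["PatientID"])
        unique.append(dict(row))

# 2. Mean imputation: fill missing ages with the mean of the known ages.
mean_age = mean(r["Age"] for r in unique if r["Age"] is not None)
for r in unique:
    if r["Age"] is None:
        r["Age"] = mean_age

# 3. Merge patient and billing data (inner join on PatientID).
amount_by_id = {b["PatientID"]: b["Amount"] for b in billing}
merged = [
    {**r, "Amount": amount_by_id[r["PatientID"]]}
    for r in unique
    if r["PatientID"] in amount_by_id
]

# 4. Department-wise revenue (the group-by step).
revenue = defaultdict(float)
for r in merged:
    revenue[r["Department"]] += r["Amount"]

print(dict(revenue))
```

If the duplicate row were left in place, the Neurology bill would be counted twice after the merge, which is why deduplication has to come before any revenue figure is trusted.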
- Medical Dataset Cleaning and Preprocessing
- Missing Value Handling using Median Imputation
- Outlier Detection using IQR
- Histograms, Boxplots, and Violin Plots
- Correlation Heatmaps and Scatter Plots
- Pair Plot Analysis
- Fetal Health Monitoring Insights
A strong relationship between uterine contractions and late decelerations highlighted possible contraction-related fetal stress and high-risk pregnancy indicators.
This project strengthened my understanding of medical data interpretation and healthcare decision support.
- Hypothesis Testing
- Right-Tailed Z-Test
- Statistical Decision-Making
- Critical Value Comparison
- Business Model Validation
- Evidence-Based Operational Analysis
Restaurant owners believed operating costs had increased, but the right-tailed Z-test did not support that claim: the observed average cost was below the theoretical model's prediction, so the test statistic fell below the critical value and the null hypothesis could not be rejected.
This project strengthened my understanding of statistical decision-making and business validation using hypothesis testing.
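For readers who want to see the mechanics, here is a minimal sketch of a right-tailed Z-test with a critical-value comparison, using only the standard library. The function name and all numbers are hypothetical and do not come from the project data; the sketch assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Test H0: mu <= mu0 against H1: mu > mu0 with a known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail cutoff
    p_value = 1 - NormalDist().cdf(z)             # upper-tail probability
    return z, z_critical, p_value, z > z_critical


# Hypothetical inputs: the observed mean sits below the hypothesised mean.
z, z_crit, p, reject = right_tailed_z_test(sample_mean=960, mu0=1000, sigma=100, n=25)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p = {p:.3f}, reject H0: {reject}")
```

Because the sample mean is below the hypothesised mean, the statistic is negative and can never exceed a positive right-tail critical value, so the test reports no evidence of an increase.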
- Confidence Interval Estimation
- t-Distribution vs z-Distribution
- Statistical Inference
- Margin of Error Analysis
- Manufacturing Reliability Estimation
- Product Quality Analytics
The project demonstrated that, for the same sample and confidence level, a t-confidence interval is wider than a z-confidence interval, because estimating the population standard deviation from the sample adds uncertainty.
This project strengthened my understanding of estimation theory, uncertainty quantification, and industrial quality control analytics.
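The comparison can be reproduced with a small sketch. The measurements below are hypothetical, and the t critical value is hard-coded from a standard t-table (two-sided 95%, 9 degrees of freedom) because the standard library has no t-distribution; with SciPy available one would normally look it up instead. To isolate the effect of the critical value, both intervals use the same sample standard deviation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements (n = 10), invented for this example.
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.6, 10.0]
n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)

Z_CRIT_95 = NormalDist().inv_cdf(0.975)  # about 1.96
T_CRIT_95_DF9 = 2.262                    # t-table value, assumption stated above


def interval(center, critical, spread, size):
    """Two-sided interval: center +/- critical * spread / sqrt(size)."""
    margin = critical * spread / sqrt(size)
    return center - margin, center + margin


z_low, z_high = interval(x_bar, Z_CRIT_95, s, n)
t_low, t_high = interval(x_bar, T_CRIT_95_DF9, s, n)

print(f"z interval: ({z_low:.3f}, {z_high:.3f})")
print(f"t interval: ({t_low:.3f}, {t_high:.3f})")
```

The margin of error grows in proportion to the critical value, so the t interval is wider; the gap narrows as the sample size increases and the t-distribution approaches the normal.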
- Multiple Linear Regression (MLR)
- Exploratory Data Analysis (EDA)
- Data Preprocessing
- Feature Engineering
- Model Evaluation
- Ridge Regression
- Lasso Regression
- Regularization Techniques
- Multicollinearity Handling
- Predictive Analytics
Car age and kilometers driven showed strong negative relationships with price, while horsepower and vehicle weight positively influenced resale value.
The project demonstrated how regression models and regularization techniques improve pricing prediction and model stability in automotive analytics.
- Logistic Regression
- Binary Classification
- Healthcare Analytics
- Exploratory Data Analysis (EDA)
- Medical Data Preprocessing
- Feature Scaling
- ROC-AUC Analysis
- Explainable AI
- Streamlit Deployment
- Healthcare Risk Prediction
Glucose, BMI, Age, and DiabetesPedigreeFunction showed strong influence on diabetes risk prediction.
The project demonstrated how Machine Learning and healthcare analytics can support early disease detection and preventive healthcare systems.
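To make the ROC-AUC metric listed above concrete, here is a small pure-Python sketch of what it measures: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The labels and scores are hypothetical; in practice scikit-learn's `roc_auc_score` computes the same quantity far more efficiently.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = diabetic) and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(f"ROC-AUC = {auc:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric useful when the positive class is rare and plain accuracy is misleading.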
- Complete ML Pipeline using Pipeline + ColumnTransformer
- Custom Feature Engineering using Transformer Class
- Logistic Regression Classification Model
- FastAPI Backend for Real-Time Prediction
- Streamlit Frontend UI
- Production-Ready Project Structure
This project demonstrated how machine learning moves beyond notebooks into real-world deployment using APIs, pipelines, and interactive applications.
~81%
- High-Dimensional Real Estate Dataset Analysis
- Data Cleaning and Missing Value Handling
- Log Transformation of Target Variable
- One-Hot Encoding and Feature Engineering
- Linear Regression, Ridge, and Lasso Comparison
- Regularization for Model Optimization
Lasso Regression performed best by reducing overfitting and improving prediction accuracy through feature selection in a high-dimensional dataset.
~0.87
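Why Lasso performs feature selection while Ridge does not can be shown with a simplified sketch. It assumes orthonormal features, where both estimators have closed forms in terms of the ordinary least squares coefficients: Ridge rescales every coefficient, while Lasso applies soft-thresholding and sets small ones exactly to zero. The coefficients and penalty below are hypothetical, and real datasets such as the one in this project need an iterative solver instead.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under orthonormal features: every coefficient is scaled by 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]


def soft_threshold(b, lam):
    """Move b toward zero by lam, and return exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_shrink(beta_ols, lam):
    """Lasso under orthonormal features: soft-threshold each coefficient."""
    return [soft_threshold(b, lam) for b in beta_ols]


# Hypothetical least-squares coefficients and penalty strength.
beta_ols = [3.0, -0.4, 0.2, -2.0]
lam = 0.5

ridge = ridge_shrink(beta_ols, lam)
lasso = lasso_shrink(beta_ols, lam)

print("ridge:", [round(b, 3) for b in ridge])
print("lasso:", lasso)
```

Ridge keeps all four coefficients non-zero, whereas Lasso drops the two weak ones entirely, which is the feature-selection effect that helps in a high-dimensional dataset.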
- End-to-End ML + Forecasting + Segmentation Project
- Sales Prediction using ML Models
- Hyperparameter Tuning using GridSearchCV
- Time Series Forecasting using Facebook Prophet
- Customer Segmentation using RFM + K-Means
- Explainable AI using SHAP
- Streamlit Application + Power BI Dashboard
This project combined machine learning, forecasting, segmentation, and explainable AI into one unified production-ready business intelligence system.
~0.91
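The RFM step that feeds the K-Means segmentation can be sketched with the standard library alone. This is an illustrative outline under stated assumptions: the orders, customer IDs, snapshot date, and the `rfm_table` helper are all hypothetical, and the clustering itself is left out.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, order_date, amount).
orders = [
    ("C1", date(2024, 1, 5), 120.0),
    ("C1", date(2024, 3, 1), 80.0),
    ("C2", date(2023, 11, 20), 300.0),
    ("C3", date(2024, 2, 14), 40.0),
    ("C3", date(2024, 2, 28), 60.0),
    ("C3", date(2024, 3, 10), 50.0),
]
snapshot = date(2024, 3, 31)  # assumed reference date for recency


def rfm_table(orders, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in orders:
        last_order[customer] = max(last_order.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


rfm = rfm_table(orders, snapshot)
for customer, (recency, freq, money) in sorted(rfm.items()):
    print(customer, recency, freq, money)
```

Because the three columns sit on very different scales, they would normally be standardised before being passed to a distance-based method such as K-Means.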
I am continuously working on:
- Recruiter-level portfolio projects
- Real-world dashboard projects
- End-to-End analytics case studies
- Production-ready Machine Learning systems
- Strong GitHub + LinkedIn portfolio presence
- Practical problem-solving over theoretical learning
My goal is simple:
LinkedIn Profile:
https://www.linkedin.com/in/kailash-singh-35b2961b0/
GitHub Profile:
https://github.com/kailash4454
I strongly believe:
Companies do not hire people to create charts.
They hire people who can drive decisions.
Because:
Bad Data → Bad Reports
Bad Reports → Bad Decisions
I am learning every day to become that kind of analyst.