Data-Science-Portfolio

Turning Data into Business Decisions

Welcome to my Data Science Portfolio — a collection of hands-on projects, end-to-end Machine Learning systems, business analytics case studies, EDA projects, statistical analysis projects, dashboards, and recruiter-level practical work built to solve real-world problems using data.

This repository reflects my journey from learning concepts to building industry-ready solutions in:

  • Data Analytics
  • Data Science
  • Machine Learning
  • Business Intelligence
  • Statistical Analysis
  • Dashboard Development
  • Healthcare Analytics
  • Retail Intelligence
  • Predictive Modeling
  • End-to-End Deployment Projects

👨‍💻 About Me

Hi, I'm Kailash Singh Rawat — an MCA (Data Science) student and an aspiring Data Analyst / Data Scientist passionate about solving real-world business problems using data.

I strongly believe:

Understanding the data deeply before building Machine Learning models is the real skill that makes a strong analyst.

My focus is not just on building models, but on solving business problems through:

  • clean data
  • strong analysis
  • meaningful insights
  • decision-support systems
  • production-ready project thinking

This repository contains my Jupyter notebooks, analytics projects, machine learning systems, and portfolio work designed with recruiter expectations in mind.


🛠️ Skills & Tools


💻 Programming & Query Languages

  • Python
  • SQL
  • MySQL

📊 Data Analysis & Visualization

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Power BI
  • Tableau
  • Excel

🤖 Machine Learning

  • Scikit-learn
  • Logistic Regression
  • Multiple Linear Regression
  • Ridge Regression
  • Lasso Regression
  • Random Forest
  • Gradient Boosting
  • K-Means Clustering
  • Feature Engineering
  • Model Evaluation
  • Hyperparameter Tuning
  • Pipeline & ColumnTransformer

📈 Advanced Areas

  • Statistics
  • Exploratory Data Analysis (EDA)
  • Hypothesis Testing
  • Confidence Intervals
  • Statistical Inference
  • Time Series Forecasting
  • Explainable AI (SHAP)
  • NLP
  • Deep Learning (Learning Phase)
  • Business Analytics
  • Healthcare Analytics
  • Retail Intelligence

📂 Repository Structure

Data-Science-Portfolio/
│
├── Python/
│
├── Statistics & EDA/
│   ├── Sales & Discounts Analysis Project
│   ├── Hospital Patient Data Analysis Project
│   ├── Cardiotocographic EDA Project
│   ├── Hypothesis Testing - Bombay Hospitality Ltd
│   └── Estimation And Confidence Intervals
│
├── Machine Learning/
│   ├── Toyota Corolla Price Prediction - Multiple Linear Regression
│   └── Diabetes Prediction using Logistic Regression
│
├── NumPy & Pandas/
│
├── Resume Projects/
│   ├── Titanic Survival Prediction
│   ├── House Price Prediction
│   └── Retail Intelligence & Forecasting System
│
├── SQL Projects/
│
├── Power BI Dashboards/
│
└── README.md

🌟 Featured Projects


📊 Statistics & EDA Projects


1️⃣ Sales & Discounts Analysis Project

📌 Project Focus

  • Mean, Median, Mode, Standard Deviation
  • Histograms and Distribution Analysis
  • Boxplots and Outlier Detection using IQR
  • Categorical Analysis using Bar Charts
  • Discount Behavior Analysis
  • Revenue Concentration Analysis
  • Business Interpretation of Sales Patterns

🔍 Key Insight

A small number of high-value transactions were generating a major portion of total revenue, while most transactions remained low to medium in value.

This project strengthened my understanding of how descriptive analytics directly impacts business decisions.
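The IQR outlier rule and the revenue-concentration check from this project can be sketched in plain Python. This is a minimal illustration, not the project notebook: the transaction values are made up, and measuring concentration as the revenue share of the top 20% of transactions is an assumed definition.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly, like pandas' default quantile()
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def top_share(values: list[float], top_fraction: float = 0.2) -> float:
    """Share of total revenue coming from the largest `top_fraction` of transactions."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical transaction values (illustrative only)
sales = [120, 95, 180, 150, 110, 130, 105, 4200, 160, 3900]
print(iqr_outliers(sales))
print(f"Top 20% of transactions: {top_share(sales):.0%} of revenue")
```

Using `method="inclusive"` makes `statistics.quantiles` interpolate the same way as pandas' default `quantile()`, so the fences should match what a pandas-based notebook reports.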


2️⃣ Hospital Patient Data Analysis Project

📌 Project Focus

  • Patient Data Cleaning and Preprocessing
  • Missing Value Handling using Mean Imputation
  • Duplicate Record Removal using PatientID
  • Department-wise Revenue Analysis using GroupBy
  • Merging Patient + Billing Datasets
  • Row-wise and Column-wise Concatenation
  • Billing Analytics and Data Preparation for Doctor Performance Analysis

🔍 Key Insight

Accurate billing analytics depends heavily on duplicate removal, missing value handling, and proper dataset merging before any dashboard or machine learning model can be trusted.

This project strengthened my understanding of healthcare analytics and operational intelligence.
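The cleaning order described above (deduplicate, impute, merge, then aggregate) might look like this in pandas. The tables, the `Department`/`Age`/`BillAmount` columns, and the values are hypothetical; only `PatientID` comes from the project description.

```python
import pandas as pd

# Hypothetical patient and billing tables (illustrative values)
patients = pd.DataFrame({
    "PatientID": [1, 2, 2, 3, 4],
    "Department": ["Cardiology", "Ortho", "Ortho", "Cardiology", "Neuro"],
    "Age": [54.0, None, None, 61.0, 47.0],
})
billing = pd.DataFrame({
    "PatientID": [1, 2, 3, 4],
    "BillAmount": [1200.0, 800.0, None, 1500.0],
})

# 1. Remove duplicate patient records, keeping the first row per PatientID
patients = patients.drop_duplicates(subset="PatientID", keep="first")

# 2. Mean imputation for missing numeric values
patients["Age"] = patients["Age"].fillna(patients["Age"].mean())
billing["BillAmount"] = billing["BillAmount"].fillna(billing["BillAmount"].mean())

# 3. Merge patient and billing data; validate raises if PatientID is not unique on both sides
merged = patients.merge(billing, on="PatientID", how="inner", validate="one_to_one")

# 4. Department-wise revenue, highest first
revenue_by_dept = merged.groupby("Department")["BillAmount"].sum().sort_values(ascending=False)
print(revenue_by_dept)
```

Deduplicating before imputing keeps repeated rows from skewing the imputed mean, and `validate="one_to_one"` makes the merge fail loudly if duplicate IDs slip through.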


3️⃣ Cardiotocographic EDA Project

📌 Project Focus

  • Medical Dataset Cleaning and Preprocessing
  • Missing Value Handling using Median Imputation
  • Outlier Detection using IQR
  • Histograms, Boxplots, and Violin Plots
  • Correlation Heatmaps and Scatter Plots
  • Pair Plot Analysis
  • Fetal Health Monitoring Insights

🔍 Key Insight

A strong relationship between uterine contractions and late decelerations highlighted possible contraction-related fetal stress and high-risk pregnancy indicators.

This project strengthened my understanding of medical data interpretation and healthcare decision support.
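A minimal pandas sketch of the median-imputation and correlation steps, assuming hypothetical column names (`contractions`, `decelerations`, `baseline_fhr`) and made-up values. A correlation matrix like this is what a heatmap visualizes; it shows association, not cause.

```python
import pandas as pd

# Hypothetical CTG-style readings; column names and values are illustrative
ctg = pd.DataFrame({
    "contractions": [0.002, 0.004, None, 0.007, 0.008, 0.001],
    "decelerations": [0.000, 0.002, 0.003, 0.004, None, 0.000],
    "baseline_fhr": [120, 132, 128, 141, 137, 118],
})

# Median imputation is less sensitive to skew and outliers than mean imputation
ctg = ctg.fillna(ctg.median(numeric_only=True))

# Pairwise Pearson correlations (the numbers behind a correlation heatmap)
corr = ctg.corr(method="pearson")
print(corr.round(2))
```

Passing `corr` to Seaborn's `heatmap` (for example `sns.heatmap(corr, annot=True)`) produces the kind of correlation heatmap listed above.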


4️⃣ Bombay Hospitality Ltd. Operating Cost Analysis (Hypothesis Testing)

📌 Project Focus

  • Hypothesis Testing
  • Right-Tailed Z-Test
  • Statistical Decision-Making
  • Critical Value Comparison
  • Business Model Validation
  • Evidence-Based Operational Analysis

🔍 Key Insight

Although restaurant owners believed operating costs had increased, the right-tailed test found no evidence to support that claim: the observed average cost was well below the theoretical model's prediction, so the null hypothesis could not be rejected.

This project strengthened my understanding of statistical decision-making and business validation using hypothesis testing.
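The decision rule of a right-tailed z-test with known σ can be written in a few lines. The inputs below are illustrative numbers, not the Bombay Hospitality figures.

```python
import math
from statistics import NormalDist


def right_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """One-sample z-test of H0: mu <= mu0 against H1: mu > mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    z_critical = NormalDist().inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)             # area in the right tail beyond z
    return z, z_critical, p_value, z > z_critical


# Illustrative inputs only: model mean 500, known sigma 40, sample of 36 with mean 485
z, z_crit, p, reject = right_tailed_z_test(sample_mean=485, mu0=500, sigma=40, n=36)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p = {p:.4f}, reject H0: {reject}")
```

A strongly negative z in a right-tailed test only means there is no evidence of an increase; claiming that costs are significantly lower would require a left-tailed (or two-tailed) test.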


5️⃣ Estimation and Confidence Intervals using Statistical Inference

📌 Project Focus

  • Confidence Interval Estimation
  • t-Distribution vs z-Distribution
  • Statistical Inference
  • Margin of Error Analysis
  • Manufacturing Reliability Estimation
  • Product Quality Analytics

🔍 Key Insight

The project demonstrated how not knowing the population standard deviation adds uncertainty: the t-distribution's heavier tails produce larger critical values than the z-distribution, so the t-based confidence interval came out wider than the z-based one.

This project strengthened my understanding of estimation theory, uncertainty quantification, and industrial quality control analytics.
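To see why the t-interval is wider, the sketch below computes both intervals for a made-up sample of 12 measurements using only the standard library. Because Python's `statistics` module has no Student's t quantile function, the hypothetical helper `t_critical` inverts the t CDF numerically (Simpson's rule plus bisection); a library such as SciPy would normally be used instead. Setting σ equal to the sample standard deviation is a deliberate assumption that isolates the effect of the critical value.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Optional


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF via Simpson's rule on [0, |x|] (steps must be even), using symmetry about 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value by bisection; the [0, 50] range suits df >= 2."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_interval(sample: list, confidence: float = 0.99, sigma: Optional[float] = None):
    """z-interval when the population sigma is known, otherwise a t-interval using s."""
    n = len(sample)
    xbar = mean(sample)
    if sigma is not None:
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        se = sigma / math.sqrt(n)
    else:
        crit = t_critical(confidence, n - 1)
        se = stdev(sample) / math.sqrt(n)
    return xbar - crit * se, xbar + crit * se


# Hypothetical sample of 12 measurements (illustrative only)
sample = [9.8, 10.2, 10.1, 9.7, 10.4, 9.9, 10.0, 10.3, 9.6, 10.1, 10.2, 9.9]
s = stdev(sample)
t_lo, t_hi = mean_interval(sample, 0.99)           # sigma unknown: t-interval
z_lo, z_hi = mean_interval(sample, 0.99, sigma=s)  # pretend sigma is known and equals s
print(f"t-interval width: {t_hi - t_lo:.3f}")
print(f"z-interval width: {z_hi - z_lo:.3f}")
```

With real data, s and σ differ, so the width gap also depends on how the sample standard deviation compares with the population one.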


🤖 Machine Learning Projects


6️⃣ Toyota Corolla Price Prediction using Multiple Linear Regression

📌 Project Focus

  • Multiple Linear Regression (MLR)
  • Exploratory Data Analysis (EDA)
  • Data Preprocessing
  • Feature Engineering
  • Model Evaluation
  • Ridge Regression
  • Lasso Regression
  • Regularization Techniques
  • Multicollinearity Handling
  • Predictive Analytics

🔍 Key Insight

Car age and kilometers driven showed strong negative relationships with price, while horsepower and vehicle weight were positively associated with resale value.

The project demonstrated how regression models and regularization techniques improve pricing prediction and model stability in automotive analytics.
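A small scikit-learn sketch comparing ordinary least squares with Ridge and Lasso on standardized features. The data are simulated, so the coefficient signs reflect the simulation rather than real Toyota Corolla prices, and the `alpha` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300
# Simulated stand-ins for the features named above (not the Toyota Corolla dataset)
age = rng.uniform(1, 80, n)                       # age in months
km = rng.uniform(1_000, 200_000, n)               # kilometers driven
hp = rng.choice([70, 90, 110], n)                 # horsepower
weight = 1000 + 0.5 * hp + rng.normal(0, 20, n)   # partly correlated with hp
X = np.column_stack([age, km, hp, weight])
price = 20_000 - 120 * age - 0.02 * km + 30 * hp + 20 * weight + rng.normal(0, 500, n)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=10.0),
    "Lasso": Lasso(alpha=50.0),
}
coefs = {}
for name, model in models.items():
    # Standardize first so the L1/L2 penalties treat every feature on the same scale
    pipe = make_pipeline(StandardScaler(), model).fit(X, price)
    coefs[name] = pipe[-1].coef_
    print(name, np.round(coefs[name], 1))
```

Ridge shrinks all coefficients toward zero, which stabilizes estimates when features such as horsepower and weight are correlated; Lasso can set weak coefficients exactly to zero.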


7️⃣ Diabetes Prediction using Logistic Regression

📌 Project Focus

  • Logistic Regression
  • Binary Classification
  • Healthcare Analytics
  • Exploratory Data Analysis (EDA)
  • Medical Data Preprocessing
  • Feature Scaling
  • ROC-AUC Analysis
  • Explainable AI
  • Streamlit Deployment
  • Healthcare Risk Prediction

🔍 Key Insight

Glucose, BMI, Age, and DiabetesPedigreeFunction had a strong influence on the predicted diabetes risk.

The project demonstrated how Machine Learning and healthcare analytics can support early disease detection and preventive healthcare systems.
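The following sketch shows a scaled logistic regression evaluated with ROC-AUC. It uses the Pima-style feature names mentioned above but synthetic data generated from an assumed risk formula, so the printed AUC and coefficients say nothing about the project's real results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 600
# Synthetic data with the feature names above (not real patient records)
X = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 7, n),
    "Age": rng.integers(21, 80, n),
    "DiabetesPedigreeFunction": rng.gamma(2.0, 0.25, n),
})
# Assumed risk formula, used only to generate labels
logit = (-12 + 0.06 * X["Glucose"] + 0.08 * X["BMI"]
         + 0.03 * X["Age"] + 1.0 * X["DiabetesPedigreeFunction"])
p = 1 / (1 + np.exp(-logit.to_numpy()))
y = (rng.random(n) < p).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scaling inside the pipeline: test data is transformed with training-set statistics only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # probability of the positive (diabetic) class
auc = roc_auc_score(y_test, proba)
print(f"ROC-AUC: {auc:.3f}")

# Standardized coefficients, largest magnitude first
coefs = pd.Series(model[-1].coef_[0], index=X.columns).sort_values(key=abs, ascending=False)
print(coefs.round(2))
```

Ranking standardized coefficients is one simple way to see which features drive the predicted risk.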


💼 Resume Projects (Recruiter-Level End-to-End Projects)


8️⃣ Titanic Survival Prediction (End-to-End ML Project)

📌 Project Focus

  • Complete ML Pipeline using Pipeline + ColumnTransformer
  • Custom Feature Engineering using Transformer Class
  • Logistic Regression Classification Model
  • FastAPI Backend for Real-Time Prediction
  • Streamlit Frontend UI
  • Production-Ready Project Structure

🔍 Key Insight

This project demonstrated how machine learning moves beyond notebooks into real-world deployment using APIs, pipelines, and interactive applications.

🎯 Accuracy Achieved

~81%
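The pipeline idea above can be sketched with scikit-learn's `Pipeline`, `ColumnTransformer`, and a small custom transformer. The hypothetical `FamilySizeAdder` feature, the column subset, and the handful of rows are assumptions for illustration; the project's own transformer class and features may differ.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class FamilySizeAdder(BaseEstimator, TransformerMixin):
    """Hypothetical custom feature: FamilySize = SibSp + Parch + 1."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = X.copy()
        X["FamilySize"] = X["SibSp"] + X["Parch"] + 1
        return X


numeric = ["Age", "Fare", "FamilySize"]
categorical = ["Sex", "Embarked"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([
    ("features", FamilySizeAdder()),
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# A few made-up rows in the Kaggle Titanic column layout (illustrative only)
train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female", "male", "female"],
    "Age": [22, 38, 26, np.nan, 35, 27, 54, 4],
    "Fare": [7.25, 71.28, 7.92, 8.05, 53.1, 11.13, 51.86, 16.7],
    "SibSp": [1, 1, 0, 0, 1, 0, 0, 1],
    "Parch": [0, 0, 0, 0, 0, 2, 0, 1],
    "Embarked": ["S", "C", "S", "Q", "S", np.nan, "S", "S"],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
})
features = train.drop(columns="Survived")
model.fit(features, train["Survived"])
print(model.predict(features.head(3)))
```

Because feature engineering lives inside the pipeline, a serving layer such as the FastAPI backend only has to pass raw passenger fields to `model.predict`.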


9️⃣ House Price Prediction using Regression Models

📌 Project Focus

  • High-Dimensional Real Estate Dataset Analysis
  • Data Cleaning and Missing Value Handling
  • Log Transformation of Target Variable
  • One-Hot Encoding and Feature Engineering
  • Linear Regression, Ridge, and Lasso Comparison
  • Regularization for Model Optimization

🔍 Key Insight

Lasso Regression performed best by reducing overfitting and improving prediction accuracy through feature selection in a high-dimensional dataset.

🎯 Best R² Score

~0.87
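A sketch of the log-target, one-hot, and Lasso setup using `TransformedTargetRegressor`, which fits on `log1p(price)` and converts predictions back with `expm1`. The listings and the 20 noise columns are synthetic, and choosing the penalty with `LassoCV` is an assumption; the project may have tuned `alpha` differently.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(1)
n = 400
# Synthetic listings (hypothetical columns, not the project's dataset)
df = pd.DataFrame({
    "LivingArea": rng.uniform(50, 300, n),
    "Rooms": rng.integers(1, 7, n),
    "Neighborhood": rng.choice(["A", "B", "C", "D"], n),
})
for i in range(20):  # pure-noise columns to mimic a high-dimensional table
    df[f"noise_{i}"] = rng.normal(size=n)
premium = df["Neighborhood"].map({"A": 0.0, "B": 0.1, "C": 0.25, "D": 0.4})
df["Price"] = np.exp(11 + 0.006 * df["LivingArea"] + 0.03 * df["Rooms"]
                     + premium + rng.normal(0, 0.1, n))

X, y = df.drop(columns="Price"), df["Price"]
numeric = [c for c in X.columns if c != "Neighborhood"]
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Neighborhood"]),
])

# Fit on log1p(price); predictions are mapped back to the price scale with expm1
model = TransformedTargetRegressor(
    regressor=Pipeline([("prep", preprocess), ("lasso", LassoCV(cv=5))]),
    func=np.log1p,
    inverse_func=np.expm1,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R² on held-out data: {r2:.3f}")

lasso = model.regressor_.named_steps["lasso"]
print("Coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)))
```

R² is computed on the original price scale here; scores computed on the log scale are not directly comparable.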


🔟 Retail Intelligence & Forecasting System

📌 Project Focus

  • End-to-End ML + Forecasting + Segmentation Project
  • Sales Prediction using ML Models
  • Hyperparameter Tuning using GridSearchCV
  • Time Series Forecasting using Facebook Prophet
  • Customer Segmentation using RFM + K-Means
  • Explainable AI using SHAP
  • Streamlit Application + Power BI Dashboard

🔍 Key Insight

This project combined machine learning, forecasting, segmentation, and explainable AI into one unified production-ready business intelligence system.

🎯 Achieved R² Score

~0.91
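The RFM + K-Means segmentation step might look like the sketch below. The transaction log, the log transform before scaling, and `n_clusters=3` are all assumptions for illustration; the forecasting (Prophet), tuning (GridSearchCV), and SHAP parts of the project are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical transaction log (column names are assumptions)
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3, 4, 5, 5, 6],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20", "2024-02-10", "2024-03-15",
        "2024-03-28", "2023-09-01", "2024-03-20", "2024-03-30", "2023-12-12",
    ]),
    "Amount": [120.0, 80.0, 40.0, 300.0, 250.0, 410.0, 25.0, 90.0, 150.0, 60.0],
})
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

# Recency (days since last purchase), Frequency (purchase count), Monetary (total spend)
rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("InvoiceDate", "count"),
    Monetary=("Amount", "sum"),
)

# K-Means is distance-based, so log-transform and standardize R, F and M first
scaled = StandardScaler().fit_transform(np.log1p(rfm))
rfm["Segment"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(rfm.sort_values("Segment"))
```

The log transform damps the heavy right skew that spend and frequency usually have, so a few big spenders do not dominate the distance calculation.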


🚀 What I Am Building

I am continuously working on:

  • Recruiter-level portfolio projects
  • Real-world dashboard projects
  • End-to-End analytics case studies
  • Production-ready Machine Learning systems
  • Strong GitHub + LinkedIn portfolio presence
  • Practical problem-solving over theoretical learning

My goal is simple:

Become industry-ready with practical skills, not just theoretical knowledge.


📬 Connect With Me


🔗 LinkedIn

LinkedIn Profile:
https://www.linkedin.com/in/kailash-singh-35b2961b0/


💻 GitHub

GitHub Profile:
https://github.com/kailash4454


📧 Email

kailashsingh2203@gmail.com


⭐ Final Note

I strongly believe:

Companies do not hire people to create charts.
They hire people who can create decisions.

Because:

Bad Data → Bad Reports
Bad Reports → Bad Decisions

I am learning every day to become that kind of analyst.


⭐ If you find my work valuable, feel free to explore, connect, and collaborate.
