This repository contains my submission for the legendary Kaggle "Titanic: Machine Learning from Disaster" competition. The goal of this project was to build a predictive model to determine whether a passenger would have survived the sinking of the RMS Titanic.
This notebook represents a complete data science workflow, starting from initial data exploration and cleaning, moving through feature engineering, and finally to model training and prediction.
My strategy for this challenge was to use a Logistic Regression model, focusing heavily on thoughtful data preparation to ensure the model had the best possible information to learn from.
The first step was a deep dive into the train.csv dataset. I used visualizations to understand the characteristics of the passengers and how certain attributes like Class, Sex, and Port of Embarkation correlated with survival.
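For illustration, a minimal way to eyeball these correlations with seaborn (the notebook's actual plots may differ; the column names below are the standard Titanic ones):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

train = pd.read_csv("train.csv")

# Survival rate (mean of the 0/1 Survived column) broken down by class and sex
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
sns.barplot(data=train, x="Pclass", y="Survived", ax=axes[0])
sns.barplot(data=train, x="Sex", y="Survived", ax=axes[1])
plt.show()
```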
This initial analysis revealed a few key challenges:

- Missing Data: The Age and Cabin columns had a significant number of null values.
- Irrelevant Features: Columns like Name and Ticket, in their raw form, didn't seem immediately useful for prediction.
To address this, I imputed the missing Age values using the median for that column and dropped the Cabin column due to the high volume of missing data.
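A minimal sketch of this cleaning step, assuming a standard pandas workflow and the usual Kaggle file layout:

```python
import pandas as pd

train = pd.read_csv("train.csv")

# Impute missing ages with the column median
train["Age"] = train["Age"].fillna(train["Age"].median())

# Cabin is mostly null, so drop it outright
train = train.drop(columns=["Cabin"])
```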
With a clean dataset, I engineered the features to be suitable for a machine learning model:
- Categorical Encoding: I converted key categorical features like Sex and Embarked into a numerical format using one-hot encoding.
- Feature Dropping: To simplify the model and avoid noise, I removed the Name, Ticket, and Cabin columns.
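A sketch of these two steps, continuing with the train DataFrame from the previous snippet (pd.get_dummies is one common way to one-hot encode; the notebook may use a different utility):

```python
# One-hot encode the categorical features; drop_first avoids a redundant level
train = pd.get_dummies(train, columns=["Sex", "Embarked"], drop_first=True)

# Drop columns that aren't useful in their raw form (Cabin was removed earlier)
train = train.drop(columns=["Name", "Ticket"])
```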
With the features prepared, I moved on to modeling:

- Algorithm: I chose Logistic Regression for its strong baseline performance and high degree of interpretability.
- Training: The model was trained on the cleaned and engineered feature set derived from the train.csv file.
- Prediction: The trained model was then used to predict survival outcomes for the 418 passengers in the test.csv dataset, generating the final My prediction.csv submission file.
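Putting it together, a hedged end-to-end sketch of training and prediction with scikit-learn, continuing from the snippets above. Mirroring the preprocessing onto test.csv and aligning columns via reindex are assumptions made to keep the example self-contained and robust, not necessarily the notebook's exact code:

```python
from sklearn.linear_model import LogisticRegression

# Separate the target from the feature matrix
X = train.drop(columns=["Survived", "PassengerId"])
y = train["Survived"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Apply the same cleaning and encoding to the test set
test = pd.read_csv("test.csv")
test["Age"] = test["Age"].fillna(test["Age"].median())
test["Fare"] = test["Fare"].fillna(test["Fare"].median())  # guard against a stray missing Fare
test = test.drop(columns=["Cabin", "Name", "Ticket"])
test = pd.get_dummies(test, columns=["Sex", "Embarked"], drop_first=True)

# Align the test columns with the training features before predicting
X_test = test.drop(columns=["PassengerId"]).reindex(columns=X.columns, fill_value=0)

submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": model.predict(X_test),
})
submission.to_csv("My prediction.csv", index=False)
```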