This project explores datasets through data cleaning, preprocessing, and visualization. The main tasks include:
- Data Loading & Preprocessing
  - Removed unnecessary columns.
  - Extracted deck information from the `Cabin` column.
  - Label-encoded categorical variables.
  - Imputed missing values with mean (numerical) or mode (categorical).
  - Saved the cleaned dataset to CSV and JSON formats.
- Exploratory Data Analysis (EDA)
  - Analyzed feature distributions.
  - Calculated medians and modes for survivors and non-survivors.
  - Created “average passenger” profiles and compared them to real passengers.
  - Visualized variable relationships using scatter plots and pairplots.
  - Identified the most common words in positive and negative reviews.
  - Computed TF-IDF vectors for the review texts.
  - Visualized key words for easier interpretation.
  - Selected and improved 3 “junk charts”, making them more informative and visually clear.
  - Saved the enhanced visualizations for reporting and presentation.
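
The preprocessing steps listed above could look roughly like the following. This is a minimal sketch that assumes pandas; the sample rows, the dropped `PassengerId` column, the `Deck` column name, and the `preprocess` and `save_clean` helpers are all hypothetical and not taken from the project code. It also assumes the deck is the first letter of the cabin code.

```python
import pandas as pd

# Tiny made-up sample standing in for the real dataset (hypothetical values).
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, None, 35.0],
    "Cabin": [None, "C85", "C123", "E46"],
    "Embarked": ["S", "C", None, "S"],
})


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop, extract, impute, and label-encode on a copy of df."""
    out = df.copy()
    # Assumed "unnecessary" column; the README does not say which were removed.
    out = out.drop(columns=["PassengerId"])
    # Assumption: the deck is the first letter of the cabin code ("C85" -> "C").
    out["Deck"] = out["Cabin"].str[0]
    out = out.drop(columns=["Cabin"])
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical column: fill gaps with the column mean.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Categorical column: fill gaps with the most frequent value,
            # then map each category to an integer code.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
            out[col] = out[col].astype("category").cat.codes
    return out


def save_clean(df: pd.DataFrame, stem: str) -> None:
    """Write the cleaned frame to <stem>.csv and <stem>.json."""
    df.to_csv(f"{stem}.csv", index=False)
    df.to_json(f"{stem}.json", orient="records")


clean = preprocess(raw)
print(clean)
```

Calling `save_clean(clean, "cleaned")` would write both output files. The sketch imputes before encoding so that missing values do not receive a code of their own; the project may order these steps differently.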

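The per-group medians and modes, and the comparison of an "average passenger" with real ones, can be sketched as below. It assumes pandas and an already-cleaned table; the column names, the sample values, and the `closest_real_passenger` helper are hypothetical. Using the group medians as the profile and a standardized squared distance for the comparison is one possible choice, not necessarily the one used in the project.

```python
import pandas as pd

# Hypothetical, already-cleaned sample; column names are assumptions.
df = pd.DataFrame({
    "Survived": [0, 0, 0, 1, 1, 1],
    "Pclass": [3, 3, 2, 1, 1, 2],
    "Age": [22.0, 30.0, 40.0, 28.0, 36.0, 19.0],
    "Fare": [7.25, 8.05, 13.0, 71.3, 53.1, 26.0],
})
features = ["Pclass", "Age", "Fare"]

# Median of every feature within each survival group.
medians = df.groupby("Survived")[features].median()

# Mode within each group; when several values tie, the smallest is kept.
modes = df.groupby("Survived")[features].agg(lambda s: s.mode().iloc[0])


def closest_real_passenger(group: pd.DataFrame, profile: pd.Series):
    """Return the index label of the row in group nearest to profile."""
    # Divide by each feature's standard deviation so Fare does not dominate.
    scale = group[features].std().replace(0, 1)
    diff = (group[features] - profile[features]) / scale
    return (diff ** 2).sum(axis=1).idxmin()


for label, group in df.groupby("Survived"):
    profile = medians.loc[label]
    nearest = closest_real_passenger(group, profile)
    print(f"Survived={label}: profile={profile.to_dict()}, nearest row={nearest}")
```

Medians are less sensitive to outliers such as very high fares, which is one reason to prefer them over means when building a "typical" profile.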

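To make the text-analysis steps concrete, here is a dependency-free sketch of counting the most common words per sentiment and computing TF-IDF weights. The reviews, the stop-word list, and the helper names are made up for illustration. The weighting shown is the textbook form (term frequency times `ln(N / df)`), which differs from the smoothed variants some libraries use, so the numbers will not match those exactly.

```python
import math
import re
from collections import Counter

# Made-up reviews; the real corpus and its labels are not shown in this README.
reviews = [
    ("great food and great service", "positive"),
    ("friendly staff and tasty food", "positive"),
    ("cold food and slow service", "negative"),
    ("rude staff and slow slow service", "negative"),
]

STOPWORDS = {"and", "the", "a"}  # assumed, minimal stop-word list


def tokenize(text):
    """Lowercase, split into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def top_words(label, n=2):
    """Most frequent words across all reviews carrying the given label."""
    counts = Counter()
    for text, review_label in reviews:
        if review_label == label:
            counts.update(tokenize(text))
    return counts.most_common(n)


def tfidf(docs):
    """One {term: weight} dict per document, with tf = count / doc length."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tfidf([text for text, _ in reviews])
print(top_words("positive"), top_words("negative"))
print(max(vectors[0], key=vectors[0].get))
```

Words that appear in most reviews, such as "food" here, get a low weight, while words concentrated in a single review stand out, which is what makes the resulting key words useful for plotting.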
