
leeleisya/Exploratory_data_analysis_and_visualization

Exploratory_data_analysis_and_visualization

This project explores datasets through data cleaning, preprocessing, and visualization. The main tasks include:

Titanic Dataset Analysis

  • Data Loading & Preprocessing

    • Removed unnecessary columns.
    • Extracted deck information from the Cabin column.
    • Label-encoded categorical variables.
    • Imputed missing values with the column mean (numerical features) or mode (categorical features).
    • Saved the cleaned dataset to CSV and JSON formats.
  • Exploratory Data Analysis (EDA)

    • Analyzed feature distributions.
    • Calculated medians and modes for survivors and non-survivors.
    • Created “average passenger” profiles and compared them to real passengers.
    • Visualized variable relationships using scatter plots and pairplots.
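The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the sample rows, the `preprocess` and `to_csv` helpers, and the choice of dropped column are all hypothetical, and the project itself may well use pandas and scikit-learn. The sketch also imputes before encoding, which is an assumption about ordering.

```python
import csv
import io
import json
from collections import Counter
from statistics import mean

# Hypothetical sample rows shaped like the Titanic data (not real records).
rows = [
    {"Name": "P1", "Sex": "male", "Age": 22.0, "Cabin": "C85", "Embarked": "S"},
    {"Name": "P2", "Sex": "female", "Age": None, "Cabin": None, "Embarked": "C"},
    {"Name": "P3", "Sex": "female", "Age": 30.0, "Cabin": "E46", "Embarked": None},
    {"Name": "P4", "Sex": "male", "Age": 38.0, "Cabin": "C123", "Embarked": "S"},
]

def preprocess(rows, drop=("Name",), numeric=("Age",)):
    """Return cleaned copies of the rows plus the label-encoding maps used."""
    # Drop unneeded columns and derive Deck from the first letter of Cabin.
    cleaned = []
    for row in rows:
        new = {k: v for k, v in row.items() if k not in drop and k != "Cabin"}
        new["Deck"] = row["Cabin"][0] if row.get("Cabin") else None
        cleaned.append(new)

    encoders = {}
    for col in cleaned[0]:
        present = [r[col] for r in cleaned if r[col] is not None]
        if col in numeric:
            fill = mean(present)  # mean for numerical columns
        else:
            fill = Counter(present).most_common(1)[0][0]  # mode for categorical
        for r in cleaned:
            if r[col] is None:
                r[col] = fill
        if col not in numeric:
            # Label-encode: each sorted category gets an integer code.
            mapping = {cat: i for i, cat in enumerate(sorted(set(present)))}
            encoders[col] = mapping
            for r in cleaned:
                r[col] = mapping[r[col]]
    return cleaned, encoders

def to_csv(records):
    """Serialize a list of dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

cleaned, encoders = preprocess(rows)
csv_text = to_csv(cleaned)                 # cleaned data as CSV
json_text = json.dumps(cleaned, indent=2)  # cleaned data as JSON
```

Returning the `encoders` mapping alongside the data keeps the integer codes interpretable later, for example when labelling plot axes. The sketch does not handle a column whose values are all missing.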

Example
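The per-group medians and modes and the “average passenger” comparison from the EDA list can be illustrated roughly as follows. Everything here is an assumption made for the sketch: the sample rows are invented, the column split into `NUMERIC` and `CATEGORICAL` is hypothetical, and `closest_real` uses a deliberately crude similarity (categorical mismatches first, then unscaled numeric distance) that the project may not share.

```python
from collections import Counter
from statistics import median

# Hypothetical cleaned rows (not real Titanic records).
passengers = [
    {"Survived": 1, "Pclass": 1, "Sex": "female", "Age": 29.0, "Fare": 80.0},
    {"Survived": 1, "Pclass": 2, "Sex": "female", "Age": 35.0, "Fare": 30.0},
    {"Survived": 1, "Pclass": 1, "Sex": "male", "Age": 40.0, "Fare": 50.0},
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 22.0, "Fare": 8.0},
    {"Survived": 0, "Pclass": 3, "Sex": "male", "Age": 30.0, "Fare": 10.0},
    {"Survived": 0, "Pclass": 2, "Sex": "female", "Age": 45.0, "Fare": 12.0},
]
NUMERIC = ("Age", "Fare")
CATEGORICAL = ("Pclass", "Sex")

def average_profile(group):
    """Median of each numeric column, mode of each categorical one."""
    profile = {col: median(r[col] for r in group) for col in NUMERIC}
    for col in CATEGORICAL:
        profile[col] = Counter(r[col] for r in group).most_common(1)[0][0]
    return profile

def closest_real(profile, group):
    """Real passenger with the fewest categorical mismatches,
    ties broken by summed absolute numeric distance (unscaled)."""
    def score(r):
        mismatches = sum(r[c] != profile[c] for c in CATEGORICAL)
        distance = sum(abs(r[c] - profile[c]) for c in NUMERIC)
        return (mismatches, distance)
    return min(group, key=score)

profiles = {}
matches = {}
for label in (0, 1):
    group = [r for r in passengers if r["Survived"] == label]
    profiles[label] = average_profile(group)
    matches[label] = closest_real(profiles[label], group)
```

Using the median and mode rather than the mean keeps each profile value equal to something a real passenger could have, which makes the comparison with actual records more meaningful.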

Text Data Analysis

  • Identified the most common words in positive and negative reviews.
  • Computed TF-IDF vectors for the texts.
  • Visualized key words for easier interpretation.
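As a rough illustration of the first two steps, the sketch below counts the most frequent words per sentiment label and computes an unsmoothed TF-IDF weight by hand. The reviews, the `tokenize`, `top_words`, and `tfidf` helpers, and the exact TF-IDF formula are assumptions made for this example; the README does not say which corpus or library the project uses, and library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their values will differ.

```python
import math
import re
from collections import Counter

# Hypothetical labelled reviews; the actual corpus is not named in the README.
reviews = [
    ("positive", "great movie great cast"),
    ("positive", "great story"),
    ("negative", "boring movie"),
    ("negative", "boring plot boring cast"),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def top_words(label, n=3):
    """Most common words across all reviews carrying the given label."""
    counts = Counter()
    for lab, text in reviews:
        if lab == label:
            counts.update(tokenize(text))
    return counts.most_common(n)

def tfidf(docs):
    """Plain TF-IDF: tf = count / doc length, idf = log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors

vectors = tfidf([text for _, text in reviews])
top_positive = top_words("positive", n=1)
top_negative = top_words("negative", n=1)
```

Raw counts surface words that are frequent within a class, while TF-IDF down-weights words that appear in many reviews, so the two views complement each other when choosing which words to visualize.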

Wordcloud Negative

Wordcloud Positive

Chart Improvements

  • Selected three “junk charts” and redesigned them to be more informative and visually clear.
  • Saved the enhanced visualizations for reporting and presentation.
