This project explores the dataset through data cleaning, preprocessing, and visualization.

leeleisya/Exploratory_data_analysis_and_visualization

Exploratory_data_analysis_and_visualization

This project explores datasets through data cleaning, preprocessing, and visualization. The main tasks include:

Titanic Dataset Analysis

  • Data Loading & Preprocessing

    • Removed unnecessary columns.
    • Extracted deck information from the Cabin column.
    • Label-encoded categorical variables.
    • Imputed missing values with the mean (numerical columns) or the mode (categorical columns).
    • Saved the cleaned dataset to CSV and JSON formats.
  • Exploratory Data Analysis (EDA)

    • Analyzed feature distributions.
    • Calculated medians and modes for survivors and non-survivors.
    • Created “average passenger” profiles and compared them to real passengers.
    • Visualized variable relationships using scatter plots and pairplots.
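
The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration using pandas, not the repository's actual code: the column names follow the common Kaggle Titanic layout, and the dropped columns, the `save_clean` helper, and the output file stem are assumptions. The sketch imputes before label-encoding so that missing values do not receive their own code; the original pipeline may order these steps differently.

```python
import pandas as pd


def preprocess(df, drop_cols=("PassengerId", "Name", "Ticket")):
    """Clean a Titanic-style frame: drop columns, extract deck, impute, encode."""
    # Drop identifier-like columns (assumed list; only those present are dropped).
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])

    # Deck is taken as the first character of Cabin, e.g. "C85" -> "C".
    out["Deck"] = out["Cabin"].str[0]
    out = out.drop(columns="Cabin")

    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Numerical column: fill gaps with the column mean.
            out[col] = out[col].fillna(out[col].mean())
        else:
            # Categorical column: fill gaps with the most frequent value,
            # then label-encode (sorted categories map to 0..k-1).
            out[col] = out[col].fillna(out[col].mode().iloc[0])
            out[col] = out[col].astype("category").cat.codes
    return out


def save_clean(df, stem="titanic_clean"):
    """Write the cleaned frame to <stem>.csv and <stem>.json (hypothetical names)."""
    df.to_csv(f"{stem}.csv", index=False)
    df.to_json(f"{stem}.json", orient="records")
```

Note that mode imputation assumes each categorical column has at least one non-missing value; a column that is entirely empty would need separate handling.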

Example
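
The survivor and non-survivor summaries from the EDA list can be illustrated with a short pandas sketch. It assumes a cleaned, fully numeric frame with a `Survived` column; the function names are hypothetical. The README does not say how the “average passenger” profiles were compared to real passengers, so the nearest-passenger lookup below (sum of absolute differences) is only one plausible approach.

```python
import pandas as pd


def group_medians(df, target="Survived"):
    """Median of every numeric feature per class: one 'average passenger' per row."""
    return df.groupby(target).median(numeric_only=True)


def group_modes(df, target="Survived"):
    """Most frequent value of every feature per class (smallest value wins ties)."""
    features = df.drop(columns=target)
    return features.groupby(df[target]).agg(lambda s: s.mode().iloc[0])


def closest_real_passenger(df, profile):
    """Index label of the real passenger nearest to a profile.

    Distance is the sum of absolute feature differences (an assumption);
    features are not rescaled, so wide-ranging columns dominate.
    """
    features = df[profile.index]
    return (features - profile).abs().sum(axis=1).idxmin()
```

For example, `closest_real_passenger(df, group_medians(df).loc[1])` returns the index of the passenger who most resembles the median survivor. Standardizing the features first would give each column equal weight.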

Text Data Analysis

  • Identified the most common words in positive and negative reviews.
  • Computed TF-IDF vectors for the texts.
  • Visualized key words for easier interpretation.
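
To make the word-count and TF-IDF steps above concrete, here is a small standard-library sketch. It is not the repository's implementation: the tokenizer, the function names, and the plain `tf * log(N / df)` weighting are assumptions chosen for readability. Libraries such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF and L2 normalization by default, so their numbers will differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=10, stopwords=frozenset()):
    """Return the n most common words across a list of texts."""
    counts = Counter(
        w for t in texts for w in tokenize(t) if w not in stopwords
    )
    return counts.most_common(n)


def tfidf(texts):
    """Return one {word: weight} dict per text, with weight = tf * log(N / df)."""
    docs = [Counter(tokenize(t)) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many texts each word appears at least once.
    doc_freq = Counter(w for d in docs for w in d)
    vectors = []
    for d in docs:
        total = sum(d.values())
        vectors.append(
            {w: (c / total) * math.log(n_docs / doc_freq[w]) for w, c in d.items()}
        )
    return vectors
```

Calling `top_words` separately on the positive and the negative reviews gives the per-class word lists, while `tfidf` down-weights words that occur in every review (a word present in all texts gets weight 0).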

Wordcloud Negative

Wordcloud Positive

Chart Improvements

  • Selected three “junk charts” and redesigned them to be more informative and visually clear.
  • Saved the enhanced visualizations for reporting and presentation.
