This repository contains exploratory data analysis (EDA) of two medical datasets, each tied to a different predictive task: classification of COVID-19 patient survival and regression of brain tumor patient survival duration.
The goal is to explore, clean, and visualize each dataset and to understand the patterns in it. The insights gained from EDA are a necessary first step before building predictive machine learning models.
- Classification Task: Predicting the survival outcome (Survived/Deceased) of COVID-19 patients based on their clinical data.
- Regression Task: Predicting the survival duration (in months) for patients diagnosed with brain tumors based on tumor characteristics and patient data.
- Goal: To understand the factors influencing the survival of COVID-19 patients and to carry out the EDA needed before building a classification model.
- Dataset: Kaggle - COVID-19 Dataset
- Description: This dataset includes anonymized patient data, potentially covering demographics, symptoms, pre-existing conditions, and the final outcome (survival).
- EDA Focus: Identifying key features correlated with survival, visualizing feature distributions for survivors vs. non-survivors, handling missing data, and identifying feature engineering opportunities.
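
As a rough illustration of two of the checks listed above (missing data and survival by feature), here is a minimal sketch. The rows are made up and the column names (`age`, `diabetes`, `survived`) are assumptions, not the actual schema of the Kaggle dataset. It uses only the Python standard library so it runs without dependencies; the notebooks may well use pandas for the same steps.

```python
from collections import defaultdict

# Made-up rows for illustration only. Column names are assumptions,
# not the real columns of the Kaggle COVID-19 dataset.
rows = [
    {"age": 34, "diabetes": 0, "survived": 1},
    {"age": 71, "diabetes": 1, "survived": 0},
    {"age": None, "diabetes": 0, "survived": 1},
    {"age": 58, "diabetes": 1, "survived": 1},
    {"age": 80, "diabetes": None, "survived": 0},
]


def missing_fraction(rows):
    """Return the fraction of missing (None) values for each column."""
    counts = defaultdict(int)
    for row in rows:
        for col, value in row.items():
            if value is None:
                counts[col] += 1
    return {col: counts[col] / len(rows) for col in rows[0]}


def survival_rate_by(rows, feature, target="survived"):
    """Return the mean of a 0/1 target within each level of `feature`.

    Rows where the feature or the target is missing are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # level -> [survivors, row count]
    for row in rows:
        level = row[feature]
        if level is None or row[target] is None:
            continue
        totals[level][0] += row[target]
        totals[level][1] += 1
    return {level: s / n for level, (s, n) in totals.items()}


missing = missing_fraction(rows)
rates = survival_rate_by(rows, "diabetes")
print("Missing fraction per column:", missing)
print("Survival rate by diabetes flag:", rates)
```

A large gap in survival rate between the levels of a feature suggests it is worth a closer look, while a high missing fraction signals that imputation or exclusion needs to be decided before modeling.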
- Goal: To explore the relationship between brain tumor characteristics, patient data, and the duration of survival, preparing for a regression model.
- Dataset: Kaggle - Brain Tumor Dataset
- Description: This dataset likely contains information about tumor stage, location, recurrence patterns, patient demographics, and survival time.
- EDA Focus: Analyzing the distribution of survival duration, exploring correlations between features (such as tumor stage and grade) and survival time, visualizing trends, handling missing values, and assessing feature importance for regression.
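
The first two items above (the distribution of the target and its correlation with a feature) can be sketched as follows. The records are made up and the field names (`tumor_grade`, `survival_months`) are assumptions, not the actual columns of the Kaggle dataset. The sketch uses only the standard library and a hand-written Pearson correlation; in practice a library such as pandas would typically handle this.

```python
import math
import statistics

# Made-up records for illustration only. Field names are assumptions,
# not the real columns of the Kaggle brain tumor dataset.
records = [
    {"tumor_grade": 1, "survival_months": 60.0},
    {"tumor_grade": 2, "survival_months": 48.0},
    {"tumor_grade": 3, "survival_months": 24.0},
    {"tumor_grade": 4, "survival_months": 12.0},
    {"tumor_grade": 2, "survival_months": None},  # missing target
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Drop rows with a missing feature or target before computing statistics.
complete = [
    r for r in records
    if r["tumor_grade"] is not None and r["survival_months"] is not None
]
grades = [r["tumor_grade"] for r in complete]
durations = [r["survival_months"] for r in complete]

# Basic summary of the regression target.
summary = {
    "count": len(durations),
    "mean": statistics.fmean(durations),
    "median": statistics.median(durations),
    "min": min(durations),
    "max": max(durations),
}
r = pearson(grades, durations)

print("Survival duration summary (months):", summary)
print("Pearson r, grade vs. survival months:", round(r, 3))
```

Pearson correlation only captures linear association and treats grade as numeric, so for ordinal features a rank-based measure such as Spearman correlation may be a better fit.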