Author: Uttarayan Haldar

Objective: Practicing real-world inspired data cleaning challenges daily to master preprocessing, wrangling, and transformation for Data Science & Analytics.
This repository contains my daily data cleaning challenges inspired by real-world datasets. Each challenge simulates messy, inconsistent, or incomplete data, and I document the process of transforming it into analysis-ready form.
- Languages: Python (Pandas, NumPy)
- Libraries: Faker, datetime, re, pyjanitor
- Techniques:
  - Handling missing values
  - Removing duplicates
  - Fixing data types
  - String/text cleaning
  - Outlier detection & treatment
  - Feature transformation & encoding
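
As a rough illustration of how the techniques listed above might fit together in one Pandas pass, here is a minimal sketch. The sample data, the column names (`name`, `age`, `city`, `salary`), the `clean` helper, and the specific choices (median imputation, 1.5 × IQR clipping, one-hot encoding) are assumptions made for this example and are not taken from any particular day's challenge.

```python
import pandas as pd

# Hypothetical messy sample; column names and values are illustrative only.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", "bob", None, "Dana", "Dana"],
    "age": ["29", "41", "41", "35", "n/a", "n/a"],
    "city": ["delhi", "Mumbai ", "mumbai", "Delhi", "PUNE", "PUNE"],
    "salary": [52000.0, 61000.0, 61000.0, 58000.0, 990000.0, 990000.0],
})


def clean(df):
    """Apply one possible cleaning sequence to the sample frame."""
    out = df.copy()
    # String/text cleaning: trim whitespace and normalise case.
    out["name"] = out["name"].str.strip().str.title()
    out["city"] = out["city"].str.strip().str.title()
    # Fixing data types: non-numeric strings such as "n/a" become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Removing duplicates, after normalisation so "BOB" and "bob" match.
    out = out.drop_duplicates().reset_index(drop=True)
    # Handling missing values: median for numbers, a placeholder for text.
    out["age"] = out["age"].fillna(out["age"].median())
    out["name"] = out["name"].fillna("Unknown")
    # Outlier treatment: clip salary to the 1.5 * IQR fences.
    q1, q3 = out["salary"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["salary"] = out["salary"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # Encoding: one-hot encode the city column.
    return pd.get_dummies(out, columns=["city"], prefix="city")


cleaned = clean(raw)
print(cleaned)
```

The order matters in this sketch: text is normalised before duplicates are dropped, otherwise rows that differ only in case or whitespace would survive. Other orders and imputation strategies may suit a given dataset better.
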
Each day’s folder contains:
- dataset/ – Raw and cleaned CSV files.
- notebook/ – Jupyter Notebook documenting the cleaning process.
- notes.md – Summary of learnings from the day’s challenge.

Goals:
- Build muscle memory for fast, accurate cleaning.
- Simulate real-world messiness (typos, mixed formats, outliers).
- Develop reproducible cleaning scripts for portfolio use.