Real-World Data Cleaning

Author: Uttarayan Haldar
Objective: Practicing real-world-inspired data cleaning challenges daily to master preprocessing, wrangling, and transformation for Data Science & Analytics.

Overview

This repository contains my daily data cleaning challenges inspired by real-world datasets. Each challenge simulates messy, inconsistent, or incomplete data, and I document the process of transforming it into analysis-ready form.

Skills & Tools

Language: Python
Libraries: pandas, NumPy, Faker, pyjanitor, and the standard-library datetime and re modules
Techniques:
- Handling missing values
- Removing duplicates
- Fixing data types
- String/text cleaning
- Outlier detection & treatment
- Feature transformation & encoding
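
As a rough illustration of how the techniques listed above can fit together, here is a minimal pandas sketch. The DataFrame, its column names, and the `clean` helper are all invented for this example and are not taken from any challenge in the repository; the IQR rule and median fill are just one reasonable choice among several.

```python
import pandas as pd

# Hypothetical messy data; every column name and value is invented for illustration.
raw = pd.DataFrame({
    "name": ["  Alice ", "bob", "bob", "Carol", None, "dev"],
    "age": ["29", "41", "41", "n/a", "35", "38"],
    "city": ["delhi", "Mumbai ", "Mumbai ", "DELHI", "pune", "Pune"],
    "salary": [52000.0, 61000.0, 61000.0, 58000.0, 55000.0, 990000.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply one pass of each listed technique (illustrative only)."""
    out = df.copy()

    # String/text cleaning: trim whitespace and normalise capitalisation.
    for col in ["name", "city"]:
        out[col] = out[col].str.strip().str.title()

    # Fixing data types: unparseable ages such as "n/a" become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")

    # Removing duplicates: drop rows that are identical after normalisation.
    out = out.drop_duplicates()

    # Handling missing values: drop rows without a name, fill age with the median.
    out = out.dropna(subset=["name"])
    out["age"] = out["age"].fillna(out["age"].median())

    # Outlier treatment: clip salary to the 1.5 * IQR fences.
    q1, q3 = out["salary"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["salary"] = out["salary"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    # Feature encoding: one-hot encode the city column.
    out = pd.get_dummies(out, columns=["city"], prefix="city")
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

The order of steps matters here: text is normalised before de-duplication so that rows differing only in case or whitespace are recognised as duplicates.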

Daily Challenge Structure

Each day’s folder contains:

- dataset/ – Raw and cleaned CSV files.
- notebook/ – Jupyter Notebook documenting the cleaning process.
- notes.md – Summary of learnings from the day’s challenge.
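
A day's folder with this layout could be scaffolded with a few lines of standard-library Python. This is only a sketch: the `scaffold_day` helper and the "Day 01" folder name are hypothetical, and the demo writes into a temporary directory rather than the repository itself.

```python
import tempfile
from pathlib import Path


def scaffold_day(root: Path, day_name: str) -> Path:
    """Create dataset/, notebook/ and notes.md under root/day_name (hypothetical helper)."""
    day_dir = root / day_name
    (day_dir / "dataset").mkdir(parents=True, exist_ok=True)
    (day_dir / "notebook").mkdir(parents=True, exist_ok=True)

    notes = day_dir / "notes.md"
    if not notes.exists():
        # Start with an empty notes file so existing notes are never overwritten.
        notes.write_text("", encoding="utf-8")
    return day_dir


# Demo in a throwaway directory; "Day 01" is an assumed naming scheme.
with tempfile.TemporaryDirectory() as tmp:
    day = scaffold_day(Path(tmp) / "Week 01", "Day 01")
    created = sorted(p.name for p in day.iterdir())
    print(created)
```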

Learning Goals

- Build muscle memory for fast, accurate cleaning.
- Simulate real-world messiness (typos, mixed formats, outliers).
- Develop reproducible cleaning scripts for portfolio use.
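
To make the "simulate real-world messiness" goal concrete, here is a small sketch that injects typos, mixed date formats, and outliers into clean values. It uses only the standard library and a seeded random generator so runs are reproducible; the helper names, rates, and formats are assumptions for illustration, and a library such as Faker could supply more realistic base values.

```python
import random
from datetime import date

# Numeric-only formats, chosen so the output does not depend on locale.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%y"]


def swap_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters (a no-op if they happen to be equal)."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def mixed_date(d: date, rng: random.Random) -> str:
    """Render a date in one of several formats picked at random."""
    return d.strftime(rng.choice(DATE_FORMATS))


def with_outliers(values, rng: random.Random, rate: float = 0.1, factor: float = 100):
    """Multiply roughly `rate` of the values by `factor` to create outliers."""
    return [v * factor if rng.random() < rate else v for v in values]


rng = random.Random(42)  # fixed seed keeps the generated mess reproducible
names = ["Alice", "Bob", "Carol"]
messy_names = [swap_typo(n, rng) for n in names]
messy_dates = [mixed_date(date(2024, 1, 15), rng) for _ in range(3)]
amounts = with_outliers([100.0] * 20, rng, rate=0.2)
print(messy_names, messy_dates, amounts)
```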

Connect

Email: uttarayan.haldar.data@gmail.com
