Uttarayan002/real-world-data-cleaning-practice
Real-World Data Cleaning

Author: Uttarayan Haldar

Objective: Practice real-world-inspired data cleaning challenges daily to master preprocessing, wrangling, and transformation for data science and analytics.

Overview

This repository contains my daily data cleaning challenges inspired by real-world datasets. Each challenge simulates messy, inconsistent, or incomplete data, and I document the process of transforming it into analysis-ready form.


Skills & Tools

  • Languages: Python

  • Libraries: Pandas, NumPy, Faker, pyjanitor, datetime, re

  • Techniques:

    • Handling missing values
    • Removing duplicates
    • Fixing data types
    • String/text cleaning
    • Outlier detection & treatment
    • Feature transformation & encoding
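As a rough illustration of how the techniques above can fit together, here is a minimal Pandas sketch. The sample data, the column names, and the `clean` function are all invented for this example and are not taken from any challenge in the repository; the choices made (median fill, IQR clipping, one-hot encoding) are only one reasonable option among several.

```python
import pandas as pd

# Hypothetical messy data, invented purely for illustration.
raw = pd.DataFrame(
    {
        "name": ["  alice ", "BOB", "BOB", "carol", None, "dave"],
        "age": ["29", "41", "41", "not available", "35", "38"],
        "city": ["Kolkata", "delhi ", "delhi ", "Mumbai", "Kolkata", "MUMBAI"],
        "salary": [52000.0, 61000.0, 61000.0, 58000.0, 55000.0, 900000.0],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply one possible cleaning pass; every choice here is an assumption."""
    out = df.copy()

    # String/text cleaning: trim whitespace and normalise capitalisation.
    for col in ["name", "city"]:
        out[col] = out[col].str.strip().str.title()

    # Fixing data types: unparseable ages become NaN instead of raising.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")

    # Removing duplicates: drop rows that are identical after normalisation.
    out = out.drop_duplicates()

    # Handling missing values: drop rows with no name, fill age with the median.
    out = out.dropna(subset=["name"]).reset_index(drop=True)
    out["age"] = out["age"].fillna(out["age"].median())

    # Outlier treatment: clip salary to the 1.5 * IQR fences.
    q1, q3 = out["salary"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["salary"] = out["salary"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # Feature encoding: one-hot encode the city column.
    return pd.get_dummies(out, columns=["city"], prefix="city")


cleaned = clean(raw)
print(cleaned)
```

The order of the steps matters in this sketch: text is normalised before deduplication so that rows differing only in whitespace or case are still recognised as duplicates.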

Daily Challenge Structure

Each day’s folder contains:

  1. dataset/ – Raw and cleaned CSV files.
  2. notebook/ – Jupyter Notebook documenting the cleaning process.
  3. notes.md – Summary of learnings from the day’s challenge.
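The layout above could be scaffolded with a small helper like the one below. This is only a sketch: the `scaffold_day` function and the `day_NN` folder naming are assumptions made for illustration, since the actual folder names used in the repository are not stated here.

```python
from pathlib import Path


def scaffold_day(root: Path, day: int) -> Path:
    """Create dataset/, notebook/ and notes.md for one day's challenge.

    The `day_NN` naming scheme is hypothetical, not the repository's own.
    """
    day_dir = root / f"day_{day:02d}"
    (day_dir / "dataset").mkdir(parents=True, exist_ok=True)  # raw and cleaned CSVs
    (day_dir / "notebook").mkdir(exist_ok=True)  # Jupyter Notebook
    notes = day_dir / "notes.md"
    if not notes.exists():
        # Only write the stub once so existing notes are never overwritten.
        notes.write_text(f"# Day {day} notes\n", encoding="utf-8")
    return day_dir
```

Calling `scaffold_day(Path("."), 1)` would create `day_01/` in the current directory; calling it again leaves existing files untouched.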

Learning Goals

  • Build muscle memory for fast, accurate cleaning.
  • Simulate real-world messiness (typos, mixed formats, outliers).
  • Develop reproducible cleaning scripts for portfolio use.
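To make the idea of simulated messiness concrete, the sketch below generates rows containing typos, mixed date formats, and occasional outliers. It uses only the standard-library `random` module as a stand-in for Faker, and the function name, field names, and probabilities are all assumptions chosen for illustration.

```python
import random


def make_messy_rows(n: int, seed: int = 0) -> list[dict]:
    """Generate n hypothetical records with deliberate data-quality problems."""
    rng = random.Random(seed)  # seeded so the mess is reproducible
    cities = ["Kolkata", "Delhi", "Mumbai"]
    date_formats = ["{y}-{m:02d}-{d:02d}", "{d:02d}/{m:02d}/{y}", "{d}.{m}.{y}"]
    rows = []
    for i in range(n):
        city = rng.choice(cities)
        # Typos: swap two adjacent characters in roughly 20% of rows.
        if rng.random() < 0.2:
            j = rng.randrange(len(city) - 1)
            city = city[:j] + city[j + 1] + city[j] + city[j + 2:]

        # Mixed formats: the same kind of date written in three different styles.
        m, d = rng.randint(1, 12), rng.randint(1, 28)
        signup_date = rng.choice(date_formats).format(y=2024, m=m, d=d)

        # Outliers: inflate roughly 5% of amounts by a factor of 100.
        amount = round(rng.gauss(500, 50), 2)
        if rng.random() < 0.05:
            amount *= 100

        rows.append(
            {"id": i, "city": city, "signup_date": signup_date, "amount": amount}
        )
    return rows


rows = make_messy_rows(10, seed=42)
for row in rows[:3]:
    print(row)
```

Fixing the seed keeps the generated dataset identical across runs, which helps when the cleaning script itself needs to be reproducible.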

Connect

Email: uttarayan.haldar.data@gmail.com

About

Repository for my daily data cleaning practice with Pandas and Polars
