Graph visualization of big messy data
Updated Jan 30, 2017 - JavaScript
Synthetic dirty data generator
TablePilot: A local-first workbench for analyzing messy and complex tables that turns Excel/CSV/TXT files into quality repair plans, insight cards, and explainable reports.
Script for classifying your messy directories
Package for entity matching, standardization, and visualization using embeddings from large language models.
See how a model comes apart when repeatedly photogrammetry'd
A Python tool that transforms clean datasets into realistic messy datasets for testing data cleaning processes
[READ-ONLY MIRROR] A Python implementation for Hadley Wickham's Tidy Data paper
Robust CSV dialect detection methodology for Python that outperforms existing state-of-the-art solutions by 8.35% in F1 score, using only built-in Python modules.
Configurable messy CSV generator for testing data pipelines and ETL processes. Three mess levels, 20+ field types, SQL/XSS injection simulation. No install required.
Statistical Programming in SAS
😺 The easiest way to structure unstructured data
To get hands-on experience with real-life messy data, I chose to work with the food and nutrient data available on FoodData Central. I wanted to compare nutrients across different types of foods available in the US market.
A Python data cleaning project demonstrating advanced Pandas techniques, regex, and data standardization on a messy HR dataset.
An end-to-end Python data cleaning pipeline using Pandas to resolve corrupt, missing, and inconsistent transactional logs in a Café Sales Dataset.
Use generator expressions, formatting operations, and cleaning methods to prepare data for analysis.
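Several of the projects above (the synthetic dirty data generator, the clean-to-messy dataset transformer, and the configurable messy CSV generator) inject defects into clean data so that cleaning pipelines can be tested. The following is a minimal Python sketch of that idea. The function name `make_messy`, the `mess_rate` parameter, and the specific defect types are hypothetical illustrations, not taken from any listed project.

```python
import random


def make_messy(rows, mess_rate=0.3, seed=None):
    """Return a messy copy of `rows` (a list of dicts); the input is not modified.

    Hypothetical sketch: the defect types and the `mess_rate` knob are
    illustrative assumptions, not the behavior of any listed project.
    """
    rng = random.Random(seed)  # seeded RNG so a given messy dataset is reproducible
    missing_tokens = [None, "", "N/A", "null"]  # inconsistent encodings of "missing"
    messy = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if isinstance(value, str) and rng.random() < mess_rate:
                # Inconsistent casing and stray whitespace around text values
                value = rng.choice([value.upper(), value.lower(), f"  {value} "])
            elif rng.random() < mess_rate / 3:
                # Replace the value with one of several missing-value markers
                value = rng.choice(missing_tokens)
            new_row[key] = value
        messy.append(new_row)
        if rng.random() < mess_rate / 3:
            # Occasionally emit an exact duplicate record
            messy.append(dict(new_row))
    return messy


clean = [{"name": "Alice", "city": "Paris", "age": 30},
         {"name": "Bob", "city": "Oslo", "age": 41}]
print(make_messy(clean, mess_rate=0.5, seed=7))
```

Passing a fixed `seed` makes the messy output reproducible, which is useful when the same dirty dataset has to be replayed against several versions of a cleaning script.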
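For the CSV dialect detection entry, Python's standard library already includes a simpler frequency-based heuristic, `csv.Sniffer`, that can serve as a baseline. The sketch below uses it. It is not the methodology of the listed project, and the helper name `read_with_detected_dialect` and the candidate delimiter set are assumptions.

```python
import csv
import io


def read_with_detected_dialect(text, sample_size=4096, delimiters=",;\t|"):
    """Parse CSV text after guessing its dialect with the stdlib csv.Sniffer.

    Baseline sketch only: csv.Sniffer is Python's built-in heuristic, not the
    detection method of the listed project. The helper name and the
    candidate delimiter set are assumptions.
    """
    sample = text[:sample_size]  # sniff only a prefix of the input
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # The sniffer could not settle on a delimiter; fall back to plain comma-separated
        dialect = csv.excel
    return list(csv.reader(io.StringIO(text), dialect))


print(read_with_detected_dialect("id;name\n1;Ana\n2;Bo\n"))
```

Because `csv.Sniffer` guesses from how consistently candidate characters appear across lines of the sample, it can be fooled by short or irregular files, so its result is best treated as a starting guess.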