feat(table): add data cleaning utilities (drop_na, fill_na, drop_duplicates, convert_types) #659

AhmadYasser1 · 2025-10-25T04:33:56Z

Summary

This PR adds commonly requested data cleaning utilities to datascience.Table, addressing #656.

Changes

Table.drop_na: drop rows/columns with missing values (any/all)
Table.fill_na: fill missing values with a scalar or a strategy (mean/median/mode)
Table.drop_duplicates: remove duplicate rows with subset/keep options
Table.convert_types: convert via mapping (per-column callable or type) or simple inference for numeric strings
Tests in tests/test_cleaning.py
CHANGELOG update and version bump to 0.19.0
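
The any/all and strategy semantics listed above can be illustrated with a small standalone sketch. This is not the code in this PR: it models a table as a plain dict of column lists, and the helper names (`is_missing`, `drop_missing_rows`, `fill_missing_values`) are hypothetical. It also assumes that None and float NaN are the only values counted as missing.

```python
import math
import statistics


def is_missing(value):
    """Assumption for this sketch: only None and float NaN count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_rows(columns, how="any", subset=None):
    """Return a new dict of columns without rows that have missing values.

    columns: dict mapping label -> list of values (all lists the same length).
    how: "any" drops a row if any checked value is missing, "all" only if
         every checked value is missing.
    subset: labels to check; defaults to every column.
    """
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    labels = list(subset) if subset is not None else list(columns)
    n_rows = len(next(iter(columns.values()), []))
    test = any if how == "any" else all
    keep = [
        i for i in range(n_rows)
        if not test(is_missing(columns[label][i]) for label in labels)
    ]
    # Build fresh lists so the input is left untouched.
    return {label: [values[i] for i in keep] for label, values in columns.items()}


def fill_missing_values(values, strategy="mean"):
    """Return a new list with missing entries replaced by a column statistic."""
    strategies = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    if strategy not in strategies:
        raise ValueError("strategy must be 'mean', 'median' or 'mode'")
    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values)  # nothing to compute a statistic from
    fill = strategies[strategy](present)
    return [fill if is_missing(v) else v for v in values]
```

In this sketch, `how="all"` keeps a row as long as at least one checked value is present, which is the usual reading of the any/all option; whether the PR handles edge cases such as empty tables the same way is not stated here.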

Motivation

Many workflows require basic cleaning before analysis. Providing these operations as first-class Table methods avoids round-tripping through pandas and covers the data wrangling tasks students typically encounter.

Usage Examples

# Drop rows with any missing values in columns 'a' and 'b'
t2 = t.drop_na(subset=['a', 'b'])

# Drop columns that contain any missing values
t2 = t.drop_na(axis='columns')

# Fill with a scalar
t2 = t.fill_na(value=0)

# Fill numeric columns by mean; categorical via mode
t2 = t.fill_na(strategy='mean')

# Remove duplicate rows, keeping first occurrence
t2 = t.drop_duplicates()
# Or based on a subset, dropping all duplicates
t2 = t.drop_duplicates(subset='id', keep='none')

# Convert types with mapping or infer numeric types
t2 = t.convert_types(mapping={'id': int, 'price': float})
t3 = t.convert_types(infer=True)
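
To make the `keep` options and the numeric-string inference concrete, here is a standalone sketch of one way they could behave. It is not the implementation in this PR: rows are modelled as plain dicts, the helper names (`drop_duplicate_rows`, `infer_numeric`) are hypothetical, and the inference rule (try int, then float, otherwise leave the column alone) is an assumption.

```python
from collections import Counter


def drop_duplicate_rows(rows, subset=None, keep="first"):
    """Return a new list of rows with duplicates removed.

    rows: list of dicts sharing the same keys; values must be hashable.
    subset: a label or list of labels that define a duplicate
            (defaults to all columns).
    keep: "first" or "last" keeps one row per duplicate group,
          "none" drops every row that has a duplicate.
    """
    if keep not in ("first", "last", "none"):
        raise ValueError("keep must be 'first', 'last' or 'none'")
    if isinstance(subset, str):
        subset = [subset]

    def row_key(row):
        labels = subset if subset is not None else sorted(row)
        return tuple(row[label] for label in labels)

    keys = [row_key(row) for row in rows]
    if keep == "none":
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]

    # Scan forwards for "first", backwards for "last", then restore row order.
    indices = range(len(rows)) if keep == "first" else range(len(rows) - 1, -1, -1)
    seen, kept = set(), []
    for i in indices:
        if keys[i] not in seen:
            seen.add(keys[i])
            kept.append(i)
    return [rows[i] for i in sorted(kept)]


def infer_numeric(values):
    """Convert a column of numeric strings to int or float, else return a copy."""
    if not all(isinstance(v, str) for v in values):
        return list(values)
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except ValueError:
            continue  # at least one string did not parse; try the next type
    return list(values)
```

Converting a column only when every value parses keeps mixed columns such as `['1', 'x']` unchanged rather than partially converted.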

Notes

Missing values are detected with numpy and pandas.isnull, consistent with existing code; no hard pandas dependency at call sites.
Non-intrusive to existing APIs: every method returns a new table, and there is no inplace option.

Checklist

Tests added
CHANGELOG updated
Version bumped (delete if maintainers prefer to bump during release)

…icates, convert_types)

- Implement Table.drop_na for rows/columns with any/all nulls
- Implement Table.fill_na with scalar and strategies (mean/median/mode)
- Implement Table.drop_duplicates with subset and keep options
- Implement Table.convert_types via mapping and basic inference
- Add tests in tests/test_cleaning.py
- Bump version to 0.19.0 and update CHANGELOG

Refs data-8#656
