
@AhmadYasser1 AhmadYasser1 commented Oct 25, 2025

Summary

This PR adds commonly requested data cleaning utilities to datascience.Table, addressing #656.

Changes

  • Table.drop_na: drop rows/columns with missing values (any/all)
  • Table.fill_na: fill missing values with a scalar or a strategy (mean/median/mode)
  • Table.drop_duplicates: remove duplicate rows with subset/keep options
  • Table.convert_types: convert via mapping (per-column callable or type) or simple inference for numeric strings
  • Tests in tests/test_cleaning.py
  • CHANGELOG update and version bump to 0.19.0
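
To illustrate what "simple inference for numeric strings" could mean, here is a minimal sketch in plain Python. It is not the code in this PR: the helper name `infer_numeric` and the use of `None` as the missing-value marker are assumptions made for illustration. A column is converted only when every non-missing entry is a string that parses as a number.

```python
def infer_numeric(values):
    """Convert a column of numeric strings to int or float.

    Hypothetical helper: returns an unchanged copy when the column
    holds anything other than strings that all parse as numbers.
    None is treated as a missing value and passed through.
    """
    present = [v for v in values if v is not None]
    # Only attempt inference on columns made up entirely of strings.
    if not present or not all(isinstance(v, str) for v in present):
        return list(values)
    # Try the narrower type first, then fall back to float.
    for caster in (int, float):
        try:
            return [None if v is None else caster(v) for v in values]
        except ValueError:
            continue
    return list(values)
```

Under these assumptions, `infer_numeric(['1', '2'])` yields integers, `infer_numeric(['1.5', '2'])` yields floats, and a column such as `['a', '1']` is returned unchanged.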

Motivation

Many workflows require basic cleaning before analysis. Providing first-class APIs in Table lets students handle typical data-wrangling tasks without round-tripping through pandas.

Usage Examples

# Drop rows with any missing values in columns 'a' and 'b'
t2 = t.drop_na(subset=['a', 'b'])

# Drop columns that contain any missing values
t2 = t.drop_na(axis='columns')

# Fill with a scalar
t2 = t.fill_na(value=0)

# Fill numeric columns by mean; categorical via mode
t2 = t.fill_na(strategy='mean')

# Remove duplicate rows, keeping first occurrence
t2 = t.drop_duplicates()
# Or compare rows on a subset of columns and drop every row that has a duplicate
t2 = t.drop_duplicates(subset='id', keep='none')

# Convert types with mapping or infer numeric types
t2 = t.convert_types(mapping={'id': int, 'price': float})
t3 = t.convert_types(infer=True)
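
The difference between the two `keep` options used above can be sketched on plain Python data. This is an illustration under assumptions, not the PR's implementation: the function `drop_duplicate_rows` is hypothetical, rows are modeled as dicts, and `keep='none'` is assumed to remove every row whose key occurs more than once.

```python
from collections import Counter


def drop_duplicate_rows(rows, subset=None, keep='first'):
    """Hypothetical sketch of duplicate removal on a list of dict rows.

    subset: a column name, a list of names, or None for all columns.
    keep:   'first' keeps the first row per key; 'none' (assumed
            semantics) drops every row whose key is repeated.
    Values in the compared columns must be hashable.
    """
    if not rows:
        return []
    if subset is None:
        subset = list(rows[0].keys())
    elif isinstance(subset, str):
        subset = [subset]
    # One key per row, built from the compared columns only.
    keys = [tuple(row[c] for c in subset) for row in rows]
    if keep == 'first':
        seen = set()
        kept = []
        for row, key in zip(rows, keys):
            if key not in seen:
                seen.add(key)
                kept.append(row)
        return kept
    if keep == 'none':
        counts = Counter(keys)
        return [row for row, key in zip(rows, keys) if counts[key] == 1]
    raise ValueError("keep must be 'first' or 'none' in this sketch")


rows = [
    {'id': 1, 'x': 'a'},
    {'id': 1, 'x': 'b'},
    {'id': 2, 'x': 'c'},
]
first_kept = drop_duplicate_rows(rows, subset='id')
none_kept = drop_duplicate_rows(rows, subset='id', keep='none')
```

With the sample rows, `first_kept` holds the first `id == 1` row plus the `id == 2` row, while `none_kept` holds only the `id == 2` row.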

Notes

  • Missing values are detected with numpy and pandas.isnull, consistent with existing code; no hard pandas dependency at call sites.
  • Non-intrusive to existing APIs; all methods return new tables (no inplace).
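
As a rough illustration of the "return a new table, no inplace" behavior combined with the mean/mode fill described earlier, the sketch below works on a plain dict of columns. Everything here is an assumption for illustration (the names `is_missing` and `fill_missing`, the dict-of-lists data model, and the choice to treat `None` and NaN as missing); the PR itself operates on datascience.Table.

```python
import math
from statistics import mean, mode


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing(columns):
    """Fill numeric columns with their mean and other columns with their mode.

    columns: dict mapping column name to a list of values.
    Returns a new dict; the input lists are never modified.
    """
    filled = {}
    for name, values in columns.items():
        present = [v for v in values if not is_missing(v)]
        if not present:
            # Nothing to compute a fill value from; copy the column as-is.
            filled[name] = list(values)
            continue
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        fill = mean(present) if numeric else mode(present)
        filled[name] = [fill if is_missing(v) else v for v in values]
    return filled


data = {
    'n': [1.0, None, 3.0, float('nan')],
    'c': ['a', None, 'a', 'b'],
}
result = fill_missing(data)
```

After the call, `result['n']` is `[1.0, 2.0, 3.0, 2.0]` and `result['c']` is `['a', 'a', 'a', 'b']`, while `data` still contains its original missing entries.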

Checklist

  • Tests added
  • CHANGELOG updated
  • Version bumped (delete if maintainers prefer to bump during release)

…icates, convert_types)

- Implement Table.drop_na for rows/columns with any/all nulls
- Implement Table.fill_na with scalar and strategies (mean/median/mode)
- Implement Table.drop_duplicates with subset and keep options
- Implement Table.convert_types via mapping and basic inference
- Add tests in tests/test_cleaning.py
- Bump version to 0.19.0 and update CHANGELOG

Refs data-8#656