MultiClean is a Python library for morphological cleaning of multiclass 2D numpy arrays (segmentation masks and classification rasters). It provides efficient tools for edge smoothing and small-island removal across multiple classes, then fills gaps using the nearest valid class.
Below: Land Use before/after cleaning (smoothed edges, small-island removal, nearest-class gap fill).
pip install multicleanor
uv add multicleanimport numpy as np
from multiclean import clean_array
# Create a sample classification array with classes 0, 1, 2, 3
array = np.random.randint(0, 4, (1000, 1000), dtype=np.int32)
# Clean with default parameters
cleaned = clean_array(array)
# Custom parameters
cleaned = clean_array(
    array,
    class_values=[0, 1, 2, 3],
    smooth_edge_size=2,     # kernel width, larger value increases smoothness
    min_island_size=100,    # remove components with area < 100
    connectivity=8,         # 4 or 8
    max_workers=4,
    fill_nan=False          # enable/disable the filling of nan values in input array
)MultiClean is designed for cleaning segmentation outputs from:
- Remote sensing: Land cover classification, crop mapping
- Computer vision: Semantic segmentation post-processing
- Geospatial analysis: Raster classification cleaning
- Machine learning: Neural network output refinement
- Multi-class processing: Clean all classes in one pass
- Edge smoothing: Morphological opening to reduce jagged boundaries
- Island removal: Remove small connected components per class
- Gap filling: Fill invalids via nearest valid class (distance transform)
- Fast: NumPy + OpenCV + SciPy with parallelism
MultiClean uses morphological operations to clean classification arrays:
- Edge smoothing (per class): Morphological opening with an elliptical kernel.
- Island removal (per class): Find connected components (OpenCV) and mark components with area < min_island_sizeas invalid.
- Gap filling: Compute a distance transform to copy the nearest valid class into invalid pixels.
Classes are processed together and the result maintains a valid label at every pixel.
from multiclean import clean_array
out = clean_array(
    array: np.ndarray,
    class_values: int | list[int] | None = None,
    smooth_edge_size: int = 2,
    min_island_size: int = 100,
    connectivity: int = 4,
    max_workers: int | None = None,
    fill_nan: bool = False
)- array: 2D numpy array of class labels (int or float). For float arrays,- NaNis treated as nodata and will remain- NaNunless- fill_nanis set to- True.
- class_values: Classes to consider. If- None, inferred from- array(ignores- NaNfor floats). An int restricts cleaning to a single class.
- smooth_edge_size: Kernel size (pixels) for morphological opening. Use- 0to disable.
- min_island_size: Remove components with area strictly- < min_island_size. Use- 1to keep single pixels.
- connectivity: Pixel connectivity for components,- 4or- 8.
- max_workers: Parallelism for per-class operations (None lets the executor choose).
- fill_nan: If True will fill NAN values from input array with nearest valid value.
Returns a numpy array matching the input shape. Integer inputs return integer outputs. Float arrays with NaN are supported and can be filled or remain as NAN.
from multiclean import clean_array
import rasterio
# Read land cover classification
with rasterio.open('landcover.tif') as src:
    landcover = src.read(1)
# Clean with appropriate parameters for satellite data
cleaned = clean_array(
    landcover,
    class_values=[0, 1, 2, 3, 4],  # forest, water, urban, crop, other
    smooth_edge_size=1,
    min_island_size=25,
    connectivity=8,
    fill_nan=False
)from multiclean import clean_array
# Model produces logits; convert to class predictions
np_pred = np_model_logits.argmax(axis=0)  # shape: (H, W)
# Clean the segmentation
cleaned = clean_array(
    np_pred,
    smooth_edge_size=2,
    min_island_size=100,
    connectivity=4,
)See the notebooks folder for end-to-end examples:
- Land Use Example Notebook: land use classification cleaning
- Cloud Example Notebook: cloud/shadow classification cleaning
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
