A Python CLI tool to detect outliers using IQR, split datasets, print summary stats, and save results as CSV.
This Python script:
- Accepts a CSV file and a target column as input
- Performs Interquartile Range (IQR) based outlier detection
- Splits the dataset into two parts: outliers and non-outliers
- Calculates and prints summary statistics (mean, median, std, IQR) to the console
- Saves both subsets as separate CSV files (outliers.csv, non_outliers.csv)
⚠️ Note: No plotting or HTML reports are included. The output is purely statistical and text-based.
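The IQR rule at the heart of the script can be sketched in a few lines of pandas. This is a minimal sketch that assumes the conventional Tukey multiplier of 1.5 (the description does not say which multiplier the script uses); the function name `iqr_outlier_mask` and the sample data are illustrative, not taken from the tool.

```python
from typing import Tuple

import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> Tuple[pd.Series, float, float]:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is the conventional Tukey multiplier (an assumption; the tool's
    actual multiplier is not documented).
    """
    q1 = values.quantile(0.25)  # first quartile (linear interpolation by default)
    q3 = values.quantile(0.75)  # third quartile
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # True where the value falls outside the fences
    mask = (values < lower) | (values > upper)
    return mask, lower, upper


data = pd.Series([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])
mask, lower, upper = iqr_outlier_mask(data)
print(data[mask].tolist())  # [102]
```

Here Q1 = 12 and Q3 = 13.75, so the fences are 9.375 and 16.375 and only 102 is flagged.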
Concepts covered:
- CLI-based data preprocessing and outlier detection
- Efficient reading, writing, and filtering of CSV data using Python
- Statistical analysis using the IQR method
- File I/O operations with pandas
- Modular code design with argument handling and clean output
Libraries used:
- pandas: for data manipulation and analysis
- numpy: for numerical computations
- sys: to handle command-line arguments
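Argument handling with `sys` might look like the sketch below. It assumes exactly two positional arguments and that `<column_index>` must be an integer; the helper name `parse_args` and the error messages are hypothetical.

```python
import sys
from typing import List, Optional, Tuple

USAGE = "Usage: python outlier_split.py <input_file.csv> <column_index>"


def parse_args(argv: Optional[List[str]] = None) -> Tuple[str, int]:
    """Read <input_file.csv> and <column_index> from a sys.argv-style list."""
    if argv is None:
        argv = sys.argv
    # argv[0] is the script name; exactly two positional arguments are expected
    if len(argv) != 3:
        raise SystemExit(USAGE)
    input_file, raw_index = argv[1], argv[2]
    try:
        column_index = int(raw_index)
    except ValueError:
        # Exit with a readable message instead of a traceback
        raise SystemExit(f"column_index must be an integer, got {raw_index!r}\n{USAGE}") from None
    return input_file, column_index
```

For example, `parse_args(["outlier_split.py", "data.csv", "2"])` returns `("data.csv", 2)`. Raising `SystemExit` with a string prints the message to stderr and exits with status 1, which keeps the script's failure mode simple.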
Workflow:
- Import Required Libraries
- Accept Command-Line Arguments
  Usage: python outlier_split.py <input_file.csv> <column_index>
- Read the Input CSV
- Extract the Target Column
- Perform IQR-Based Outlier Detection
- Calculate and Print Summary Statistics (Mean, Median, Std Dev, IQR, Min, Max)
- Save Subsets as CSV Files (outliers.csv, non_outliers.csv)
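The steps above can be sketched as a single pandas function. This is an illustrative sketch, not the tool's actual source. It assumes the CSV has a header row, that `<column_index>` is zero-based (as with `DataFrame.iloc`), that the target column is numeric, and that the IQR multiplier is 1.5. The names `split_outliers` and `print_summary` are hypothetical, and the printed layout may differ from the real script's output.

```python
from pathlib import Path
from typing import Tuple

import pandas as pd


def print_summary(label: str, values: pd.Series) -> None:
    values = values.astype(float)  # assumes a numeric target column
    iqr = values.quantile(0.75) - values.quantile(0.25)
    print(f"{label} Summary:")
    # Std Dev is the sample standard deviation (pandas default, ddof=1)
    print(
        f"Mean: {values.mean():.1f} | Median: {values.median():.1f} | "
        f"Std Dev: {values.std():.1f} | IQR: {iqr:.1f} | "
        f"Min: {values.min():.1f} | Max: {values.max():.1f} | Count: {values.count()}"
    )


def split_outliers(
    input_file: str, column_index: int, out_dir: str = "."
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    # Read the input CSV (assumes the first row is a header)
    df = pd.read_csv(input_file)
    # Extract the target column by position (assumed zero-based, like iloc)
    values = df.iloc[:, column_index]

    # IQR-based detection with the conventional 1.5 multiplier (assumed)
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    is_outlier = (values < lower) | (values > upper)

    # Split whole rows, not just the target column
    outliers = df[is_outlier]
    non_outliers = df[~is_outlier]

    print_summary("Original Data", values)
    print_summary("Outlier Subset", outliers.iloc[:, column_index])
    print_summary("Non-Outlier Subset", non_outliers.iloc[:, column_index])

    # Save both subsets; index=False writes only the original columns
    out = Path(out_dir)
    outliers.to_csv(out / "outliers.csv", index=False)
    non_outliers.to_csv(out / "non_outliers.csv", index=False)
    return outliers, non_outliers
```

To wire this to the command line, call `split_outliers(sys.argv[1], int(sys.argv[2]))` under an `if __name__ == "__main__":` guard. Splitting whole rows with a boolean mask keeps every column of the original file in both output CSVs.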
Example Usage:
python outlier_split.py data.csv 2
Example Output:
Original Data Summary:
Mean: 57.2 | Median: 55.0 | Std Dev: 12.6 | IQR: 15.5
Outlier Subset Summary:
Mean: 98.3 | Count: 6
Non-Outlier Subset Summary:
Mean: 53.1 | Count: 94