This Python program:
- Accepts a CSV file as input
- Performs Interquartile Range (IQR) outlier analysis on a selected column
- Splits the dataset into outliers and non-outliers
- Generates an interactive HTML report with:
  - File and column information
  - Summary statistics
  - Histograms and boxplots for all subsets
- Saves the outlier and non-outlier subsets into separate CSV files
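
As a rough illustration of the IQR split described above, here is a minimal sketch using pandas. The function name `split_by_iqr`, the example column `value`, and the conventional fence multiplier of 1.5 are assumptions made for this example; the actual script may use different names or thresholds.

```python
import pandas as pd

def split_by_iqr(df: pd.DataFrame, column: str, k: float = 1.5):
    """Split df into (outliers, non_outliers) using IQR fences on `column`.

    k=1.5 is the conventional multiplier and is an assumption here.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Rows strictly outside the fences count as outliers.
    # Missing values compare as False, so they land in the non-outlier subset.
    is_outlier = (df[column] < lower) | (df[column] > upper)
    return df[is_outlier], df[~is_outlier]

# Hypothetical data, for illustration only
df = pd.DataFrame({"value": [10, 12, 11, 13, 12, 11, 100]})
outliers, non_outliers = split_by_iqr(df, "value")
```

Each subset could then be written out with `DataFrame.to_csv(..., index=False)`.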
- Parsing and preprocessing CSV files using Python
- CLI-based tool development for data analysis tasks
- Outlier detection using the IQR method
- Generating statistical summaries: mean, median, std, IQR
- Creating plots using `matplotlib`
- HTML report generation with `Jinja2` templating
- Exporting datasets as new CSV files (outliers/non-outliers)
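
The summary statistics listed above (mean, median, std, IQR) could be computed along these lines. This is a sketch only: the helper name `summarize` and the dictionary keys are assumptions, not taken from the script.

```python
import pandas as pd

def summarize(series: pd.Series) -> dict:
    """Return basic summary statistics for a numeric column."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    return {
        "count": int(series.count()),      # number of non-missing values
        "mean": float(series.mean()),
        "median": float(series.median()),
        "std": float(series.std()),        # sample standard deviation (pandas default, ddof=1)
        "iqr": float(q3 - q1),
    }

# Hypothetical data, for illustration only
stats = summarize(pd.Series([1, 2, 3, 4, 5]))
```

The same helper can be applied to the full column and to each subset so the report shows comparable figures.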
- `pandas`: Data manipulation
- `numpy`: Numerical calculations
- `matplotlib`: Data visualization (boxplots, histograms)
- `jinja2`: HTML templating
- `sys`: Command-line argument handling
- Import required libraries
- Read command-line arguments
- Load CSV file
- Extract target column
- Perform IQR-based outlier detection
- Calculate summary statistics
- Split data into outliers and non-outliers
- Generate boxplots and histograms
- Create and save a dynamic HTML report
- Export filtered datasets to CSV
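
The report-generation step might look roughly like the following Jinja2 sketch. The template markup, the `render_report` helper, and the variable names are all hypothetical; the real report also embeds plots and styling, which are omitted here.

```python
from jinja2 import Template

# Hypothetical minimal template; the real report layout is not shown in this README.
REPORT_TEMPLATE = Template("""<html>
<body>
<h1>Outlier report: {{ filename }}</h1>
<p>Column: {{ column }}</p>
<table>
{% for name, value in stats.items() %}<tr><td>{{ name }}</td><td>{{ "%.2f"|format(value) }}</td></tr>
{% endfor %}</table>
</body>
</html>""")

def render_report(filename: str, column: str, stats: dict) -> str:
    """Fill the template with file info and summary statistics."""
    return REPORT_TEMPLATE.render(filename=filename, column=column, stats=stats)

# Hypothetical values, for illustration only
html = render_report("input.csv", "value", {"mean": 3.0, "iqr": 2.0})
```

The resulting string can be saved with an ordinary file write, for example `open("report.html", "w").write(html)`.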
- `outliers.csv`: Contains outlier records
- `non_outliers.csv`: Contains clean data
- `report.html`: A styled HTML report with stats and plots
python detect_outliers.py input.csv 2
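
For the command above, argument handling with `sys` could be sketched as follows. The helper `parse_args` is hypothetical, and the second argument is kept as a raw string because this README does not state whether `2` is a zero-based index, a one-based index, or a column name.

```python
import sys

def parse_args(argv):
    """Return (csv_path, column_arg) from an argv-style list, or exit with a usage message."""
    if len(argv) != 3:
        script = argv[0] if argv else "detect_outliers.py"
        sys.exit(f"Usage: python {script} <input.csv> <column>")
    # column_arg stays a string; how it maps to a column is an open assumption.
    return argv[1], argv[2]

# In the script itself this would be parse_args(sys.argv)
csv_path, column_arg = parse_args(["detect_outliers.py", "input.csv", "2"])
```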