This tool predicts and classifies rental prices from a real estate dataset. It reads data from a static file, applies statistical analysis, and trains models for both classification and regression. Users can also enter new feature values for prediction, and results are presented as visualizations and summary tables.
- Data Preprocessing Module: cleans, transforms, and standardizes raw real estate data.
- Statistical Analysis Module: explores the data visually and numerically, including correlation analysis and outlier detection.
- Classification Module: assigns rent values to price categories with a classification model and reports performance metrics.
- Regression Module: predicts continuous rent values and evaluates them with the standard R² and custom tolerance-based R² metrics.
- Input is read from a static, structured dataset (e.g., a CSV file).
- No external API or dynamic upload interface is used.
- Converts data types for numerical and categorical fields.
- Removes rows with missing values and outliers.
- Outputs a clean dataset ready for modeling.
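The loading and cleaning steps above could look roughly like the following pandas sketch. The column names (`rent`, `area_m2`, `rooms`, `district`) and the helper names `load_listings` / `clean_listings` are hypothetical placeholders, not the project's actual schema or functions.

```python
import pandas as pd

# Hypothetical column names; adjust to the real dataset's schema.
NUMERIC_COLS = ["rent", "area_m2", "rooms"]
CATEGORICAL_COLS = ["district"]


def load_listings(path: str) -> pd.DataFrame:
    """Read the static CSV file that serves as the tool's only input."""
    return pd.read_csv(path)


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Convert column types and drop rows with missing values."""
    df = df.copy()
    for col in NUMERIC_COLS:
        # Entries that cannot be parsed as numbers become NaN instead of raising.
        df[col] = pd.to_numeric(df[col], errors="coerce")
    for col in CATEGORICAL_COLS:
        df[col] = df[col].astype("category")
    # Drop any row missing a value in the columns used for modeling.
    return df.dropna(subset=NUMERIC_COLS + CATEGORICAL_COLS).reset_index(drop=True)
```

Outlier removal is handled separately (see the IQR filtering step), so this function only deals with types and missing values.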
- Computes descriptive statistics.
- Generates correlation matrices.
- Uses histograms and boxplots for distribution visualization.
- Applies interquartile range (IQR) filtering to detect and remove outliers.
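A minimal sketch of the statistical review, assuming pandas: descriptive statistics and a correlation matrix over the numeric columns, plus the common 1.5 × IQR rule for outliers. The multiplier `k=1.5` is the conventional default and an assumption here; the document does not state which factor the project uses.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return descriptive statistics and the Pearson correlation matrix of numeric columns."""
    numeric = df.select_dtypes(include="number")
    return numeric.describe(), numeric.corr()


def remove_iqr_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose `column` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Series.between is inclusive on both ends by default.
    return df[df[column].between(lower, upper)].reset_index(drop=True)
```

Histograms and boxplots of the same columns (e.g., with seaborn) make it easy to check visually which points the filter removes.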
- Converts rent into price categories by binning it into intervals.
- Prepares pipelines for feature scaling and categorical encoding.
- Trains a classification model and evaluates its accuracy on a train/test split.
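The classification steps above can be sketched with `pd.cut` for interval binning and a scikit-learn `Pipeline` for scaling and encoding. The bin edges, labels, feature names, and the choice of `LogisticRegression` are all assumptions for illustration; the document does not say which classifier or intervals the project uses.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature names; replace with the dataset's real columns.
NUMERIC_COLS = ["area_m2", "rooms"]
CATEGORICAL_COLS = ["district"]


def add_rent_class(df: pd.DataFrame, edges: list, labels: list) -> pd.DataFrame:
    """Bin continuous rent into labelled intervals; pd.cut bins are right-closed."""
    out = df.copy()
    out["rent_class"] = pd.cut(out["rent"], bins=edges, labels=labels)
    # Rents outside the outermost edges get NaN and are dropped.
    return out.dropna(subset=["rent_class"])


def build_classifier() -> Pipeline:
    """Scale numeric features, one-hot encode categorical ones, then classify."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),  # assumed model choice
    ])


def train_and_score(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Fit on a train split and return (model, train accuracy, test accuracy)."""
    X = df[NUMERIC_COLS + CATEGORICAL_COLS]
    y = df["rent_class"].astype(str)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    clf = build_classifier().fit(X_train, y_train)
    train_acc = accuracy_score(y_train, clf.predict(X_train))
    test_acc = accuracy_score(y_test, clf.predict(X_test))
    return clf, train_acc, test_acc
```

Keeping scaling and encoding inside the pipeline means the test split is transformed with statistics learned only from the training split.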
- Predicts continuous rent values using linear regression.
- Uses pipelines for consistent preprocessing.
- Evaluates with:
  - Standard R²
  - R² with an absolute tolerance
  - R² with a relative (percentage) tolerance
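The document does not define the tolerance-based R² variants, so the sketch below uses one plausible, hypothetical definition: residuals whose magnitude falls inside the tolerance band are treated as zero, and R² = 1 − SS_res / SS_tot is computed as usual. With no tolerance it reduces to the standard R². The helper names `build_regressor` and `r2_with_tolerance` are illustrative, not the project's own.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_regressor(numeric_cols, categorical_cols) -> Pipeline:
    """Linear regression with scaling and one-hot encoding inside one pipeline."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("model", LinearRegression())])


def r2_with_tolerance(y_true, y_pred, abs_tol=None, rel_tol=None) -> float:
    """R² after zeroing residuals inside the tolerance band (hypothetical definition).

    abs_tol: forgive errors with |error| <= abs_tol (e.g. 10_000 TL).
    rel_tol: forgive errors with |error| <= rel_tol * |actual| (e.g. 0.5 for 50%).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    within = np.zeros_like(residuals, dtype=bool)
    if abs_tol is not None:
        within |= np.abs(residuals) <= abs_tol
    if rel_tol is not None:
        within |= np.abs(residuals) <= rel_tol * np.abs(y_true)
    residuals[within] = 0.0
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Under this definition the tolerant scores can only be greater than or equal to the standard R², which is consistent with the ordering in the results table.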
- Users enter feature values manually in code cells.
- The system returns the corresponding prediction.
- Visualizations and summary tables are used to communicate results.
- Key performance metrics are printed to the console.
- Prediction results and errors are displayed via graphs and tabular summaries.
- The custom tolerance-based R² metrics show how well the model performs when errors within a practically acceptable margin are treated as acceptable.
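A tabular summary of predictions and errors might be built as below. The function name `error_table` is hypothetical; the default tolerances (10,000 TL absolute, 50% relative) are taken from the results table, but how the project actually lays out its summary is an assumption.

```python
import numpy as np
import pandas as pd


def error_table(y_true, y_pred, abs_tol=10_000, rel_tol=0.5) -> pd.DataFrame:
    """Per-listing comparison of actual vs. predicted rent with tolerance flags."""
    table = pd.DataFrame({
        "actual": np.asarray(y_true, dtype=float),
        "predicted": np.asarray(y_pred, dtype=float),
    })
    table["error"] = table["predicted"] - table["actual"]
    table["abs_error"] = table["error"].abs()
    table["pct_error"] = table["abs_error"] / table["actual"].abs()
    table["within_abs_tol"] = table["abs_error"] <= abs_tol
    table["within_rel_tol"] = table["pct_error"] <= rel_tol
    return table
```

Calling `table[["within_abs_tol", "within_rel_tol"]].mean()` then gives the share of listings predicted within each tolerance, and `table["abs_error"].describe()` summarizes the error distribution.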
- Requires Python 3.9 or higher.
- Uses the following libraries:
  - Data processing: pandas, numpy
  - Visualization: matplotlib, seaborn
  - Machine learning: scikit-learn

| Stage | Objective |
|---|---|
| Data Preprocessing | Clean and standardize raw housing data |
| Statistical Review | Analyze distributions and correlations |
| Classification | Categorize rental price ranges |
| Regression | Predict exact rental prices with error margins |

Approximate regression performance:

| Metric | Value |
|---|---|
| Standard R² | ~0.50 |
| R² with ±10,000 TL tolerance | ~0.66 |
| R² with ±50% tolerance (relative) | ~0.79 |

The methodology follows a public video tutorial series:
"Machine Learning with Real Data: Rent Prediction" – YouTube Playlist