A machine learning project that automates the classification and routing of customer support tickets using Natural Language Processing (NLP). The system predicts categories like Bug Report, Feature Request, Technical Issue, Billing Inquiry, and Account Management based on ticket text.
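The classify-then-route flow can be sketched in plain Python. This sketch assumes the model produces one raw score (logit) per category and converts the scores to probabilities with a softmax, as a multinomial logistic regression does. The `QUEUES` mapping, the `route` helper, and the example scores are hypothetical and are not part of this repository.

```python
import math

# Hypothetical category -> queue mapping; the project's real routing targets are not documented here.
QUEUES = {
    "Bug Report": "engineering",
    "Feature Request": "product",
    "Technical Issue": "support-tier2",
    "Billing Inquiry": "billing",
    "Account Management": "accounts",
}


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw per-class scores (logits) into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max so exp() cannot overflow
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def route(scores: dict[str, float]) -> tuple[str, str, float]:
    """Pick the most probable category and look up its (hypothetical) queue."""
    probs = softmax(scores)
    category = max(probs, key=probs.get)
    return category, QUEUES[category], probs[category]


# Made-up logits standing in for a trained model's output on one ticket.
category, queue, p = route({
    "Bug Report": 2.1,
    "Feature Request": 0.3,
    "Technical Issue": 1.2,
    "Billing Inquiry": -0.5,
    "Account Management": -1.0,
})
print(category, queue, round(p, 2))
```

Subtracting the maximum score before exponentiating leaves the probabilities unchanged but keeps large logits from overflowing.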
- Synthetic Data Generation: Creates realistic support tickets.
- Text Processing Pipeline: Cleans and vectorizes text using TF-IDF.
- High-Performance Model: Logistic Regression with tuned hyperparameters and sub-millisecond (<1 ms) inference latency.
- REST API: Flask-based API for real-time predictions.
- Comprehensive Testing: Unit and integration tests included.
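The text-processing step can be illustrated with a dependency-free TF-IDF sketch. It is not the code in `src/features/build_features.py`: the regex tokenizer in `clean_text` is a simplifying assumption, while the smoothed IDF and L2 normalization mirror the defaults of scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter


def clean_text(text: str) -> list[str]:
    """Lowercase, replace runs of non-alphanumeric characters with a space, and split into tokens."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def fit_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, where df counts documents containing the token."""
    n = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    return {token: math.log((1 + n) / (1 + count)) + 1 for token, count in df.items()}


def transform(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times IDF, then L2-normalize; tokens unseen during fitting are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    weights = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}


# Tiny illustrative corpus (not the project's synthetic dataset).
tickets = [
    "App crashes when I click Save!",
    "Please add a dark mode option",
    "I was charged twice this month",
]
corpus = [clean_text(t) for t in tickets]
idf = fit_idf(corpus)
vec = transform(clean_text("The app crashes on save"), idf)
print(vec)
```

If the project uses scikit-learn for this step (the README does not say), note that `TfidfVectorizer`'s default token pattern keeps only tokens of two or more word characters, so a token like "i" would be dropped there but kept here.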
```text
├── data/                # Raw and processed datasets
├── docs/                # Documentation and reports
├── models/              # Trained models and artifacts (joblib)
├── notebooks/           # Exploratory Data Analysis (EDA)
├── reports/             # Generated metrics and figures
├── src/
│   ├── api/             # Flask application
│   ├── data/            # Data generation and preprocessing scripts
│   ├── features/        # Feature engineering scripts
│   └── models/          # Training and evaluation scripts
├── tests/               # Unit and integration tests
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation
```
After cloning the repository, create a virtual environment and install the dependencies:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Generate the data, preprocess it, and build the features:
```bash
# Generate synthetic dataset
python src/data/make_dataset.py
# Clean and split data
python src/data/preprocess.py
# Vectorize text and encode labels
python src/features/build_features.py
```

Train and optimize the model:
```bash
# Train baseline models
python src/models/train_models.py
# Tune hyperparameters
python src/models/optimize_model.py
# Evaluate performance
python src/models/evaluate_model.py
```

Start the Flask API server:
```bash
python src/api/app.py
```

The API will start on http://localhost:5001.
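A client can also call the API from Python using only the standard library. The README does not document the routes or payloads, so the `/predict` path and the `{"text": ...}` JSON body below are assumptions; check `src/api/app.py` or `src/api/test_api.sh` for the real contract before using this.

```python
import json
import urllib.request

API_URL = "http://localhost:5001/predict"  # hypothetical endpoint path


def build_request(text: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for one ticket (the payload shape is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response; the server must be running."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `predict("I was charged twice this month")` returns whatever JSON the endpoint produces.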
You can test the API using the provided script or curl:

```bash
bash src/api/test_api.sh
```

Run the full test suite using pytest:

```bash
pytest tests/
```
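By default, pytest collects files named `test_*.py` and the `test_*` functions inside them. A unit test for a text-cleaning step might look like the following. The file name (e.g. `tests/test_cleaning.py`) and the inline `clean_text` helper are hypothetical stand-ins, since the README does not document the functions in `src/data/preprocess.py`.

```python
import re


def clean_text(text: str) -> str:
    """Hypothetical stand-in for the project's cleaning step: lowercase and collapse non-alphanumerics."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def test_clean_text_lowercases_and_strips_punctuation():
    assert clean_text("App CRASHES on Save!!!") == "app crashes on save"


def test_clean_text_handles_empty_input():
    assert clean_text("") == ""


def test_clean_text_is_idempotent():
    # Cleaning already-clean text should not change it.
    once = clean_text("  Billing:  charged TWICE?  ")
    assert clean_text(once) == once
```

Plain `assert` statements are enough; pytest rewrites them to show the compared values when a test fails.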