An intelligent web-based email spam classifier that uses natural language processing and a Multinomial Naive Bayes model to detect spam messages with high accuracy. Built with clean text preprocessing, TF-IDF vectorization, and trained on the well-known spam.csv
dataset.
Multinomial Naive Bayes is chosen for its effectiveness on textual data with discrete features like word frequencies and TF-IDF scores. It’s simple, fast, and highly accurate for spam filtering tasks.
-
Features:
v1
: Label (spam
orham
)v2
: Email/SMS text content
Text preprocessing pipeline: • Lowercasing, punctuation removal, stopwords removal, and stemming with PorterStemmer
Feature engineering: • TF-IDF vectorisation
Model training: • Multinomial Naive Bayes (via Scikit-learn)
Evaluation metrics: • Accuracy, confusion matrix, classification report
Frontend: • Clean and professional Flask-based web interface to analyze email text
Python
Flask
Scikit-learn
NLTK
Pandas, NumPy
HTML, CSS, Jinja2
Metric | Value |
---|---|
Accuracy | 97%+ |
Precision | High |
Recall | High |
F1-Score | Robust |
git clone https://github.com/codedbyasim/Email-Spam-Detection-system.git
cd Email-Spam-Detection-system
pip install -r requirements.txt
python app.py
Then open your browser and visit: http://localhost:5000
Email-Spam-Detection-system/
│
├── app.py # Flask web app
├── vectorizer.pkl # TF-IDF Vectorizer
├── spam_classifier_model.pkl # Trained Naive Bayes model
├── Email_Spam_Detection_system_using_Naive_Bayes.ipynb
├── static/
│ └── style.css # Custom frontend styling
├── templates/
│ └── index.html # Frontend HTML with Jinja2
├── spam.csv # Raw dataset
├── README.md
└── requirements.txt
Muhammad Asim Hanif Software Engineering Student | ML & AI Enthusiast GitHub | LinkedIn