Spam / Ham Email Classification

📌 Overview

This project is a spam/ham email classification system developed in a Jupyter Notebook.
It trains a machine learning pipeline that classifies emails and messages as spam or ham,
using a CountVectorizer or TfidfVectorizer for features and a Multinomial Naive Bayes (MNB) classifier.

The project also includes a simple Streamlit interface for interactive predictions.

⚙️ Technology Used

Python (Jupyter Notebook)
pandas, numpy → data loading & preprocessing
scikit-learn → CountVectorizer / TfidfVectorizer, MultinomialNB, Pipeline, evaluation
joblib → saving/loading trained model
nltk → optional stopword handling
matplotlib, seaborn → plots & confusion matrix
Streamlit → simple demo app

🔄 Process

Dataset → Load spam_ham_database.csv (label + text columns).
Preprocessing → lowercase, remove URLs/emails, normalize numbers, clean special characters (lemmatization optional).
Train/Test Split → 80% training, 20% testing with stratification to keep class balance.
Vectorization → Convert text into numeric features using CountVectorizer / TF-IDF with unigrams + bigrams.
Model Training → Multinomial Naive Bayes classifier trained on the vectorized data.
Evaluation → Accuracy, Precision, Recall, F1-score, and Confusion Matrix.
Artifacts → Save the trained pipeline with joblib for reuse.
Streamlit Demo → Input text is cleaned, then classified as Spam/Ham with a probability.
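
The preprocessing step listed above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `clean_text`, the regular expressions, and the `num` placeholder token are assumptions chosen for the example, and the optional lemmatization is left out.

```python
import re

# Illustrative patterns (assumptions, not the notebook's exact rules).
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and email addresses, map numbers to 'num', strip symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove URLs first (they may contain digits)
    text = EMAIL_RE.sub(" ", text)       # remove email addresses
    text = NUMBER_RE.sub(" num ", text)  # normalize every number to one token
    text = NON_ALPHA_RE.sub(" ", text)   # replace punctuation and other symbols
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_text("WIN £1000 now!!! Visit http://spam.example.com"))
```

With these patterns, `clean_text("WIN £1000 now!!! Visit http://spam.example.com")` returns `"win num now visit"`. URLs and email addresses are removed before numbers are normalized, so digits inside a link do not leave stray `num` tokens behind.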

✅ Outcome

Achieved roughly 90–95% accuracy, depending on the preprocessing options used.
Built a reproducible ML pipeline (preprocessing + vectorizer + model).
Demonstrated results interactively with Streamlit.
Clear separation of stages: preprocessing → training → evaluation → serving.
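
The stages above (split, vectorize, train, evaluate, save) might look roughly like the sketch below. It is an assumption-laden illustration rather than the notebook's code: the toy messages are invented for the example (the project itself loads spam_ham_database.csv), TF-IDF is picked over CountVectorizer arbitrarily, the custom cleaning step is replaced by the vectorizer's built-in lowercasing, and the model is written to a temporary directory instead of the repository.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data invented for illustration only; far too small for meaningful metrics.
spam = [
    "win a free prize now", "claim your free cash reward", "urgent offer click here",
    "free entry to win cash", "you have won a prize call now", "cheap loans apply now",
    "exclusive deal claim today", "winner claim your free gift", "free ringtone text now",
    "limited offer win money",
]
ham = [
    "are we meeting for lunch today", "please send me the report", "see you at home tonight",
    "thanks for your help yesterday", "can you call me later", "the meeting is moved to monday",
    "happy birthday have a great day", "i will be late for dinner", "let me know when you arrive",
    "did you finish the homework",
]
texts = spam + ham
labels = ["spam"] * len(spam) + ["ham"] * len(ham)

# 80/20 split, stratified so both classes keep the same ratio in each part.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# Vectorizer and classifier live in one Pipeline, so they are fitted and saved together.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),  # unigrams + bigrams
    ("nb", MultinomialNB()),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
print(confusion_matrix(y_test, y_pred, labels=["ham", "spam"]))
print(classification_report(y_test, y_pred, zero_division=0))

# Save and reload the fitted pipeline, as a serving app would do.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "spam_pipeline.joblib"
    joblib.dump(pipeline, model_path)
    reloaded = joblib.load(model_path)

proba = reloaded.predict_proba(["free prize, claim now"])[0]
print(dict(zip(reloaded.classes_, proba.round(3))))
```

Keeping the vectorizer inside the saved pipeline means the serving code can pass raw text straight to `predict` or `predict_proba` without rebuilding the vocabulary. If a custom cleaning function is added to the pipeline, it generally needs to be importable from the same module path when the file is loaded.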

🖥️ Streamlit Interface

The Streamlit app provides a simple way to test the model.

Run locally:

# Install dependencies
pip install -r requirements.txt

# Train & save pipeline (run the notebook first if needed)
jupyter notebook Spam_ham_ml_pipeline.ipynb

# Run Streamlit demo
streamlit run app.py
