CodeTaha/IMDB_Sentiment_Analysis_NLP

🎬 IMDB Sentiment Analysis (NLP)

This project builds a binary sentiment classifier (positive/negative) for IMDB movie reviews using TF-IDF and linear classifiers (Logistic Regression, Linear SVM, ComplementNB). It selects the best model via 5-fold cross-validation, saves it, and provides a desktop GUI with CustomTkinter.
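The selection step described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn, not the project's actual code: the toy `texts`/`labels`, the candidate names, the default hyperparameters, and the temporary output path are all assumptions made for the example.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the IMDB reviews (1 = positive, 0 = negative).
positive = [
    "a wonderful film with great acting", "I loved every minute of it",
    "brilliant story and superb cast", "an excellent and moving picture",
    "great fun, highly recommended", "beautiful, touching and well made",
    "one of the best movies this year", "fantastic direction and a strong script",
    "I enjoyed it a lot", "a superb and memorable experience",
]
negative = [
    "a terrible film with awful acting", "I hated every minute of it",
    "boring story and weak cast", "a dull and lifeless picture",
    "no fun at all, avoid it", "ugly, tedious and badly made",
    "one of the worst movies this year", "poor direction and a weak script",
    "I disliked it a lot", "an awful and forgettable experience",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Candidate classifiers named in the description; settings here are assumptions.
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "complement_nb": ComplementNB(),
}


def make_pipeline(clf):
    """TF-IDF features followed by a linear classifier."""
    return Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])


# 5-fold stratified cross-validation, scored by mean accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = {
    name: cross_val_score(make_pipeline(clf), texts, labels, cv=cv, scoring="accuracy").mean()
    for name, clf in candidates.items()
}

# Refit the best candidate on all data and persist the whole pipeline.
best_name = max(scores, key=scores.get)
best_model = make_pipeline(candidates[best_name]).fit(texts, labels)

model_path = os.path.join(tempfile.mkdtemp(), "sentiment_pipeline.joblib")
joblib.dump(best_model, model_path)
reloaded = joblib.load(model_path)
print(best_name, scores)
```

Saving the full pipeline (vectorizer plus classifier) means the loaded object can take raw review text directly, which is what a GUI would need.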

βš™οΈ Requirements

  • 🐍 Python 3.10+
  • 🪟 Windows PowerShell (commands below assume Windows)

Setup (recommended: virtual environment):

cd C:\Users\Konyar\Desktop\Code\IMDB_Sentiment_Analysis_NLP
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt

📦 Dataset

The commands below expect the dataset at datasets\movie.csv, a CSV file with text and label columns.

🧠 Train the Model

The command below trains models, selects the best via CV, evaluates on a test split, and saves the model.

python IMDB_Sentiment_Analyser.py train --csv datasets\movie.csv --model models\sentiment_pipeline.joblib

📊 What you will see

  • ✅ Test Accuracy: ... → test accuracy
  • 📄 Classification Report: → precision / recall / F1
  • 🔒 Confusion Matrix: printed in console
  • 🖼️ Confusion matrix image saved to models/confusion_matrix.png
  • 💾 Model saved to models/sentiment_pipeline.joblib
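For readers unfamiliar with the reported metrics, here is a small pure-Python sketch of how accuracy, precision, recall, F1, and a binary confusion matrix relate to each other. The 0/1 label encoding and the sample predictions are assumptions for illustration; the project itself may rely on library functions for these figures.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def binary_report(y_true, y_pred):
    """Accuracy plus precision/recall/F1 for the positive class (label 1)."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)
metrics = binary_report(y_true, y_pred)
print(cm)
print(metrics)
```

In this toy case there is one false negative and one false positive, so all four metrics come out to 0.75.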

🖥️ Launch the GUI

After training:

python IMDB_Sentiment_Analyser.py gui --model models\sentiment_pipeline.joblib

Enter a review and click "Predict". The GUI shows the prediction (Positive/Negative) and a confidence percentage.
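The train and gui commands share one entry point with subcommands. A layout like that could be built with argparse as sketched below; whether the script actually uses argparse, and whether the flags are required, is an assumption. Only the subcommand and flag names come from the commands shown in this README.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the train/gui commands in this README."""
    parser = argparse.ArgumentParser(prog="IMDB_Sentiment_Analyser.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train, evaluate, and save the model")
    train.add_argument("--csv", required=True, help="path to the dataset CSV")
    train.add_argument("--model", required=True, help="where to save the trained pipeline")

    gui = sub.add_parser("gui", help="launch the desktop GUI")
    gui.add_argument("--model", required=True, help="path to a saved pipeline")
    return parser


parser = build_parser()

# Parse the same arguments used in the commands above.
train_args = parser.parse_args(
    ["train", "--csv", r"datasets\movie.csv", "--model", r"models\sentiment_pipeline.joblib"]
)
gui_args = parser.parse_args(["gui", "--model", r"models\sentiment_pipeline.joblib"])
print(train_args.command, train_args.csv, train_args.model)
print(gui_args.command, gui_args.model)
```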

📝 Notes

  • ⏱️ Cross-validation over ~40k rows can take a few minutes.
  • 📈 Confidence comes from predict_proba when the model provides it (e.g. Logistic Regression). For margin-based models (LinearSVC), a sigmoid mapping of the decision score is shown for readability. This is not a calibrated probability, but it is useful as a relative confidence.
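The confidence logic in the note above might look roughly like this. It is a sketch under assumptions: the function name `predict_with_confidence` is hypothetical, the class order is taken to be [negative, positive], and the two stand-in model classes exist only to make the example runnable without a trained pipeline.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_with_confidence(model, text):
    """Return (label, confidence in percent) for one review.

    Uses predict_proba when the model has it; otherwise squashes the
    decision_function margin with a sigmoid (not a calibrated probability).
    """
    if hasattr(model, "predict_proba"):
        p_pos = float(model.predict_proba([text])[0][1])  # assumes column 1 = positive
    else:
        p_pos = sigmoid(float(model.decision_function([text])[0]))
    label = "Positive" if p_pos >= 0.5 else "Negative"
    return label, 100.0 * max(p_pos, 1.0 - p_pos)


class FakeProbaModel:
    """Stand-in for a probabilistic model (hypothetical)."""

    def predict_proba(self, texts):
        return [[0.2, 0.8] for _ in texts]


class FakeMarginModel:
    """Stand-in for a margin-based model such as LinearSVC (hypothetical)."""

    def decision_function(self, texts):
        return [-2.0 for _ in texts]


proba_result = predict_with_confidence(FakeProbaModel(), "great movie")
margin_result = predict_with_confidence(FakeMarginModel(), "dull movie")
print(proba_result, margin_result)
```

Because the sigmoid is monotonic, a larger absolute margin always maps to a higher displayed confidence, which is why it works as a relative measure even without calibration.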

🗂️ Project Structure

IMDB_Sentiment_Analysis_NLP/
  IMDB_Sentiment_Analyser.py      # Training, evaluation (with CM plot), GUI
  datasets/
    movie.csv                     # Dataset (text,label)
  models/
    sentiment_pipeline.joblib     # (Created after training)
    confusion_matrix.png          # (Created after training)
  requirements.txt
  README.md

📜 License

This project is for educational purposes. Usage terms for the dataset are set by its original source.
