Social Media Sentiment Analysis

This project performs sentiment analysis on tweets from Kaggle's Twitter Sentiment Analysis dataset. It cleans the text with standard NLP techniques, then classifies each tweet as positive or negative using an NLTK Naive Bayes classifier and several scikit-learn classifiers trained on TF-IDF features.

📁 Dataset

The dataset is training.1600000.processed.noemoticon.csv from Kaggle's Twitter Sentiment Analysis dataset (also known as Sentiment140; 1.6 million tweets). Two of its columns are used:

- Sentiment: 0 = negative, 4 = positive (4 is mapped to 1 for binary classification)
- Text: the raw tweet text

👉 You can download it from Kaggle: Twitter Sentiment Analysis Dataset
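A minimal loading sketch, assuming the CSV has no header row, is Latin-1 encoded, and stores the label in the first column and the tweet text in the sixth. The column names `Id`, `Date`, `Flag`, `User` and the function `load_tweets` are hypothetical placeholders; only `Sentiment` and `Text` come from this README.

```python
import io
from typing import IO, Union

import pandas as pd

# Assumed layout of training.1600000.processed.noemoticon.csv: no header row,
# six columns. Only "Sentiment" and "Text" are named in this README; the
# other four names are placeholders.
COLUMNS = ["Sentiment", "Id", "Date", "Flag", "User", "Text"]


def load_tweets(source: Union[str, IO[bytes]]) -> pd.DataFrame:
    """Read the raw CSV, keep Sentiment and Text, and map label 4 -> 1."""
    df = pd.read_csv(source, header=None, names=COLUMNS, encoding="latin-1")
    df = df[["Sentiment", "Text"]].copy()
    # 0 stays 0 (negative); 4 becomes 1 (positive).
    df["Sentiment"] = df["Sentiment"].astype(int).replace({4: 1})
    return df


if __name__ == "__main__":
    # Two made-up rows in the same quoted format, for a quick local check.
    sample = io.BytesIO(
        b'"0","1","Mon Apr 06","NO_QUERY","user_a","my day was awful"\n'
        b'"4","2","Mon Apr 06","NO_QUERY","user_b","loving this sunshine"\n'
    )
    print(load_tweets(sample))
```

With the downloaded file, `load_tweets("training.1600000.processed.noemoticon.csv")` returns the two-column frame used in the steps below.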

🛠️ Technologies Used

Python
NLTK
NumPy / Pandas
Matplotlib
Scikit-learn

🧹 Data Preprocessing

Drop irrelevant columns: keep only Sentiment and Text.
Sentiment mapping: 4 → 1 (positive); 0 stays 0 (negative).
Downsampling: balance the dataset so it has equal numbers of positive and negative tweets.
Text cleaning:
- Lowercasing
- Removing stopwords and punctuation
- Removing digits, tags, and special characters
- Lemmatization with NLTK's WordNetLemmatizer
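The cleaning steps above could look roughly like the sketch below. The regular expressions, the order of the steps, and the helper names `clean_tweet` and `nltk_resources` are assumptions for illustration, not the notebook's actual code; "tags" is interpreted here as @mentions, #hashtags, and HTML tags. Stopwords and the lemmatizer are passed in, so the function itself does not depend on downloaded NLTK data.

```python
import re
from typing import Callable, Set, Tuple

# Illustrative patterns (assumptions, not taken from the notebook).
TAG_RE = re.compile(r"[@#]\w+|<[^>]+>")  # @mentions, #hashtags, HTML tags
DIGIT_RE = re.compile(r"\d+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # punctuation and other special characters


def clean_tweet(text: str, stop_words: Set[str], lemmatize: Callable[[str], str]) -> str:
    """Lowercase, strip tags/digits/special characters, drop stopwords, lemmatize."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(lemmatize(tok) for tok in tokens)


def nltk_resources() -> Tuple[Set[str], Callable[[str], str]]:
    """Download (if missing) NLTK's English stopwords and WordNet, then return them."""
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer

    for package in ("stopwords", "wordnet", "omw-1.4"):
        nltk.download(package, quiet=True)
    return set(stopwords.words("english")), WordNetLemmatizer().lemmatize
```

Usage: `stop_words, lemmatize = nltk_resources()`, then `df["Text"].apply(lambda t: clean_tweet(t, stop_words, lemmatize))`. `WordNetLemmatizer.lemmatize` treats every word as a noun unless a part-of-speech tag is passed, so a verb form such as "running" is left unchanged.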

📊 Exploratory Data Analysis (EDA)

Distribution of sentiments before and after balancing
Sample visualization using histograms
Top features contributing to sentiment classification
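The before/after balancing comparison can be sketched as follows, assuming downsampling means randomly sampling every class down to the size of the smallest one. The README does not state how many tweets per class are kept, and `downsample_balanced` and `plot_distribution` are hypothetical helpers.

```python
from typing import Optional

import pandas as pd


def downsample_balanced(df: pd.DataFrame, label_col: str = "Sentiment",
                        n_per_class: Optional[int] = None, seed: int = 42) -> pd.DataFrame:
    """Randomly keep the same number of rows for each label, then shuffle."""
    smallest = int(df[label_col].value_counts().min())
    n = smallest if n_per_class is None else min(n_per_class, smallest)
    parts = [group.sample(n=n, random_state=seed) for _, group in df.groupby(label_col)]
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


def plot_distribution(before: pd.Series, after: pd.Series):
    """Side-by-side bar charts of label counts before and after balancing."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
    before.sort_index().plot.bar(ax=axes[0], title="Before balancing")
    after.sort_index().plot.bar(ax=axes[1], title="After balancing")
    for ax in axes:
        ax.set_xlabel("Sentiment (0 = negative, 1 = positive)")
        ax.set_ylabel("Tweets")
    fig.tight_layout()
    return fig
```

Usage: `balanced = downsample_balanced(df)` followed by `plot_distribution(df["Sentiment"].value_counts(), balanced["Sentiment"].value_counts())`. The shared y-axis keeps the two panels on the same scale, so the reduction in size is visible at a glance.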

🔍 Model Building

1. NLTK Naive Bayes Classifier

Token-based feature extraction
Accuracy:
- Training: ~86%
- Testing: ~76%
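A sketch of this NLTK workflow, assuming "token-based feature extraction" means a bag-of-words dictionary that maps each token of a cleaned tweet to `True`. The helpers `tweet_features` and `train_nltk_nb` are hypothetical; `nltk.NaiveBayesClassifier` and `nltk.classify.accuracy` are NLTK's own APIs and need no corpus downloads.

```python
from typing import Dict, List, Tuple

import nltk  # NaiveBayesClassifier needs no corpus downloads


def tweet_features(clean_text: str) -> Dict[str, bool]:
    """Bag-of-words featureset: every token present in the tweet maps to True."""
    return {token: True for token in clean_text.split()}


def train_nltk_nb(train: List[Tuple[str, int]], test: List[Tuple[str, int]]):
    """Train on (clean_text, label) pairs; return the model and train/test accuracy."""
    train_set = [(tweet_features(text), label) for text, label in train]
    test_set = [(tweet_features(text), label) for text, label in test]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    train_acc = nltk.classify.accuracy(classifier, train_set)
    test_acc = nltk.classify.accuracy(classifier, test_set)
    return classifier, train_acc, test_acc
```

After training, `classifier.show_most_informative_features(10)` prints the tokens whose likelihood ratio between the two labels is largest.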

2. TF-IDF + ML Classifiers

Feature extraction using TfidfVectorizer
Models:
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
- Linear SVM
Model evaluation using accuracy and confusion matrix

📈 Results

| Model | Test accuracy |
|---|---|
| NLTK Naive Bayes | 76% |
| Multinomial NB (TF-IDF) | ~84–86% |
| Linear SVM (TF-IDF) | ~88–90% |

📌 Future Work

Hyperparameter tuning using GridSearchCV
Adding emoji and emoticon handling
Applying deep learning models (e.g., LSTM, BERT)
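For the GridSearchCV item, one possible starting point is tuning the TF-IDF + LinearSVC pipeline jointly, so each cross-validation fold refits the vectorizer on its own training portion. The function name `tune_linear_svm` and the parameter grid are hypothetical examples, not recommendations for this dataset.

```python
from typing import Optional, Sequence

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def tune_linear_svm(texts: Sequence[str], labels: Sequence[int],
                    cv: int = 5, n_jobs: Optional[int] = None) -> GridSearchCV:
    """Grid-search TF-IDF and LinearSVC settings with stratified k-fold CV."""
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
    # Hypothetical grid; the values are illustrative, not tuned for this dataset.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "svm__C": [0.1, 1.0, 10.0],
    }
    search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", n_jobs=n_jobs)
    search.fit(texts, labels)
    return search
```

On the full dataset, `n_jobs=-1` runs the fits in parallel; `search.best_params_` and `search.best_score_` hold the winning settings and their mean cross-validated accuracy.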

🙌 Acknowledgements

Twitter (Sentiment140) dataset on Kaggle, published by user kazanova
NLTK and Scikit-learn documentation

Repository: manasa-26/sentiment-analysis
