GitHub - Muzammil603/MLproject_584

🎧 Customer Churn Prediction - Spotify Music Streaming Platform

This project applies machine learning to predict customer churn using user interaction data from a music streaming service similar to Spotify. Using Random Forest, Logistic Regression, and Naive Bayes models, it identifies high-risk users and provides insights to inform retention strategies.

📊 Objective

To develop a machine learning pipeline that:

Accurately predicts user churn

Analyzes key churn factors such as user engagement, subscription level, and location

Supports strategic customer retention efforts

🧠 Key Features

Comprehensive Data Cleaning: Removes noisy data, handles missing values, and processes user session and demographic data.

Feature Engineering: Captures behavior like:

Time since registration

Song listening patterns

Subscription level (Free/Paid)

Interaction with platform features (e.g., Thumbs Up/Down, Add to Playlist)

Geographic region

Model Training and Evaluation:

Trained and tested using an 80/20 split

Uses F1 Score as the primary metric due to class imbalance

Baseline model (Naive), Logistic Regression, and Random Forest tested
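To illustrate why F1 is a more informative metric than accuracy under class imbalance, here is a minimal sketch in plain Python. The labels are made up for illustration, the `split_80_20` helper is hypothetical, and the "always predict no churn" baseline is an assumption for the example, not necessarily the baseline used in the notebook.

```python
import random


def f1_score_binary(y_true, y_pred):
    """F1 score for the positive class (1 = churned)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def split_80_20(items, seed=42):
    """Shuffle a copy of the items and cut it into 80% train / 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical imbalanced labels: 20 churned users out of 100.
labels = [1] * 20 + [0] * 80
train_labels, test_labels = split_80_20(labels)

# A baseline that never predicts churn can look fine on accuracy
# while its F1 for the churn class is zero.
always_stay = [0] * len(test_labels)
baseline_accuracy = accuracy(test_labels, always_stay)
baseline_f1 = f1_score_binary(test_labels, always_stay)
```

Because the baseline never flags a churner, its F1 is zero regardless of how high its accuracy is, which is the behavior that motivates using F1 as the primary metric.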

🗂 Dataset

Source: Internal JSON file representing Spotify user logs

Size: 286,500 records, 18 features

Fields:

userId, sessionId, page, gender, location, level, userAgent, etc.

Events like NextSong, Thumbs Up, Add to Playlist, Logout, etc.

⚙️ Methodology

Data Preprocessing

Removed incomplete entries (missing user/session IDs)

Parsed and standardized timestamps and session logs
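The preprocessing steps above could look roughly like the sketch below. It assumes the log is stored as one JSON object per line and that a `ts` field holds a Unix timestamp in milliseconds; both are assumptions, as are the `clean_events` and `is_missing` helpers and the sample records, which are invented for illustration.

```python
import json
from datetime import datetime, timezone


def is_missing(value):
    """Treat None and empty or blank strings as missing."""
    return value is None or str(value).strip() == ""


def clean_events(lines):
    """Parse JSON-lines records and drop those without a user or session ID."""
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if is_missing(record.get("userId")) or is_missing(record.get("sessionId")):
            continue
        # Assumption: `ts` is a Unix timestamp in milliseconds.
        if "ts" in record:
            record["timestamp"] = datetime.fromtimestamp(
                record["ts"] / 1000, tz=timezone.utc
            )
        cleaned.append(record)
    return cleaned


# Made-up sample records, not taken from the project's dataset.
sample_lines = [
    '{"userId": "30", "sessionId": 29, "page": "NextSong", "ts": 1538352117000}',
    '{"userId": "", "sessionId": 8, "page": "Home", "ts": 1538352180000}',
    '{"userId": "9", "sessionId": null, "page": "Logout", "ts": 1538352394000}',
]
events = clean_events(sample_lines)
```

Only the first sample record survives, since the other two lack a user ID or a session ID.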

Feature Engineering

Aggregated features per user

Binary churn indicator from “Cancellation Confirmation” event

Ratios and frequency distributions of user behavior
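As a rough sketch of the per-user aggregation described above, the following plain-Python example derives a churn flag from the "Cancellation Confirmation" page event and computes simple behavior ratios. The feature names, the choice of total events as the ratio denominator, and the sample events are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_user_features(events):
    """Aggregate event-level logs into one feature dict per user."""
    page_counts = defaultdict(Counter)
    for event in events:
        page_counts[event["userId"]][event["page"]] += 1

    features = {}
    for user_id, counts in page_counts.items():
        total = sum(counts.values())
        features[user_id] = {
            # Binary churn label: 1 if the user ever reached the cancellation page.
            "churned": int(counts["Cancellation Confirmation"] > 0),
            "n_events": total,
            # Assumption: ratios are taken over all of the user's events.
            "thumbs_down_ratio": counts["Thumbs Down"] / total,
            "next_song_ratio": counts["NextSong"] / total,
        }
    return features


# Made-up events for two hypothetical users.
sample_events = [
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "NextSong"},
    {"userId": "u1", "page": "Thumbs Down"},
    {"userId": "u1", "page": "Cancellation Confirmation"},
    {"userId": "u2", "page": "NextSong"},
    {"userId": "u2", "page": "Thumbs Up"},
]
user_features = build_user_features(sample_events)
```

Each user ends up as a single row of features, which is the shape a classifier such as Random Forest expects.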

Modeling

Used RandomForestClassifier as the best-performing model

Hyperparameter tuning using GridSearchCV

Achieved:

F1 Score (Test): 0.717

Accuracy (Test): 0.756
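A minimal sketch of tuning a `RandomForestClassifier` with `GridSearchCV` and scoring on F1 is shown below. It runs on synthetic data from scikit-learn's `make_classification`, not the project's dataset, and the parameter grid, fold count, and random seeds are assumptions, so the scores it produces are unrelated to the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the per-user feature matrix.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.75, 0.25], random_state=0
)

# 80/20 split, stratified so both sets keep the same churn rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; the notebook's actual search space may differ.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, 8]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # optimize F1 rather than accuracy
    cv=3,
)
search.fit(X_train, y_train)

# Evaluate the refit best model on the held-out 20%.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
test_accuracy = accuracy_score(y_test, y_pred)
```

After fitting, `search.best_params_` holds the selected combination and `search.best_estimator_` the model refit on the full training split.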

Important Features Identified

Time since registration

Thumbs Down ratio

Ad interaction rate

User geography

Homepage interaction
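One common way to obtain a ranking like this is the impurity-based `feature_importances_` attribute of a fitted random forest; whether the notebook used this or another method is an assumption. The sketch below uses synthetic data and hypothetical feature names, so its ranking says nothing about the project's actual features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names for illustration only.
feature_names = [
    "days_since_registration",
    "thumbs_down_ratio",
    "ad_rate",
    "home_rate",
]

# Synthetic data: 2 informative features and 2 noise features.
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Pair each name with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Impurity-based importances sum to 1 and can favor high-cardinality features, so they are best read as a rough ordering rather than a precise measure of effect.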

🔍 Results

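| Model | F1 Score (Test) | Accuracy (Test) |
| --- | --- | --- |
| Naive Bayes | 0.668 | 0.769 |
| Logistic Regression | 0.621 | 0.733 |
| Random Forest | 0.717 | 0.756 |

Random Forest achieved the highest test F1 score (0.717), the project's primary metric, although Naive Bayes had slightly higher test accuracy (0.769 vs. 0.756). Random Forest was found to be the most robust and generalizable model.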

🚀 Future Work

Real-time churn prediction deployment

Integrating social media or review data for richer insights

A/B testing for retention strategy validation

Exploring ensemble and deep learning methods
