Muzammil603/MLproject_584


🎧 Customer Churn Prediction - Spotify Music Streaming Platform

This project applies machine learning to predict customer churn from user interaction data logged by a music streaming service similar to Spotify. It identifies high-risk users and provides insights to inform retention strategies, using models such as Random Forest, Logistic Regression, and Naive Bayes.

📊 Objective

To develop a machine learning pipeline that:

Accurately predicts user churn

Analyzes key churn factors such as user engagement, subscription level, and location

Supports strategic customer retention efforts

🧠 Key Features

Comprehensive Data Cleaning: Removes noisy data, handles missing values, and processes user session and demographic data.

Feature Engineering: Captures behavior like:

Time since registration

Song listening patterns

Subscription level (Free/Paid)

Interaction with platform features (e.g., Thumbs Up/Down, Add to Playlist)

Geographic region

Model Training and Evaluation:

Trained and tested using an 80/20 split

Uses F1 Score as the primary metric due to class imbalance

Compares a baseline model (Naive Bayes) with Logistic Regression and Random Forest
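The evaluation setup above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic (generated with `make_classification`), `GaussianNB` is an assumed choice of Naive Bayes variant, the stratified split is an assumption, and F1 is computed on the positive (churn) class, which may differ from the averaging used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the per-user feature matrix (assumption: the real
# features come from the project's own pipeline). Roughly 23% positives to
# mimic an imbalanced churn label.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.77, 0.23], random_state=42,
)

# 80/20 split; stratifying keeps the churn rate similar on both sides.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))
    print(f"{name}: F1 = {scores[name]:.3f}")
```

The scores printed here describe the synthetic data only and are not comparable to the results reported for the real dataset.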

🗂 Dataset

Source: Internal JSON file representing Spotify user logs

Size: 286,500 records, 18 features

Fields:

userId, sessionId, page, gender, location, level, userAgent, etc.

Events like NextSong, Thumbs Up, Add to Playlist, Logout, etc.

⚙️ Methodology

Data Preprocessing

Removed incomplete entries (missing user/session IDs)

Parsed and standardized timestamps and session logs
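As a rough sketch of these two preprocessing steps, the snippet below drops events without a user or session ID and converts timestamps to UTC datetimes. The sample records are invented, and the `ts` field name (epoch milliseconds) and the treatment of an empty-string `userId` as missing are assumptions, since the README does not spell them out.

```python
from datetime import datetime, timezone

# Hypothetical raw log records. `userId`, `sessionId`, and `page` appear in the
# dataset description; `ts` (epoch milliseconds) is an assumed field name.
raw_events = [
    {"userId": "30", "sessionId": 29, "page": "NextSong", "ts": 1538352117000},
    {"userId": "", "sessionId": 8, "page": "Home", "ts": 1538352180000},
    {"userId": "9", "sessionId": None, "page": "NextSong", "ts": 1538352394000},
    {"userId": "9", "sessionId": 8, "page": "Thumbs Up", "ts": 1538352416000},
]


def is_complete(event):
    """Return True when the event carries both a user ID and a session ID."""
    has_user = event.get("userId") not in (None, "")
    has_session = event.get("sessionId") is not None
    return has_user and has_session


def parse_timestamp(event):
    """Return a copy of the event with `ts` converted to a UTC datetime."""
    parsed = dict(event)
    parsed["ts"] = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
    return parsed


# Filter first, then standardize timestamps on the surviving records.
clean_events = [parse_timestamp(e) for e in raw_events if is_complete(e)]
print(f"kept {len(clean_events)} of {len(raw_events)} events")
```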

Feature Engineering

Aggregated features per user

Binary churn indicator from “Cancellation Confirmation” event

Ratios and frequency distributions of user behavior
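The per-user aggregation and churn label could look something like the sketch below. The events are invented, and the feature names (`n_events`, `song_ratio`, `thumbs_down_ratio`) are hypothetical; only the page names and the "Cancellation Confirmation" rule come from the description above.

```python
from collections import Counter, defaultdict

# Hypothetical cleaned events; page names follow the dataset description.
events = [
    {"userId": "9", "page": "NextSong"},
    {"userId": "9", "page": "Thumbs Down"},
    {"userId": "9", "page": "Cancellation Confirmation"},
    {"userId": "30", "page": "NextSong"},
    {"userId": "30", "page": "NextSong"},
    {"userId": "30", "page": "Thumbs Up"},
    {"userId": "30", "page": "Add to Playlist"},
]

# Count how often each user hit each page.
page_counts = defaultdict(Counter)
for event in events:
    page_counts[event["userId"]][event["page"]] += 1


def build_features(counts):
    """Turn one user's page counts into a churn label and behavior ratios."""
    total = sum(counts.values())
    return {
        # A user is labeled as churned if they ever confirmed a cancellation.
        "churn": int(counts["Cancellation Confirmation"] > 0),
        "n_events": total,
        "song_ratio": counts["NextSong"] / total,
        "thumbs_down_ratio": counts["Thumbs Down"] / total,
    }


user_features = {uid: build_features(c) for uid, c in page_counts.items()}
for uid, feats in user_features.items():
    print(uid, feats)
```

Ratios rather than raw counts keep heavy and light users comparable, which is one plausible reason to prefer them here.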

Modeling

Selected RandomForestClassifier as the best-performing model (highest test F1 score)

Tuned hyperparameters with GridSearchCV

Achieved:

F1 Score (Test): 0.717

Accuracy (Test): 0.756
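A tuning step along these lines might be written as follows. This is a sketch under stated assumptions: the data is synthetic, and the parameter grid, the 3-fold cross-validation, and `scoring="f1"` are illustrative choices, not the project's documented settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the per-user features (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.77, 0.23], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid; the values actually searched in the project are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="f1",  # select on F1, matching the project's primary metric
    cv=3,
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training split,
# so `search.predict` uses the tuned model.
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred)
test_accuracy = accuracy_score(y_test, y_pred)
print(search.best_params_)
print(f"F1 (test): {test_f1:.3f}  Accuracy (test): {test_accuracy:.3f}")
```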

Important Features Identified

Time since registration

Thumbs Down ratio

Ad interaction rate

User geography

Homepage interaction
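One common way to obtain a ranking like this from a fitted random forest is its impurity-based `feature_importances_` attribute. Whether the project used this or another method (such as permutation importance) is not stated. The column names below are hypothetical and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical column names standing in for the engineered features.
feature_names = [
    "days_since_registration",
    "thumbs_down_ratio",
    "ad_rate",
    "home_ratio",
]

# Synthetic data with one column per name above (assumption).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each name with its importance and sort from most to least important.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favor high-cardinality or correlated features, so they are best read as a rough ordering rather than a precise measure of effect.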

🔍 Results

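| Model | F1 Score (Test) | Accuracy (Test) |
| --- | --- | --- |
| Naive Bayes | 0.668 | 0.769 |
| Logistic Regression | 0.621 | 0.733 |
| Random Forest | 0.717 | 0.756 |

Random Forest achieved the highest test F1 score (0.717) and was judged the most robust and generalizable model, although Naive Bayes reached a slightly higher test accuracy (0.769 vs. 0.756).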
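The results show that accuracy and F1 can rank models differently when churners are a minority. The toy example below computes both metrics by hand for two hypothetical predictors. The labels are invented for illustration, and F1 is computed for the positive (churn) class, which is an assumption about how the project scored its models.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and positive-class F1 from two 0/1 label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy labels: 2 churners (1) among 10 users.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Predictor A never flags anyone as a churner.
always_stay = [0] * 10
# Predictor B catches both churners but raises 3 false alarms.
cautious = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

acc_a, f1_a = accuracy_and_f1(y_true, always_stay)
acc_b, f1_b = accuracy_and_f1(y_true, cautious)
print(f"always_stay: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"cautious:    accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

Predictor A has the higher accuracy while finding no churners at all, which is why F1 is the more informative headline metric for this kind of problem.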

🚀 Future Work

Real-time churn prediction deployment

Integrating social media or review data for richer insights

A/B testing for retention strategy validation

Exploring ensemble and deep learning methods
