Segment-Stream

A repository dedicated to the analysis of customer segmentation. This project aims to implement and evaluate various segmentation methodologies, drawing inspiration and techniques from current research in the field.

Customer.Segments.Primer.mp4

Here is the Full Design

Problem Statement

  • A shopping mall aims to improve its marketing strategies and customer engagement.
  • The mall currently lacks a deep understanding of its customers, including their diverse profiles and purchasing habits.
  • There's also a need to assess the effectiveness of previous marketing campaigns.
  • This lack of data-driven insights makes it difficult for the mall to make informed decisions about resource allocation and targeted promotions.

Stakeholder Expectations

  • Stakeholders expect actionable recommendations derived from the data analysis that can be directly implemented to improve marketing strategies and customer engagement.

  • They anticipate measurable outcomes resulting from the project, such as increased revenue, improved customer loyalty, and a more efficient allocation of marketing resources.

Data Collection

Mall Customers Segmentation

Overview

The Mall Customers Dataset provides data on 200 individuals who visit a mall, including demographic information, annual income, and spending habits.

Columns Description

CustomerID: A unique identifier for each customer (integer).
Genre: The gender of the customer (Male/Female).
Age: The age of the customer (integer).
Annual Income (k$): Annual income of the customer in thousands of dollars (integer).
Spending Score (1-100): A score assigned by the mall based on customer behavior and spending patterns (integer).
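
As a minimal sketch of how these columns could be read and type-checked, the snippet below parses CSV text whose header matches the column names listed above. The `Customer` class, the `parse_customers` helper, and the sample rows are illustrative assumptions, not part of this repository; only the standard library is used so it runs without extra dependencies.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    genre: str            # "Male" or "Female"
    age: int
    annual_income_k: int  # thousands of dollars
    spending_score: int   # 1-100


def parse_customers(csv_text: str) -> list[Customer]:
    """Parse CSV text whose header matches the columns described above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    customers = []
    for row in reader:
        customer = Customer(
            customer_id=int(row["CustomerID"]),
            genre=row["Genre"],
            age=int(row["Age"]),
            annual_income_k=int(row["Annual Income (k$)"]),
            spending_score=int(row["Spending Score (1-100)"]),
        )
        if not 1 <= customer.spending_score <= 100:
            raise ValueError(f"Spending score out of range: {customer}")
        customers.append(customer)
    return customers


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
1,Male,25,60,55
2,Female,40,85,20
"""

customers = parse_customers(SAMPLE_CSV)
print(customers[0])
```

For the real dataset, the same function could be fed the contents of the downloaded CSV file instead of `SAMPLE_CSV`.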

Customer Segment Analysis


Image: consolidated grouped bar plot (Here is the Full Design)
The consolidated grouped bar plot above depicts the following:

| Features | Central Tendency | Measure of Dispersion |
| --- | --- | --- |
| Age | Cluster-1: Demography is the oldest (Median: 66 years); Cluster-7: Demography is the youngest (Median: 23.5 years) | Cluster-4, Cluster-6: Show high fluctuation in age distribution (twice as much as the rest) |
| Income | Cluster-1, 3, 5: Exhibit Middle-Class behavioural patterns; Cluster-2, 4: Exhibit Business-Class behavioural patterns; Cluster-6, 7: Exhibit Economy-Class behavioural patterns | Cluster-2, 3, 4: Exhibit high fluctuations in income; Cluster-1, 6, 7: Have relatively lower within-group fluctuations |
| Spending | Cluster-1 (lowest), Cluster-3, Cluster-5: These customer segments have moderate-spending habits; Cluster-4, 6: Spending habit is the lowest (one-third of moderate cluster); Cluster-2 (highest), Cluster-7: Spending habit is very high | Cluster-1, 3, 5: Exhibit comparatively lower fluctuation in spending habits; Cluster-4, 6: Lowest spending with highest fluctuation (C.V. of 82.14%) |
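
The central tendency and dispersion figures above (medians and the coefficient of variation, C.V.) can be computed per cluster along the following lines. This is a sketch under assumptions: the rows are hypothetical, the `cluster` and `spending` keys are illustrative names, and the C.V. is taken as sample standard deviation divided by mean, which may differ from the convention used in the notebooks.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize(values):
    """Return the median and the coefficient of variation (in percent)."""
    cv_percent = stdev(values) / mean(values) * 100  # sample std. dev. / mean
    return {"median": median(values), "cv_percent": round(cv_percent, 2)}


def summarize_by_cluster(rows, feature):
    """Group rows by their 'cluster' label and summarize one feature."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row[feature])
    return {cluster: summarize(vals) for cluster, vals in sorted(grouped.items())}


# Hypothetical rows; real values would come from the clustered dataset.
rows = [
    {"cluster": 1, "spending": 40},
    {"cluster": 1, "spending": 50},
    {"cluster": 1, "spending": 60},
    {"cluster": 2, "spending": 5},
    {"cluster": 2, "spending": 20},
    {"cluster": 2, "spending": 35},
]

summary = summarize_by_cluster(rows, "spending")
for cluster, stats in summary.items():
    print(cluster, stats)
```

A higher C.V. means the feature fluctuates more relative to its mean within that cluster, which is how the "high fluctuation" labels in the table can be read.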

Consolidated Correlation Heatmap:

Image: consolidated correlation heatmap (Here is the Design)

Data Interpretation:

The consolidated correlation heatmap above depicts the following:

| Clusters → | Direct (+ve) | Indirect (-ve) | Special Observation |
| --- | --- | --- | --- |
| Income vs. Age | Cluster-4 | Cluster-3, 5, 7 | Cluster-3 [-0.25] |
| Spending vs. Age | Cluster-2, 4, 5 | Cluster-6 | Cluster-6 [-0.28] |
| Spending vs. Income | Cluster-4, 6 | Cluster-5, 7 | Cluster-4 [0.47] |

Note: Correlation values with a magnitude below 0.1 are treated as zero.

  • Cluster-1 is mostly uncorrelated with all existing features. Presumably, these customers will continue to generate business without any intervention.

  • Cluster-6 (lowest income group but the steadiest): Spending decreases with increasing Age and increases with increasing Income. Income and Age appear to be uncorrelated.

  • Cluster-4 (Business Class but with fluctuating income): Income, Age, and Spending increase together. The correlation between Income and Spending is the highest.

  • Cluster-5, 7 (negative income effect): Show an inverse relationship between Income and Spending, i.e., spending decreases as income increases.

  • Cluster-2: Spending increases with Age.
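
The direct/indirect labels above can be reproduced with a Pearson correlation per cluster plus the 0.1 cutoff from the note. The sketch below assumes Pearson correlation (the source does not name the coefficient used) and runs on hypothetical values for a single cluster; `pearson` and `classify` are illustrative helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def classify(r, threshold=0.1):
    """Label a correlation, treating |r| below the threshold as zero."""
    if abs(r) < threshold:
        return "uncorrelated"
    return "direct (+ve)" if r > 0 else "indirect (-ve)"


# Hypothetical values for a single cluster, for illustration only.
age = [20, 30, 40, 50, 60]
income = [30, 32, 28, 32, 30]
spending = [80, 70, 65, 50, 45]

pairs = {
    "Income vs. Age": pearson(income, age),
    "Spending vs. Age": pearson(spending, age),
}
for name, r in pairs.items():
    print(f"{name}: r = {r:+.2f} -> {classify(r)}")
```

Running the same two functions on each cluster's rows would fill in one column of the table per cluster.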


Statistical Inference and Log-Log Model:


Image

Data Interpretation:

The log-log regression model suggests the following:

  • Overall model adequacy (Prob (F-statistic) < 0.05): the null hypothesis (all coefficients are zero) is rejected.
  • The log-log model (logarithmic transformation of the features) explains 16.5% of the total variation in spending score (Adj. R-squared = 0.165).
  • Log(Annual Income) was found to be a significant predictor (P>|t| = 0.013).
  • Final thought: a 1% increase in Annual Income may lead to a 2.42% increase in the average Spending Score.
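
To illustrate why a log-log coefficient reads as a percentage effect, here is a simplified sketch: a one-predictor ordinary least squares fit of log(y) on log(x). It is not the model from the notebook (which may include more predictors); the data are synthetic and `fit_log_log` is a hypothetical helper.

```python
from math import log


def fit_log_log(x, y):
    """Ordinary least squares fit of log(y) = a + b * log(x); returns (a, b)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data following y = 2 * x**1.5 exactly, so the fitted slope
# (the elasticity) should recover 1.5. This is not the mall dataset.
income = [20, 40, 60, 80, 100]
spending = [2 * v ** 1.5 for v in income]

intercept, elasticity = fit_log_log(income, spending)
print(f"elasticity = {elasticity:.2f}")

# For small changes, a 1% rise in x changes y by roughly `elasticity` percent.
pct_change = (1.01 ** elasticity - 1) * 100
print(f"1% increase in x -> {pct_change:.2f}% change in y")
```

Under this reading, the slope on Log(Annual Income) is an elasticity, which is how the 2.42% figure above is interpreted.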

Cluster Interpretation and Business Recommendation

Business.Recommendation.1.mp4

Cluster-Level Summary

| Cluster | Demographic & Profile | Insights | Business Recommendation |
| --- | --- | --- | --- |
| Cluster-1 | Oldest (Median Age = 66), Middle-Class Income, Lowest Spending | Uncorrelated with all features, Stable | Minimal intervention; focus on retention and elder care offerings |
| Cluster-2 | Younger age, Business-Class Income, Very High Spending | Spending ↑ with Age; Positive correlation with Age | Premium/luxury product offerings; ideal for aspirational targeting |
| Cluster-3 | Middle-Class Income, Moderate Spending, Younger demographic | Income vs Age shows indirect (-ve) correlation | Offer mid-range product bundles; observe for future upscaling |
| Cluster-4 | Older age, Business-Class Income, High Spending, High Variability in Age & Income | Strongest +ve correlation between Income & Spending | Priority segment for loyalty programs, exclusives, and high-value campaigns |
| Cluster-5 | Middle-Class Income, Moderate Spending | Income vs Spending shows negative (-ve) correlation | Introduce value-based promotions and financial advisory services |
| Cluster-6 | Economy-Class Income, Lowest Spending, High Fluctuations in Age & Spending | Age ↑ → Spending ↓; Income ↑ → Spending ↑; Income & Age uncorrelated | Low-cost offerings; monitor for cost sensitivity; stabilize fluctuations |
| Cluster-7 | Youngest (Median Age = 23.5), Economy-Class Income, Very High Spending | Income vs Spending shows negative (-ve) correlation | Target for youth-focused campaigns; leverage impulse behavior via digital channels |

Business Recommendations

| Strategy Area | Recommendation |
| --- | --- |
| Segmented Marketing | Customize messages per cluster (e.g., luxury to Cluster-4, budget to Cluster-6) |
| Product Offering | Use subscription models, exclusive bundles for high-spenders (Cluster-2, 4, 7) |
| Customer Lifecycle | Engage Cluster-7/2 early for long-term loyalty; retain Cluster-1 with age-appropriate services |
| Risk Minimization | Monitor high fluctuation clusters (4, 6) for churn or income-spending shifts |
| Data-Driven Personalization | Use log-log regression insight: 1% income ↑ ⇒ 2.42% spending ↑; micro-target income tiers |

⚠️ Check Out Analysis and Reporting Notebook

Note: All recommendations are grounded in the observed descriptive statistics and correlation-based cluster insights. As far as hypothesis testing is concerned, only the results for Cluster-4 were found to be statistically significant.

Practical Use Case

  • Suppose we have a new observation: Age=25, Income=60k$, Spending Score=55.
  • How can we assign this new customer to a cluster?
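
One common way to answer this question, assuming a centroid-based clustering such as K-Means (the source does not state which algorithm the notebook uses), is to assign the new customer to the nearest cluster centroid. The centroids below are hypothetical placeholders, and `assign_cluster` is an illustrative helper, not the notebook's code.

```python
from math import dist

# Hypothetical centroids as (age, income in k$, spending score).
# These are placeholders, not the centroids from the project's notebooks.
CENTROIDS = {
    "Cluster-A": (25.0, 30.0, 75.0),
    "Cluster-B": (30.0, 65.0, 50.0),
    "Cluster-C": (60.0, 55.0, 30.0),
}


def assign_cluster(customer, centroids):
    """Return the label of the centroid nearest to the customer (Euclidean)."""
    return min(centroids, key=lambda label: dist(customer, centroids[label]))


new_customer = (25, 60, 55)  # Age=25, Income=60k$, Spending Score=55
label = assign_cluster(new_customer, CENTROIDS)
print(label)
```

If the features were scaled before clustering, the new observation would need the same scaling before the distances are compared.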

⚠️ Check Out Usecase-Prediction Notebook

🌐 Try It (Minimal User Interface)

Getting Started

Clone this repository to your local machine:

git clone https://github.com/pb319/Segment-Stream.git

Set Up a Virtual Environment:

# Linux/macOS
python3 -m venv env
source env/bin/activate

# Windows
python -m venv env
env\Scripts\activate

Install Dependencies:

pip install jupyter notebook
jupyter notebook  # launch Jupyter Notebook

Future Scope of Work:

  • With the data in hand (200 training examples), apply other clustering algorithms such as DBSCAN, Hierarchical Clustering, and Gaussian Mixture Models, and compare them using clustering evaluation metrics (Dunn Index, Silhouette Coefficient, etc.).
  • Running anomaly detection techniques (Isolation Forest, rule-based) could bring out more insights.
  • Consider using a higher-dimensional dataset. Dimensionality reduction techniques (PCA, t-SNE) would help with visualizing higher-dimensional data.
  • Extend the current cross-sectional analysis to longitudinal/panel data using time series analysis.
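
As a rough sketch of one of the evaluation metrics named in the first item above, the Silhouette Coefficient can be computed from pairwise distances alone. The implementation below uses Euclidean distance and toy points; it is an illustration of the metric, not code from this repository, and a library implementation would be preferable for real comparisons.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters.")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy points forming two well-separated groups (illustrative only).
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(f"good labelling: {good:.2f}, bad labelling: {bad:.2f}")
```

Scores near 1 indicate compact, well-separated clusters, so the same function could be used to compare the labellings produced by different algorithms on the same data.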

