Unveiling the Power of Customer Segmentation

Ahmad Firdaus
6 min read · Aug 25, 2023


In the dynamic landscape of Business-to-Consumer (B2C) industries, the art of understanding customers goes beyond mere transactions — it’s about building relationships. Imagine a bustling mall, each visitor unique in their preferences and behaviours. Just as a skilled host anticipates the needs of their guests, businesses must anticipate and cater to the diverse needs of their customers. This is where customer segmentation steps in. By delving into the world of retail, we’ll explore the vital role of customer segmentation in B2C industries.

The Imperative of Customer Segmentation for Businesses

In the retail industry, where competition is stiff and customer tastes are always changing, every choice a company makes matters. Here’s why strategically applied customer segmentation is a game changer:

a. Personalization with Precision: Generic marketing tactics are no longer sufficient. With customer segmentation, businesses can tailor messaging and offerings to specific customer groups. A more individualised approach builds a stronger bond and has a greater impact.

b. Resource Allocation: Because resources are limited, efficient distribution is critical. Customer segmentation helps firms direct their efforts where they are most effective, maximising marketing resources and staff time.

c. Anticipating Trends: In a market driven by trends, understanding customer behaviors is essential. Segmentation uncovers patterns and trends within different customer groups, allowing businesses to stay ahead of market shifts.

Identifying convergent customer segments, often referred to as “Target Customers,” is the heart of effective segmentation.

These segments share common characteristics and hold significant potential for engagement. Here’s how to uncover them:

  1. Holistic Data Analysis: Dive into your collected data and apply clustering algorithms. These algorithms group customers based on their similarities, revealing previously unknown links between seemingly unconnected data sources.
  2. Behavioural Mapping: Don’t limit yourself to demographics. Examine purchasing habits, purchase histories, and interaction patterns. Customers who display similar behaviours may belong to the same segment even when their demographics differ.
  3. Customer Engagement: Engage customers through surveys and feedback mechanisms. This qualitative data gives you insights into their preferences, allowing you to modify segments based on real-world observations.
  4. Segment Refinement: Segments aren’t static. As customers evolve, so should your segments. Continuously update and refine them to ensure they accurately represent your customer base.

Let’s practice with a case study using the mall customers dataset.

Dataset: Mall Customer Segmentation Data

  1. Create customer segmentation using a machine learning algorithm (K-Means Clustering) in Python
  2. Identify the target customers your strategy should focus on.

Okay, let’s load the dataset
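The loading step is not shown in the article, so here is a minimal sketch of how it might look. The file name Mall_Customers.csv and the helper load_customers are assumptions for illustration, and the three inline rows are hypothetical stand-ins so the snippet runs without the file.

```python
import io

import pandas as pd


def load_customers(source):
    """Read the mall customers CSV from a path or file-like object (hypothetical helper)."""
    data = pd.read_csv(source)
    expected = ['CustomerID', 'Gender', 'Age',
                'Annual Income (k$)', 'Spending Score (1-100)']
    # Fail early if the file does not have the columns used later in the article.
    missing = [col for col in expected if col not in data.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return data


# Hypothetical rows in the same layout, so the sketch runs without the real file.
sample_csv = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,35,60,81\n"
    "3,Female,52,48,45\n"
)
sample = load_customers(sample_csv)
print(sample.head())

# With the real file (assumed name):
# data = load_customers('Mall_Customers.csv')
```

If your copy of the dataset uses different column names, adjust the expected list accordingly.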

The dataset has a few simple features: ‘CustomerID’, ‘Gender’, ‘Age’, ‘Annual Income (k$)’, and ‘Spending Score (1-100)’.

For each numerical attribute, the mean and median differ only modestly, which suggests the distributions are roughly symmetric rather than heavily skewed.
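One way to check this kind of statement is to put the mean, median, and skewness side by side. The sketch below is an illustration only: the helper mean_median_gap and the demo values are hypothetical, not figures from the mall dataset.

```python
import pandas as pd


def mean_median_gap(frame, columns):
    """Return mean, median, skewness, and mean-minus-median for each column."""
    summary = pd.DataFrame({
        'mean': frame[columns].mean(),
        'median': frame[columns].median(),
        'skew': frame[columns].skew(),
    })
    # A gap close to zero hints at a roughly symmetric distribution.
    summary['mean_minus_median'] = summary['mean'] - summary['median']
    return summary


# Hypothetical values, used only to make the sketch runnable.
demo = pd.DataFrame({
    'Age': [19, 23, 31, 35, 47, 52, 68],
    'Annual Income (k$)': [15, 20, 40, 60, 62, 78, 120],
})
gap = mean_median_gap(demo, ['Age', 'Annual Income (k$)'])
print(gap)
```

On the real data, the same call with the three numerical columns would show how close each mean is to its median. Note that a small gap indicates low skew, not necessarily a normal distribution.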

The data also show that the customers are diverse: there are young, adult, and senior customers, and the income range is extremely broad, from $15,000 to $200,000 per year.

Let’s visualize the features:

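import matplotlib.pyplot as plt

# Three scatter plots side by side, one per pair of numerical features
plt.figure(figsize=(12, 5))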

plt.subplot(1, 3, 1)
plt.scatter(data['Annual Income (k$)'], data['Spending Score (1-100)'])
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.title('Annual Income vs Spending Score')

plt.subplot(1, 3, 2)
plt.scatter(data['Age'], data['Spending Score (1-100)'])
plt.xlabel('Age')
plt.ylabel('Spending Score (1-100)')
plt.title('Age vs Spending Score')

plt.subplot(1, 3, 3)
plt.scatter(data['Age'], data['Annual Income (k$)'])
plt.xlabel('Age')
plt.ylabel('Annual Income (k$)')
plt.title('Age vs Annual Income (k$)')


plt.tight_layout()
plt.show()

Because the points in the scatter plots are so widely dispersed, it is difficult to discover patterns in customer behaviour by eye, so a deeper analysis using machine learning is required.

In this example, a machine learning method (K-Means clustering) will be used to segment the customer data in order to detect patterns.

K-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centres or cluster centroid), serving as a prototype of the cluster.
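To make the definition concrete, here is a minimal NumPy sketch of the basic iteration (often called Lloyd’s algorithm): assign every observation to its nearest centroid, then move each centroid to the mean of its assigned observations. The function kmeans_sketch, the toy points, and the hand-picked starting centroids are assumptions for illustration. scikit-learn’s KMeans, used in the rest of this article, adds smarter initialization (k-means++ by default) and other refinements.

```python
import numpy as np


def kmeans_sketch(points, init_centroids, n_iter=100):
    """Basic Lloyd iteration: nearest-centroid assignment, then mean update."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    k = len(centroids)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        distances = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = distances.argmin(axis=1)
        # New centroid = mean of its points; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two tight groups of three points each.
toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
# Start from one point in each group (hand-picked for a deterministic demo).
labels, centroids = kmeans_sketch(toy, init_centroids=toy[[0, 3]])
print(labels)
print(centroids)
```

The sum of squared distances from each point to its centroid is the quantity scikit-learn reports as inertia_, which is what the Elbow Method plots.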

The implementation used here comes from the scikit-learn library (scikit-learn.org).

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Select features
X = data[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']]

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Choosing the number of clusters using the Elbow Method
inertia = []
for k in range(1, 11):
    # Fit a model for each candidate k and record its within-cluster sum of squares
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)

# plot the inertia
plt.plot(range(1, 11), inertia, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Method')
plt.show()
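The elbow can be hard to read by eye. A common complementary check, which is not part of the original analysis, is the silhouette score: it is computed for each candidate k, and higher values indicate better-separated clusters. The sketch below uses synthetic data from make_blobs as a stand-in for X_scaled, so its numbers say nothing about the mall dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_scaled (assumption: replace with the real scaled features).
X_demo, _ = make_blobs(n_samples=200, centers=4, cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 8):  # the silhouette score needs at least 2 clusters
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    cluster_labels = model.fit_predict(X_demo)
    scores[k] = silhouette_score(X_demo, cluster_labels)

# Candidate k with the highest average silhouette score
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print('Best k by silhouette:', best_k)
```

The two methods do not always agree, so the final choice of k is still a judgment call that should also consider how interpretable the resulting segments are.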

I will use 4 clusters to label the data. Let’s continue:

# Fit KMeans model
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X_scaled)

# Assign clusters to data
data['Cluster'] = kmeans.labels_

# show the dataframe
data.head()

Here is the data after grouping per cluster:

As we can see, the largest group is Cluster 0 with 65 customers, followed by Cluster 2, Cluster 1, and Cluster 3 with 57, 40, and 38 customers, respectively. We visualize the figures to make the profile of each cluster easier to understand.
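The grouping step itself is not shown above, so here is a sketch of how the counts and per-cluster statistics could be computed with pandas. The helper summarize_clusters and the demo rows are hypothetical. On the real data you would pass the data frame that already holds the ‘Cluster’ column.

```python
import pandas as pd


def summarize_clusters(frame, cluster_col='Cluster'):
    """Return the number of customers and the mean/median of each feature per cluster."""
    features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
    counts = frame[cluster_col].value_counts().sort_index().rename('Customers')
    stats = frame.groupby(cluster_col)[features].agg(['mean', 'median'])
    return counts, stats


# Hypothetical rows, used only to make the sketch runnable.
demo = pd.DataFrame({
    'Age': [50, 60, 32, 30, 20, 25],
    'Annual Income (k$)': [45, 55, 85, 90, 25, 30],
    'Spending Score (1-100)': [48, 52, 80, 90, 40, 60],
    'Cluster': [0, 0, 1, 1, 2, 2],
})
counts, stats = summarize_clusters(demo)
print(counts)
print(stats)
```

Comparing the mean and median columns is what supports statements such as “the mean spending score is higher than the median” in the interpretation that follows.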

Total Income per Cluster

Total Spending Score per Cluster

Age per Cluster
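The three charts named above are not reproduced here. As a rough sketch, comparable charts could be drawn as box plots of each feature per cluster. The chart type is an assumption, since the article does not say how the originals were made, and the helper plot_cluster_boxplots and the demo rows are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_cluster_boxplots(frame, features, cluster_col='Cluster'):
    """Draw one box plot panel per feature, with one box per cluster."""
    fig, axes = plt.subplots(1, len(features),
                             figsize=(5 * len(features), 4), squeeze=False)
    grouped = list(frame.groupby(cluster_col))
    names = [str(name) for name, _ in grouped]
    for ax, feature in zip(axes[0], features):
        ax.boxplot([grp[feature].to_numpy() for _, grp in grouped])
        # Boxes are drawn at positions 1..n, so label those positions by cluster.
        ax.set_xticks(range(1, len(names) + 1))
        ax.set_xticklabels(names)
        ax.set_xlabel('Cluster')
        ax.set_ylabel(feature)
        ax.set_title(f'{feature} per Cluster')
    fig.tight_layout()
    return fig


# Hypothetical rows, used only to make the sketch runnable.
demo = pd.DataFrame({
    'Age': [50, 60, 55, 32, 30, 34, 20, 25, 22],
    'Annual Income (k$)': [45, 55, 50, 85, 90, 80, 25, 30, 28],
    'Spending Score (1-100)': [48, 52, 50, 80, 90, 85, 40, 60, 45],
    'Cluster': [0, 0, 0, 1, 1, 1, 2, 2, 2],
})
features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
fig = plot_cluster_boxplots(demo, features)
# Call plt.show() to display the figure interactively.
```

Box plots show the median and spread of each cluster at a glance, which fits the median-based comparisons in the interpretation below.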

Interpretation and recommendation:

Cluster 0 (Balanced Shoppers) is dominated by older customers aged 45 to 68, many of whom have likely already started families. Their annual incomes are relatively modest, ranging from $40,000 to $60,000, and they tend to be careful with their spending, with a moderate average spending score. The median values for annual income and spending are balanced, and the means are comparable, which underscores the group’s balanced nature. This is the largest cluster, with 65 customers.

Cluster 1 (High Spenders) is dominated by adults aged 30 to 35. Customers in this cluster have higher median and mean annual incomes and spending scores. Even though their incomes vary, they consistently show a tendency to spend heavily. This cluster contains 40 customers in total.

Cluster 2 (Budget Shoppers) is dominated by customers between the ages of 18 and 32. This cluster’s median annual income is lower, suggesting that its customers shop on tighter budgets. The median spending score is lower while the mean is higher, indicating that some customers are willing to spend more. This variation points to a mix of shopping behaviours that fits the description of budget shoppers. There are 57 customers in this cluster.

Cluster 3 (Spontaneous Spenders) consists of customers aged 25 to 50. Customers in this cluster have a higher median annual income but a lower median spending score. Despite the lower median, the mean spending score is higher, which shows the presence of impulsive spenders who occasionally make large purchases. There are 38 customers in this cluster.

To keep the High Spenders (Cluster 1) spending, the company must provide them with high-quality service. These shoppers should be studied closely so that personalised recommendations can increase their transactions and sustain them over the long term. The Balanced Shoppers (Cluster 0) and Spontaneous Spenders (Cluster 3) should also be studied closely in order to boost their spending through cluster-specific promotions, discounts, or price matching.

You can find the complete calculations in Python on my GitHub.


Ahmad Firdaus

Data science enthusiast passionate about uncovering insights and solving complex problems. Background in mathematics from Kyushu Univ. Skilled in Python, SQL, Tableau.