How to Cluster Data in Excel

Finding patterns in your data can feel like looking for a needle in a haystack, but Excel's cluster analysis helps you find groups of similar things automatically. This powerful technique sorts your data into distinct clusters, making it easier to spot trends and target specific groups. This tutorial will walk you through what clustering is and how to perform it in Excel using several step-by-step methods.

Graphed

Build AI Agents for Marketing

Build virtual employees that run your go to market. Connect your data sources, deploy autonomous agents, and grow your company.

What Exactly is Cluster Analysis?

Cluster analysis is a technique used to group a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups. Think of it as an automatic sorting machine for your data.

For example, imagine you run an online store. You could use cluster analysis to group your customers based on their purchasing habits. You might find one cluster of "high-spending recent buyers," another of "occasional bargain hunters," and a third of "loyal, long-term customers." Identifying these groups allows you to create targeted marketing campaigns instead of treating everyone the same.

The core idea is to minimize the distance between data points within a cluster while maximizing the distance between different clusters. In business, this helps with:

Customer Segmentation: Tailoring marketing efforts to specific customer groups.
Product Grouping: Identifying products frequently purchased together for cross-selling opportunities (market basket analysis).
Anomaly Detection: Pinpointing outliers that don’t fit into any group, which can be useful for fraud detection or identifying unique opportunities.
Data Organization: Simplifying massive datasets into a few understandable segments.

Preparing Your Data for Clustering

Before you can start clustering, you need to prepare your dataset. Clean and properly formatted data leads to meaningful results, while messy data just leads to confusion. Here’s what to do.

1. Format Your Data Correctly

Organize your data in a simple table. Each row should represent an individual item you want to cluster (like a customer), and each column should represent a feature or variable (like "age," "total spent," or "number of purchases"). Make sure all your data is numeric, as most clustering methods rely on calculating distances between numbers.

2. Handle Missing Values

Clustering algorithms can't work with blank cells. You have two main options:

Remove the row: If only a few rows have missing data, simply deleting them is the easiest solution.
Fill in the value: You can replace the blank cell with the average or median of that column. This is a good option when you don't want to lose the other data in that row.

3. Normalize Your Variables

This is an essential step that is often overlooked. If your variables use vastly different scales, the clustering algorithm will be biased towards the variable with the largest values. For instance, if you're clustering customers based on "age" (e.g., 25-65) and "annual income" (e.g., $30,000-$150,000), the income variable will completely dominate the analysis because its numbers are so much larger.

Normalization rescales all your variables to a common range, like 0 to 1, so each one has an equal impact. A common way to do this is called "Min-Max scaling." For each value in a column, you apply this formula:

Normalized Value = (Current Value - Minimum Value in Column) / (Maximum Value in Column - Minimum Value in Column)

In Excel, you could create new columns for your normalized data. If your original value is in cell A2, and your column's data is in A2:A100, the formula would be:

=(A2 - MIN($A$2:$A$100)) / (MAX($A$2:$A$100) - MIN($A$2:$A$100))

Drag this formula down for all rows to create your normalized dataset.

AI Agents for Marketing

Crash Course

Graphed

Free PDF · the crash course

AI Agents for Marketing Crash Course

Learn how to deploy AI marketing agents across your go-to-market — the best tools, prompts, and workflows to turn your data into autonomous execution without writing code.

Method 1: Simplified Visual Clustering with a Scatter Plot

The easiest way to get started with clustering is to manually eyeball the groups. While not a formal statistical method, it’s incredibly intuitive and perfect for datasets with two main variables.

Let's say you have data on customer purchase frequency and average order value. You want to find natural groupings like "High Value, Low Frequency" shoppers.

Create a Scatter Plot: Highlight your two data columns (e.g., 'Purchase Frequency' and 'Average Order Value'). Go to Insert > Charts > Scatter. You'll instantly see all your customers plotted as dots.
Visually Identify Clusters: Look for natural "clumps" of dots on your chart. Do you see a dense group in the top-right corner? Those are your high-frequency, high-value customers. Is there a group crowded at the bottom-left? Those might be your low-engagement customers.
Add a "Cluster" Column: Go back to your data table and add a new column named "Cluster."
Assign Labels: Based on the visual clumps you found, assign a label to each data point. You might use descriptive names like "VIPs," "Potential Loyalists," or "At-Risk Customers." This part is subjective but gets the job done quickly.
Analyze with a PivotTable: Now that you have labels, you can use a PivotTable to summarize the characteristics of each group you've created. This helps you understand things like the average spend for each cluster or the total number of customers in your "VIPs" group.

This method is simple, fast, and gives you a powerful way to segment your data without any complex formulas.

Method 2: Hierarchical Clustering using Conditional Formatting

Hierarchical clustering is a method that builds a tree of clusters. It doesn't require you to specify the number of clusters beforehand. We can simulate a simplified version of this in Excel using a distance matrix and conditional formatting.

This approach works best with a smaller number of data points (under ~30) but is great for understanding the logic behind how clusters are formed.

Step 1: Calculate the Distance Matrix

First, we need to calculate the "distance" between every pair of data points. A popular way to do this is using the Euclidean distance. For two data points with X and Y values, the formula is:

Distance = SQRT((X2-X1)^2 + (Y2-Y1)^2)

Let's say you have normalized data for 10 customers. You would create a 10x10 grid in Excel. In each cell of the grid, you'd apply the distance formula between the corresponding customers. The diagonal will be zero (since the distance from a customer to itself is zero).

Step 2: Apply Conditional Formatting

Now, highlight your entire distance matrix. Go to Home > Conditional Formatting > Color Scales and select a scale where the smallest values are one color (e.g., green) and the largest are another (e.g., red).

Your matrix will now visually reveal which pairs of customers are most similar — they'll be the ones with the darkest green cells. These are your first micro-clusters.

Step 3: Group the Closest Pairs

Identify the two most similar data points (the smallest distance). Group them together conceptually. Now, you’ll treat this pair as a single new point. You would then recalculate the distances between this new "group" and all the other points.

While doing the full hierarchical process manually in Excel is tedious, this visual approach brilliantly highlights foundational clusters based on your distance calculations.

Method 3: K-Means Clustering Using a VBA Macro

For a more advanced and automated approach, you can use K-Means clustering. This popular algorithm requires you to define 'K' — the number of clusters you want to find. Since Excel doesn't have a built-in K-Means function, you'll need to use a VBA (Visual Basic for Applications) macro.

Heads up: This involves code, but don't worry! You just need to copy and paste it.

Graphed

Build AI Agents for Marketing

Build virtual employees that run your go to market. Connect your data sources, deploy autonomous agents, and grow your company.

Step 1: Enable the Developer Tab

If you don't see a "Developer" tab in your Excel ribbon, go to File > Options > Customize Ribbon and check the box for "Developer."

Step 2: Open the VBA Editor

Go to the Developer tab and click Visual Basic, or press Alt + F11. This opens the VBA editor window.

Step 3: Insert a New Module

In the VBA editor, go to Insert > Module. A blank code window will appear.

Step 4: Paste the K-Means VBA Code

Copy the code below and paste it into the module window. This macro handles the entire K-Means process of assigning random centroids, assigning points to them, and recalculating until the clusters are stable.

Sub KMeansClusterAnalysis()
    Dim NumClusters As Integer, NumPoints As Integer
    Dim i As Integer, j As Integer, k As Integer
    Dim DataRange As Range, CentroidRange As Range
    Dim Point() As Double, Centroids() As Double, ClusterAssignment() As Integer
    Dim MinDist As Double, Dist As Double, BestCluster As Integer, O_Best_Cluster As Integer
    Dim MaxIterations As Integer, Iteration As Integer, Moved As Integer

    On Error GoTo ErrorHandler

    ' --- User Input ---
    Set DataRange = Application.InputBox("Select your data range (numeric columns only)", "K-Means: Select Data", Type:=8)
    NumClusters = Application.InputBox("Enter the number of clusters (K)", "K-Means: Enter K", Type:=1)
    MaxIterations = 100 ' Maximum number of iterations

    ' --- Initialize ----
    NumPoints = DataRange.Rows.Count
    NumFeatures = DataRange.Columns.Count
    ReDim Point(1 To NumPoints, 1 To NumFeatures)
    ReDim Centroids(1 To NumClusters, 1 To NumFeatures)
    ReDim ClusterAssignment(1 To NumPoints)

    ' Read data into array
    For i = 1 To NumPoints
        For j = 1 To NumFeatures
            Point(i, j) = DataRange.Cells(i, j).Value
        Next j
    Next i

    ' Initialize random centroids from data points
    Randomize
    For k = 1 To NumClusters
        RandomRow = Int((NumPoints * Rnd) + 1)
        For j = 1 To NumFeatures
            Centroids(k, j) = Point(RandomRow, j)
        Next j
    Next k

    ' --- K-Means Iteration ---
    For Iteration = 1 To MaxIterations
        Moved = 0
        ' Assignment Step
        For i = 1 To NumPoints
            MinDist = -1
            O_Best_Cluster = ClusterAssignment(i)
            For k = 1 To NumClusters
                Dist = 0
                For j = 1 To NumFeatures
                    Dist = Dist + (Point(i, j) - Centroids(k, j)) ^ 2
                Next j
                If MinDist = -1 Or Dist < MinDist Then
                    MinDist = Dist
                    BestCluster = k
                End If
              Next k
         If BestCluster <> O_Best_Cluster Then
              Moved = Moved + 1
          End If
            ClusterAssignment(i) = BestCluster
        Next i
        
        ' Update Step
        Dim ClusterSize() As Integer, Sums() As Double
        ReDim ClusterSize(1 To NumClusters)
        ReDim Sums(1 To NumClusters, 1 To NumFeatures)
        For i = 1 To NumPoints
            k = ClusterAssignment(i)
            ClusterSize(k) = ClusterSize(k) + 1
            For j = 1 To NumFeatures
                Sums(k, j) = Sums(k, j) + Point(i, j)
            Next j
        Next i
        're calculate centroid
        For k = 1 To NumClusters
            If ClusterSize(k) > 0 Then
                For j = 1 To NumFeatures
                    Centroids(k, j) = Sums(k, j) / ClusterSize(k)
                Next j
            End If
         Next k
         If Moved = 0 Then Exit For
    Next Iteration

    ' --- Output ---
    Application.ScreenUpdating = False
    Dim OutputColumn As Range
    Set OutputColumn = DataRange.Cells(1, DataRange.Columns.Count + 1)
    OutputColumn.Value = "Cluster" ' Set a header
    For i = 1 To NumPoints
        OutputColumn.Offset(i, 0).Value = ClusterAssignment(i)
    Next i
    Application.ScreenUpdating = True
    MsgBox "Clustering complete in " & Iteration & " Iterations. Cluster assignments added to the adjacent column."

    Exit Sub
ErrorHandler:
    MsgBox "An error occurred. Make sure you selected a valid numeric data range.", vbCritical
End Sub

AI Agents for Marketing

Crash Course

Graphed

Free PDF · the crash course

AI Agents for Marketing Crash Course

Learn how to deploy AI marketing agents across your go-to-market — the best tools, prompts, and workflows to turn your data into autonomous execution without writing code.

Step 5: Run the Macro

Close the VBA editor. Back in your Excel sheet, press Alt + F8 to open the macro dialog box. Select "KMeansClusterAnalysis" and click "Run."

The macro will prompt you to:

Select your (already normalized) data range, excluding headers.
Enter the number of clusters (K) you want.

After a few moments, a new column named "Cluster" will appear next to your data, containing a number (from 1 to K) assigned to each row. You've successfully clustered your data!

Visualizing Your New Clusters

Once you have a "Cluster" column, you can bring your findings to life with a color-coded scatter plot. This is the best way to see if your clusters make intuitive sense.

With your two variables and new Cluster ID column:

Create a Scatter Plot: Create a scatter plot just with the X and Y variables like before. It will appear with all dots in one color.
Isolate Each Cluster: You'll need to manually add a new data series for each cluster. Right-click the chart and choose "Select Data..."
Add a New Series: Click "Add". For the series name, you can just type "Cluster 1". For "Series X values," select the data range for just your first variable corresponding to Cluster 1. Do the same for "Series Y values."
Repeat: Repeat this process for Cluster 2, Cluster 3, etc.
Format and Label: Once all series are added, you can format each data series with a unique color. Now your chart clearly displays the different segments.

Final Thoughts

Using these clustering methods in Excel allows you to turn raw data tables into clear, actionable segments. Whether you’re manually grouping customers with a scatter plot or running an automated K-Means macro, you’re unlocking a deeper level of insight that can guide everything from marketing strategy to product development.

We know manually preparing data, running analyses, and updating reports in spreadsheets every week can be a major time-sink. That’s why we built Graphed. We wanted a way to connect all our sales and marketing sources - like Google Analytics, Shopify, and Salesforce - and get answers instantly using plain English. Instead of wrangling CSV files and building charts from scratch, you can ask powerful questions like, "Segment my Shopify customers by purchase frequency and total spend for the last 6 months" and get a live, interactive dashboard back in seconds.

How to Cluster Data in Excel

Build AI Agents for Marketing

What Exactly is Cluster Analysis?

Preparing Your Data for Clustering

1. Format Your Data Correctly

2. Handle Missing Values

3. Normalize Your Variables

AI Agents for Marketing Crash Course

Method 1: Simplified Visual Clustering with a Scatter Plot

Method 2: Hierarchical Clustering using Conditional Formatting

Step 1: Calculate the Distance Matrix

Step 2: Apply Conditional Formatting

Step 3: Group the Closest Pairs

Method 3: K-Means Clustering Using a VBA Macro

Build AI Agents for Marketing

Step 1: Enable the Developer Tab

Step 2: Open the VBA Editor

Step 3: Insert a New Module

Step 4: Paste the K-Means VBA Code

AI Agents for Marketing Crash Course

Step 5: Run the Macro

Visualizing Your New Clusters

Final Thoughts

Related Articles

Facebook Ads for Nail Salons: The Complete 2026 Strategy Guide

Facebook Ads for Locksmiths: The Complete 2026 Strategy Guide

Facebook Ads for Tutors: The Complete 2026 Strategy Guide