How to Find Outliers in Regression Analysis in Excel

Cody Schneider | 8 min read

An odd data point lurking in your spreadsheet can quietly derail your entire regression analysis in Excel. These outliers can pull your trendline in the wrong direction, leading to misleading conclusions about the relationship between your variables. This tutorial will show you exactly how to find and assess these outliers using two simple but effective methods directly within Excel.

What Are Outliers in Regression?

In the context of regression analysis, an outlier is a data point that sits far from the pattern the rest of the data follows. Most often that means its dependent (Y) value is far from what the regression line predicts, though a point with an extreme independent (X) value can cause trouble too. Think of it as a point on your chart that lives far away from the main neighborhood where all the other data points are clustered.

But why should you care? Because the standard method for calculating a regression line (called "Ordinary Least Squares") is highly sensitive to these extreme values. The method minimizes the sum of squared errors, so a single distant point carries outsized weight and can pull the regression line towards itself, especially when its X value is also extreme (statisticians call this high "leverage"). This can distort the slope and intercept of your line, giving you a model that doesn't accurately represent the true underlying relationship for the majority of your data.

Imagine you're trying to find the average home price in a neighborhood of ten houses that all cost around $300,000. If a new billion-dollar mansion is built, the "average" home price will skyrocket, but that number won't be a useful representation of the neighborhood. Outliers do the same thing to your regression model.
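The arithmetic behind the home-price analogy can be sketched in a few lines of Python. The prices are the illustrative figures from the paragraph above, not real data, and the median is included only as a point of comparison.

```python
from statistics import mean, median

# Ten hypothetical houses at $300,000 each, as in the analogy above.
prices = [300_000] * 10

# The same neighborhood after one billion-dollar mansion is built.
prices_with_mansion = prices + [1_000_000_000]

mean_before = mean(prices)
mean_after = mean(prices_with_mansion)
median_after = median(prices_with_mansion)

print(f"Mean before the mansion:  ${mean_before:,.0f}")
print(f"Mean after the mansion:   ${mean_after:,.0f}")
print(f"Median after the mansion: ${median_after:,.0f}")
```

The mean jumps from $300,000 to roughly $91 million, while the median stays at $300,000. A least-squares regression line behaves more like the mean than the median, which is why one extreme point can move it so much.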

Before You Begin: Enable the Analysis ToolPak

Excel has a powerful, built-in statistics package called the Analysis ToolPak, but it isn't enabled by default. We'll need it for the more robust method of finding outliers. If you already have a "Data Analysis" button on your Data tab, you can skip this step.

Here’s how to turn it on:

  1. Click on File in the top-left corner, then select Options at the bottom of the left-hand menu.
  2. In the Excel Options window, click on Add-ins from the left-hand pane.
  3. At the bottom of the window, you'll see a dropdown menu next to "Manage." Make sure it says Excel Add-ins and click the Go... button.
  4. In the new dialog box, check the box next to Analysis ToolPak and click OK.

You should now see a Data Analysis button appear on the right side of the Data tab in your ribbon. Now you're ready to start hunting for outliers.

Method 1: Visual Inspection with a Scatter Plot

The fastest way to get a first look at potential outliers is to simply visualize your data. A scatter plot shows the relationship between your two variables and makes extreme points easy to spot with the naked eye.

Step 1: Create the Scatter Plot

Assuming your data is in two adjacent columns - with your independent variable (X) on the left and your dependent variable (Y) on the right - creating a plot is simple.

  • Select all your data in both columns, including the headers.
  • Go to the Insert tab on the Excel ribbon.
  • In the Charts group, click the icon that looks like a plot with dots on it (Insert Scatter (X, Y) or Bubble Chart).
  • From the dropdown, select the first option, the basic Scatter chart.

Excel will instantly generate a chart. You can already see if any points are floating far away from the main cluster of data.

Step 2: Add a Trendline

To make outliers even more obvious, add a regression line (or "trendline") to your plot. This line represents the relationship your regression model calculates.

  • Right-click on any of the data points in your chart.
  • In the context menu, select Add Trendline...
  • A sidebar will appear. Ensure Linear is selected, and then you can close the sidebar.

With the trendline in place, an outlier will now be visibly distant from the line. Any point with a large vertical distance from the trendline has a high level of "error" and is a candidate for further investigation.

The takeaway: While this method is quick and intuitive, it's subjective. What looks like an outlier to you might not to someone else. It's a great starting point, but for a more definitive answer, we need a quantitative approach.

Method 2: Using Standardized Residuals

This is the statistical way to identify outliers with more confidence. Before we jump in, let’s define two simple terms:

  • Residual: This is just the "error" of a data point. It's the vertical distance between the actual data point (the real Y value) and the regression line (the predicted Y value). A large residual means the model didn't do a good job of predicting that point.
  • Standardized Residual: This is the raw residual divided by an estimate of the residuals' standard deviation, so the values are centered around 0 with a standard deviation of about 1. Because every residual is now measured in standard deviations rather than in the units of Y, we can apply a simple rule of thumb to identify outliers consistently.
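To make the two definitions concrete, here is a minimal Python sketch that fits a least-squares line and computes both kinds of residual. The data is made up, the function names `fit_line` and `standardized_residuals` are hypothetical, and the scaling (dividing by the sample standard deviation of the residuals) is an assumption: it is one common simplified definition, and Excel's output may differ slightly.

```python
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def standardized_residuals(xs, ys):
    """Return (raw residuals, standardized residuals) for a simple linear fit."""
    slope, intercept = fit_line(xs, ys)
    # Residual = actual Y minus predicted Y (vertical distance to the line).
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Assumption: scale by the sample standard deviation of the residuals.
    scale = stdev(residuals)
    return residuals, [r / scale for r in residuals]


# Made-up data: Y is roughly 2 * X, except for one odd point at X = 6.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 25.0, 14.1, 16.0, 18.2, 19.9]

residuals, std_res = standardized_residuals(xs, ys)
for x, r, z in zip(xs, residuals, std_res):
    print(f"x={x:>2}  residual={r:7.2f}  standardized={z:6.2f}")
```

In this made-up dataset the point at X = 6 has a standardized residual between 2 and 3, while every other point stays well inside ±1.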

Step 1: Run the Regression Analysis

Now we’ll use the Analysis ToolPak to get our standardized residuals.

  1. Go to the Data tab and click on the Data Analysis button.
  2. In the dialog box, scroll down and select Regression, then click OK.
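  3. A new window will appear. Fill in the following fields:
     • Input Y Range: Select the column containing your dependent variable (Y), including the header.
     • Input X Range: Select the column containing your independent variable (X), including the header.
     • Labels: Check this box if your selections include the header row.
     • Output options: Leave New Worksheet Ply selected so the results land on a new sheet.
     • Residuals: Check the Standardized Residuals box, then click OK.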

Step 2: Identify Outliers Using the Rule of Thumb

Excel will generate a new sheet with a lot of statistical outputs. Scroll down to the table labeled RESIDUAL OUTPUT. You'll see a column for the predicted Y values and, to its right, the standardized residuals (Excel labels this column "Standard Residuals").

Here's the common rule for interpreting these values:

  • Any data point with a standardized residual value less than -2 or greater than +2 is a potential outlier that deserves a closer look.
  • Any data point with a standardized residual value less than -3 or greater than +3 is almost certainly an outlier.
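The rule of thumb above can be written as a small Python function. This is only a sketch: the function name `classify_residual`, the label strings, and the sample values are all hypothetical, and the 2 and 3 cutoffs are the conventional thresholds from the list above rather than hard statistical limits.

```python
def classify_residual(z, watch=2.0, flag=3.0):
    """Label one standardized residual using the +/-2 and +/-3 rule of thumb."""
    size = abs(z)
    if size > flag:
        return "likely outlier"
    if size > watch:
        return "potential outlier"
    return "ok"


# Hypothetical standardized residuals copied from a RESIDUAL OUTPUT table.
std_residuals = [0.4, -1.2, 2.3, -3.4, 0.1]

labels = [classify_residual(z) for z in std_residuals]
for i, (z, label) in enumerate(zip(std_residuals, labels), start=1):
    print(f"Observation {i}: {z:+.1f} -> {label}")
```

Taking the absolute value first means the same two comparisons cover both the negative and positive sides of the rule.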

For large datasets, scrolling through the list can be cumbersome. Instead, use Excel's Conditional Formatting to have the outliers jump out at you.

  1. Highlight the entire "Standardized Residuals" column.
  2. Go to the Home tab.
  3. Click Conditional Formatting > Highlight Cells Rules > Greater Than...
  4. Enter 3 in the box and choose a fill color (like light red fill). Click OK.
  5. Repeat the process, this time selecting Highlight Cells Rules > Less Than...
  6. Enter -3 in the box and pick the same format.

Now any standardized residual beyond ±3 is highlighted in red, so you can see at a glance which observations are almost certainly outliers. To catch the potential outliers as well, use 2 and -2 in place of 3 and -3.

You Found an Outlier... Now What?

Finding an outlier is only the first step. The next, and more important, step is deciding what to do with it. Your first instinct, deleting it, is often your worst. Resist this urge! Casually removing data points can be a form of manipulation and might lead you to discard valuable information.

Instead, follow this process:

  1. Investigate It: Is the outlier the result of a mistake? Check for typos or data entry errors. Did someone accidentally type 500 when they meant 50.0? This is the most common and easily fixable cause. Was there an equipment malfunction or a special circumstance that makes that data point invalid?
  2. Correct It: If you find a clear error (like the typo example), correct it to the right value and re-run your analysis. Problem solved.
  3. Remove It (with caution): If you can confirm the data point is invalid (e.g., the measurement came from a broken sensor, or the survey respondent wasn't from the target demographic), then it is justifiable to remove it. Always document why you removed it.
  4. Keep It (and analyze its impact): If the outlier is strange but legitimate, it's giving you important information. It might represent a rare but real event. In this case, you can run your regression analysis twice: once with the outlier and once without it. You can then present both models and discuss how that single data point influences the results. This provides a much more honest and complete picture of your data.
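The "run it twice" comparison in step 4 can be sketched in Python as follows. The data is made up, and `fit_line` and `compare_without` are hypothetical helper names. The idea is to fit the line on all points, refit with the suspect point left out, and report both results side by side.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def compare_without(xs, ys, index):
    """Fit once on all points and once with the point at `index` left out."""
    full = fit_line(xs, ys)
    xs_trim = xs[:index] + xs[index + 1:]
    ys_trim = ys[:index] + ys[index + 1:]
    trimmed = fit_line(xs_trim, ys_trim)
    return full, trimmed


# Made-up data: Y is roughly 2 * X, except for the last observation.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.1, 16.0, 18.2, 40.0]

(full_slope, full_int), (trim_slope, trim_int) = compare_without(xs, ys, 9)
print(f"With the outlier:    slope={full_slope:.2f}, intercept={full_int:.2f}")
print(f"Without the outlier: slope={trim_slope:.2f}, intercept={trim_int:.2f}")
```

In this example the slope is about 3.10 with the last point included and about 2.01 without it, which is the kind of difference worth reporting alongside both models.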

Final Thoughts

Finding outliers is an essential check-up for any regression analysis. As we've seen, identifying them in Excel is straightforward using visual scatter plots for a quick look or standardized residuals for a more statistically sound judgment. Taking a moment to find and understand these odd points ensures your model is trustworthy and tells an accurate story.

This process of setting up data, running analyses, formatting results, and investigating anomalies is a core part of a data analyst's workflow, but it can quickly become repetitive and time-consuming, even in a user-friendly tool like Excel. When you’re dealing with data from multiple sources like Google Analytics, Shopify, and your CRM, this manual reporting can take up hours. We built Graphed to short-circuit this entire process. Instead of following these steps manually, you can connect your data sources once and use plain English to ask things like, "Find the relationship between my ad spend and sales, and show me any outliers." You get the insights in seconds, allowing you to focus on the "what to do" part of the process instead of the "how to build it" part.
