How to Identify Bot Traffic in Google Analytics 4
Seeing your website traffic go up should be exciting, but when those numbers don't translate into sign-ups, sales, or actual engagement, you might be looking at automated bots, not new customers. This article will walk you through several practical methods to identify and investigate suspicious bot traffic in your Google Analytics 4 property so you can trust your data again.
Why You Should Tame the Bots
You might be tempted to ignore bot traffic, thinking more traffic is always good traffic. But automated scripts, spiders, and crawlers can quietly wreak havoc on your data, leading to poor decisions based on faulty information. Here’s why it’s a problem:
- Skewed Metrics: Bots behave unnaturally. A swarm of bots can drastically lower your average engagement time, create misleading user acquisition stats from strange locations, and make your important metrics unreliable.
- Polluted Conversion Data: The last thing you want is phantom "events" or "conversions" being triggered by bots. This pollutes your real data, making it impossible to know which marketing campaigns are actually working and which aren't.
- Wasted Ad Spend: If you're running PPC campaigns, some bots are sophisticated enough to click your ads. This is known as click fraud, and it can drain your advertising budget by serving ads to nonexistent users instead of real potential customers.
- Misleading Insights: When bots inflate your traffic, they create noise that masks the behavior of your real users. You might draw incorrect conclusions about which content is popular, where your users come from, or how they navigate your site.
Does GA4 Automatically Filter Bots? (Yes, But...)
Google Analytics 4 comes with a built-in feature designed to automatically filter traffic from known bots and spiders. This setting is enabled by default for all data streams and references the IAB/ABC International Spiders & Bots List to identify and exclude traffic from common, known crawlers (like search engine bots). This is a helpful first line of defense that works quietly in the background.
So, why are you still here? Because this automatic filter isn't foolproof. It’s effective against the “known bots” on a public list, but it can’t catch everything. Sophisticated custom-built bots, referral spam crawlers designed to drop links in your reports, and bots from compromised devices won't be on that list. This means you still need to do some manual detective work to keep your data clean from these more advanced intruders.
How to Manually Investigate Suspicious Traffic in GA4
Your goal is to find behavioral patterns that are distinctly non-human. This involves diving into your reports and looking for anomalies. Here are five investigative methods you can use today.
Method 1: Check for Unexpected Geographic Locations
Bot traffic often originates from data centers in countries or cities that are completely unrelated to your target audience. If you primarily sell products in the United States, a sudden surge of traffic from a small city in Eastern Europe should immediately raise a red flag.
How to check this:
- Navigate to Reports > User > User attributes > Demographic details.
- In the top-left of the chart, use the dropdown menu to change the primary dimension from "Country" to "City".
- Examine the list. Look for any locations that seem out of place.
- Pay close attention to metrics for these suspicious locations. A city sending thousands of new users with an average engagement time of 0:01 and zero conversions is a classic sign of bot activity.
This report gives you a quick way to spot outliers. When you find one, make a note of the specific country or city for potential filtering later.
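If you export this report (for example as CSV), you can screen it programmatically. Below is a minimal sketch of that check; the row fields, sample data, and thresholds are illustrative assumptions, not anything GA4 defines, so adapt them to your own export.

```python
# Sketch: flag suspicious cities in an exported GA4 demographics report.
# Field names and thresholds are assumptions for illustration.

def flag_suspicious_cities(rows, min_users=500, max_engagement_secs=2):
    """Return cities with many new users but near-zero engagement and no conversions."""
    flagged = []
    for row in rows:
        if (row["new_users"] >= min_users
                and row["avg_engagement_secs"] <= max_engagement_secs
                and row["conversions"] == 0):
            flagged.append(row["city"])
    return flagged

# Hypothetical sample rows from an exported report.
report = [
    {"city": "Chicago", "new_users": 1200, "avg_engagement_secs": 95, "conversions": 34},
    {"city": "Ashburn", "new_users": 4800, "avg_engagement_secs": 1, "conversions": 0},
]
print(flag_suspicious_cities(report))  # ['Ashburn']
```

Tune the thresholds to your site: a content site with genuinely short visits will need a lower engagement cutoff than an e-commerce store.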
Method 2: Put Your Traffic Sources Under the Microscope
"Referral spam" has been a problem for years. It happens when bots visit your site from a fake referral URL. The goal isn’t to harm your site directly, but to get you curious enough to see the URL in your analytics and visit their spammy (or malicious) website. These fake referrals are useless traffic that pollutes your acquisition data.
How to check this:
- Go to your traffic acquisition report: Reports > Acquisition > Traffic acquisition.
- The default primary dimension is "Session default channel group." Change it to "Session source / medium" or simply "Session source" to see the exact domains sending you traffic.
- Scan the list for odd, unprofessional, or nonsensical domains. URLs with phrases like "free," "buttons," "share," or a random assortment of letters and numbers are highly suspicious.
- Important: Do not visit these suspicious URLs directly. If you're curious, perform a Google search for the domain name in quotes (e.g., "dodgy-domain-example.com") to see if others have reported it as spam.
When you have a list of confirmed spammy domains, you can tell GA4 to ignore them moving forward, which we'll cover in the next section.
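If you have a long source list to triage, a quick heuristic screen can prioritize which domains to research first. This sketch encodes the red flags above; the keyword list and example domains are illustrative assumptions, and a match is a prompt for investigation, not proof of spam.

```python
import re

# Sketch: heuristic screen for spammy-looking referral domains.
# The hint list is a small illustrative sample -- expand it as you
# find new patterns, and always verify matches before blocking.

SPAM_HINTS = ("free", "buttons", "traffic-share", "seo-rank")

def looks_spammy(domain):
    domain = domain.lower()
    if any(hint in domain for hint in SPAM_HINTS):
        return True
    # A long run of digits, or letters and digits interleaved,
    # is another common tell of machine-generated domains.
    return bool(re.search(r"[a-z]\d[a-z]\d|\d{4,}", domain))

print(looks_spammy("free-share-buttons.xyz"))  # True
print(looks_spammy("news.ycombinator.com"))    # False
```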
Method 3: Validate Your Hostnames
Ghost spam is a particularly tricky type of bot traffic because it never actually visits your site. Attackers send fake data (or "hits") directly to Google's measurement servers using stolen or randomly generated Measurement IDs ("G-XXXXXXX"). Because they're bypassing your site, the "hostname" (your actual website domain) is often missing or incorrect.
How to check this:
The best way to see hostnames is by creating a simple custom report in the Explore section.
- Go to the Explore tab on the left-hand menu and click on Blank report.
- In the "Dimensions" panel, click the "+" sign and search for and import "Hostname".
- In the "Metrics" panel, click the "+" sign and import "Sessions" and "Average engagement time".
- Drag "Hostname" from the Dimensions panel over to the "Rows" area.
- Drag "Sessions" and "Average engagement time" to the "Values" area.
Your report will display a list of all hostnames that have sent data to your GA4 property. You should see your own domain (e.g., "www.yourwebsite.com"). You might also see legitimate related domains like "yourwebsite.myshopify.com" or payment gateways. Any hostnames that are completely unfamiliar, not one of your subdomains, or show as "(not set)" are likely ghost spam.
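The hostname check boils down to a simple allow-list comparison, which you can also automate against an exported report. In this sketch the allowed domains are placeholders; substitute your own site, subdomains, and trusted partner domains.

```python
# Sketch: classify hostnames from the Explore report against an allow-list.
# The suffixes below are placeholders for your real domains.

ALLOWED_SUFFIXES = ("yourwebsite.com", "yourwebsite.myshopify.com")

def is_ghost_spam(hostname):
    """True when a hostname is missing or not one of our known domains."""
    if not hostname or hostname == "(not set)":
        return True
    return not any(
        hostname == s or hostname.endswith("." + s) for s in ALLOWED_SUFFIXES
    )

print(is_ghost_spam("www.yourwebsite.com"))  # False
print(is_ghost_spam("(not set)"))            # True
print(is_ghost_spam("evil-analytics.info"))  # True
```

Matching on suffixes (rather than exact strings) keeps legitimate subdomains like "shop.yourwebsite.com" from being flagged.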
Method 4: Spot Inhuman Behavior Patterns
Bots act like bots. They don't browse, read articles, or hesitate before clicking. Their behavior often leaves telltale signs of automation if you know where to look. They're often programmed to hit a single page and then leave, leading to sessions with almost no engagement.
Look for these red flags:
- Extremely Low Engagement Time: While a few real users might leave quickly, a large volume of traffic with an "Average engagement time" hovering around "00:00:00" or "00:00:01" is highly unnatural. This is a strong indicator of automated hits.
- 100% New Users with ~1 Session Per User: Bots are almost always treated as new users who visit once and never return. If a specific traffic source sends thousands of sessions composed entirely of new users, it’s worth investigating.
- Strange Pageviews: Check your Pages report (Reports > Engagement > Pages and screens). Do you see a lot of views on unusual URLs, like pages with strange parameters or just your homepage and nothing else from a suspicious source? Bot activity often sticks out here.
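No single one of these flags is conclusive on its own, so it can help to combine them into a rough score per traffic source. The field names and the "2 or more flags" cutoff below are assumptions for illustration, not a GA4 convention.

```python
# Sketch: score a traffic source against the red flags above.
# Sources scoring 2+ are worth a closer look; 3 is almost certainly bots.

def bot_score(source):
    score = 0
    if source["avg_engagement_secs"] <= 1:        # near-zero engagement
        score += 1
    if source["sessions"] > 0 and source["new_users"] / source["sessions"] >= 0.99:
        score += 1                                # ~100% new users
    if source["pages_per_session"] <= 1.0:        # single-page hits
        score += 1
    return score

suspect = {"avg_engagement_secs": 0, "sessions": 3000,
           "new_users": 3000, "pages_per_session": 1.0}
print(bot_score(suspect))  # 3
```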
Method 5: Check the Service Provider Dimension
This is a slightly more advanced technique that can be very revealing. Most of your real visitors will come from common Internet Service Providers (ISPs) like "Comcast," "Verizon," or "AT&T." Bot traffic, however, often originates from commercial data centers and cloud hosting providers. GA4 will show these as the "Network Domain."
How to check this:
You can create another quick report in the Explore section:
- Create another new Blank exploration.
- Add the dimension "Network domain".
- Add the metrics "Sessions" and "Engaged sessions".
- Drag "Network domain" to Rows and the metrics to Values.
Now, scan the report. Look for names associated with cloud hosting, such as "amazon technologies inc," "microsoft corporation," "google llc," and "ovh sas." Seeing traffic from these providers isn't automatically bad, as some businesses use VPNs that route through them. However, if you see a huge number of completely unengaged sessions coming from a data center, it's almost certainly automated bots or crawlers.
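This check is also easy to script against an exported list of network names. The provider list below is a small illustrative sample, not an exhaustive registry of data-center operators, so treat a match as a prompt to check engagement, not an automatic verdict.

```python
# Sketch: match network/provider names against known data-center hosts.
# The hint list is a small illustrative sample and will need expanding.

DATA_CENTER_HINTS = (
    "amazon technologies", "microsoft corporation",
    "google llc", "ovh sas", "digitalocean", "hetzner",
)

def from_data_center(network_domain):
    name = network_domain.lower()
    return any(hint in name for hint in DATA_CENTER_HINTS)

print(from_data_center("Amazon Technologies Inc."))  # True
print(from_data_center("Comcast Cable"))             # False
```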
Okay, I Found Bot Traffic. Now What?
Identifying bot traffic is the first step. The next is to take action to exclude it from your reports going forward. Unlike Universal Analytics, GA4 does not have view-level filters, so your options are a bit different.
Option 1: Add to the Unwanted Referrals List
This is your go-to solution for referral spam. Once you've identified spammy domains in your Traffic acquisition report, you can block them from appearing again.
- Navigate to Admin > Data Streams > [Select your web stream] > Configure tag settings > Show all.
- Click on List unwanted referrals.
- From here, you can add one or more domains that you want to exclude from your referral traffic data. Set the "Match type" to "Referral domain contains" and enter the spammy domain (e.g., "spam-domain.com").
Option 2: Filter by IP Address
If you've identified a persistent bot coming from a specific IP address (you may need server logs for this), you can exclude it using GA4's internal traffic filters. This feature is intended for filtering out company traffic, but it works for blocking any IP address.
- Go to Admin > Data Streams > [Select your web stream] > Configure tag settings > Show all.
- Click on Define internal traffic and create a new rule for the IP address you want to block.
- Once created, you must activate the filter. Go to Admin > Data Collection and Modification > Data Filters and click on the "Internal Traffic" filter. Change its state from "Testing" to "Active." This block will typically take effect within a few hours.
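Internal-traffic rules can match a single address or a whole CIDR range, and it's worth understanding that matching logic before you write a rule. Here's a minimal sketch of how a range rule matches addresses, using Python's standard `ipaddress` module; the CIDR block is a documentation example, not a real bot network.

```python
import ipaddress

# Sketch: how an IP-range rule (like GA4's internal-traffic filter with
# "IP address is in range") matches incoming addresses.
# 203.0.113.0/24 is a reserved documentation range used as a placeholder.

BLOCKED_RANGES = [ipaddress.ip_network("203.0.113.0/24")]

def is_blocked(ip):
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)

print(is_blocked("203.0.113.42"))  # True
print(is_blocked("198.51.100.7"))  # False
```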
Option 3: Use Server-Side Protection
For large-scale or sophisticated bot problems, the best solution isn't within GA4 at all - it's at the server level. Services like Cloudflare offer Web Application Firewall (WAF) features that can block malicious bots before they ever reach your website and trigger your GA4 tag. This is the most robust solution for keeping your entire website safe and your analytics data clean.
Final Thoughts
Keeping your GA4 data clean requires a bit of ongoing vigilance. By regularly using the investigative methods in this guide - checking for unusual locations, spotting inhuman behavior, validating hostnames, and analyzing traffic sources - you can find and filter out the noise caused by bots. This ensures your reports reflect reality and empowers you to make smarter, data-driven decisions.
This process of building custom reports in GA4 Explore just to find simple anomalies can feel clunky and time-consuming. We wanted to make it faster. With Graphed you can connect your GA4 account and immediately start asking questions in plain English. Instead of manually building reports, you can just ask, "Show me traffic by hostname for last month," or "Which traffic sources have an average engagement time under 2 seconds?" We turn hours of analytics work into a 30-second conversation, helping you find and act on insights instantly.
Related Articles
What SEO Tools Work with Google Analytics?
Discover which SEO tools integrate seamlessly with Google Analytics to provide a comprehensive view of your site's performance. Optimize your SEO strategy now!
Looker Studio vs Metabase: Which BI Tool Actually Fits Your Team?
Looker Studio and Metabase both help you turn raw data into dashboards, but they take completely different approaches. This guide breaks down where each tool fits, what they are good at, and which one matches your actual workflow.
How to Create a Photo Album in Meta Business Suite
How to create a photo album in Meta Business Suite — step-by-step guide to organizing Facebook and Instagram photos into albums for your business page.