How to Identify Bots in Google Analytics
Ever see a sudden, unexplained spike in your Google Analytics traffic that seems too good to be true? More often than not, it is. This kind of surge is frequently caused by bots, crawlers, and spam referrers, not real potential customers. This article will show you how to spot this fake traffic in Google Analytics 4 and filter it out so you can trust your data and make decisions based on what your actual users are doing.
Why You Should Care About Bot Traffic
You might think "more traffic is always better," but when it comes to bots, it's just noise that pollutes your data. Letting bot traffic run unchecked in your analytics can cause several serious problems:
- It skews your core metrics. Bots inflate session counts, user numbers, and pageviews, making your site's performance look different than it actually is. They distort crucial behavioral metrics like engagement rate and session duration, as they typically land on a single page for a split second and then leave. This gives you a misleading picture of how engaging your content really is.
- It leads to poor business decisions. Imagine you see a huge traffic spike from a specific country or a new referral source. You might decide to invest more marketing budget into that area, thinking you’ve found a new audience. If that traffic is 99% bots, you’re just throwing money away based on bad data.
- It can mess with your website's performance. While most analytics-spam bots don’t actually visit your site, some "real" bots and crawlers do. Excessive bot activity can put an unnecessary load on your server, potentially slowing down the experience for your real human visitors.
Common Signs of Bot Traffic in GA4
Identifying bot traffic isn't always obvious, but there are several dead giveaways if you know where to look. Here are the most common clues to watch out for in your GA4 property.
Clue #1: Suspicious Traffic Sources and Referrers
One of the easiest places to start hunting for bots is in your traffic acquisition reports. These reports show you where your visitors are coming from.
In GA4, navigate to Reports > Acquisition > Traffic acquisition. By default, this report is grouped by the 'Session default channel group.' To get more detail, change the primary dimension to 'Session source / medium'.
Now, look for anything that seems out of place:
- Spammy Referrers: Scan the list for URLs that look like pure spam. They often have names like "free-traffic-for-you.com" or "buttons-for-your-website.com." These are almost never legitimate sources of traffic.
- Unexplained Direct Traffic Spikes: While direct traffic itself is normal, a huge, sudden increase that doesn't correspond with any offline marketing, email campaigns, or public relations efforts can be a sign of bot activity.
- (not set) values: Large amounts of traffic listed as '(not set)' can sometimes be an indicator of improperly tagged campaigns, but they are also frequently associated with bot traffic or ghost spam that doesn't properly report its source.
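If you export this report (or pull it via the GA4 Data API), a quick script can pre-screen sources for spammy names before you eyeball the rest. This is an illustrative sketch - the pattern list and the row format are assumptions for the example, not an official blocklist:

```python
import re

# Referrer patterns commonly seen in analytics spam. Illustrative only --
# maintain your own list based on what shows up in your own report.
SPAM_PATTERNS = [
    r"free[-.]?traffic",
    r"buttons?[-.]?for",
    r"seo[-.]?offer",
    r"share[-.]?buttons?",
]

def looks_spammy(source_medium: str) -> bool:
    """Return True if a 'Session source / medium' value matches a spam pattern."""
    return any(re.search(p, source_medium.lower()) for p in SPAM_PATTERNS)

# Example rows as exported from the Traffic acquisition report: (source / medium, sessions)
rows = [
    ("google / organic", 1200),
    ("free-traffic-for-you.com / referral", 450),
    ("buttons-for-your-website.com / referral", 300),
]

flagged = [(src, n) for src, n in rows if looks_spammy(src)]
```

Anything that lands in `flagged` deserves a manual look before you decide to filter it out.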
Clue #2: Absurdly Good or Bad Engagement Metrics
Bots behave very differently from humans. They don't browse your content, watch videos, or fill out forms. This robotic behavior leaves a clear trail in your engagement metrics.
In your Traffic acquisition report, look at the columns for 'Engaged sessions' and 'Average engagement time'. GA4 counts a session as "engaged" if it lasts 10 seconds or longer, includes at least one key event (conversion), or includes at least two pageviews. Bots rarely meet any of these criteria.
Look for sources that have metrics like these:
- Engagement Rate near 0%: A traffic source sending hundreds of sessions with a 0.5% engagement rate is a huge red flag. It means almost no one from that source is interacting with your site in a meaningful way.
- Average Engagement Time of 1-2 Seconds: Humans take time to read and digest content. A source with thousands of sessions but an average engagement time of only a couple of seconds is almost certainly automated traffic.
Likewise, be wary of numbers that look "too perfect," such as a precise 100% engagement rate or an exact 1-minute average engagement time across a large number of sessions. This can indicate a bot programmed with specific, unnatural behaviors.
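The engaged-session definition is simple enough to reproduce, which makes it easy to sanity-check exported session data yourself. A minimal sketch, assuming you have per-session duration, key-event, and pageview counts available:

```python
from dataclasses import dataclass

@dataclass
class Session:
    duration_seconds: float
    key_events: int    # conversions / key events recorded in the session
    page_views: int

def is_engaged(s: Session) -> bool:
    """GA4's engaged-session test: 10+ seconds, or at least one
    key event, or at least two pageviews."""
    return s.duration_seconds >= 10 or s.key_events >= 1 or s.page_views >= 2

def engagement_rate(sessions: list[Session]) -> float:
    if not sessions:
        return 0.0
    return sum(is_engaged(s) for s in sessions) / len(sessions)

# A bot-like source: hundreds of single-page hits lasting about a second
bot_sessions = [Session(1.2, 0, 1) for _ in range(200)]
```

Running `engagement_rate(bot_sessions)` returns 0.0 - exactly the near-zero signature described above.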
Clue #3: Unusual Geographic Locations or Technical Data
Drilling down into the location and technical specifications of your users can also reveal bot patterns.
Geographic Data
Go to Reports > User > User attributes > Demographic details (the report's default dimension is Country). Ask yourself if the traffic patterns here make sense for your business. Do you sell products only in the United States but are seeing a massive traffic spike from a small city in Eastern Europe? Unless you’ve just launched an international campaign there, this is likely bot traffic.
Technical Data
Dive into Reports > Tech > Tech details. Here you can analyze traffic based on browser, device, operating system, and screen resolution. You can find more specific data by adding a secondary dimension like 'ISP Organization' to most reports. Keep an eye out for:
- Strange ISP Organizations: If you see a major portion of traffic coming from a cloud or datacenter provider such as "amazon.com", "google llc", or "microsoft corporation", it’s often from servers, not individual user computers.
- A "(not set)" Screen Resolution: The vast majority of real users will have a screen resolution logged by their browser. A large volume of traffic with a screen resolution that is '(not set)' is often indicative of headless browsers or bots that don't render a screen.
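If you export the Tech details report with 'ISP Organization' as a secondary dimension, both of these checks are easy to automate. The ISP substring list below is illustrative, not exhaustive:

```python
# Substrings that identify datacenter/cloud providers in the 'ISP Organization'
# dimension. Illustrative list -- extend it with what you observe in your data.
DATACENTER_ISPS = ("amazon", "google llc", "microsoft", "digitalocean", "ovh", "hetzner")

def flag_row(isp: str, screen_resolution: str) -> list[str]:
    """Return the reasons a Tech details row looks automated (empty list = looks human)."""
    reasons = []
    if any(name in isp.lower() for name in DATACENTER_ISPS):
        reasons.append("datacenter ISP")
    if screen_resolution == "(not set)":
        reasons.append("no screen resolution (possible headless browser)")
    return reasons
```

A row that trips both checks - say, an "Amazon.com, Inc." ISP with a "(not set)" resolution - is about as clear a bot signature as you'll find.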
Clue #4: Investigating Hostnames
This is one of the most reliable ways to identify a specific type of spam called "ghost spam." Ghost spam works a bit differently - the bots never actually visit your website. Instead, they find your GA tracking ID (e.g., G-XXXXXXXXXX) and send fake "hits" directly to Google's servers. Because they never land on your site, they can’t know your actual domain name.
To check this in GA4, you'll need to use the 'Explore' section:
- Navigate to Explore in the left-hand menu and create a new Free form exploration.
- In the 'Variables' column on the left, click the '+' sign next to 'Dimensions', search for Hostname, check the box, and click 'Import'.
- In the 'Variables' column, click the '+' sign next to 'Metrics', search for Sessions, check the box, and click 'Import'.
- Drag 'Hostname' from the Variables column to the 'Rows' section in the 'Tab settings' column.
- Drag 'Sessions' from the Variables column to the 'Values' section in the 'Tab settings' column.
You’ll now see a table of all the hostnames that have sent data to your GA4 property, along with the number of sessions from each. The list should primarily contain your own website's domain (e.g., yourwebsite.com) and potentially subdomains or related service domains (like your payment gateway). If you see any unrelated hostnames in this list - especially spammy-looking ones - it’s almost certainly ghost spam.
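If you export that Explore table, comparing each hostname against an allowlist of domains you actually control makes the ghost spam jump out immediately. A sketch, with example hostnames standing in for your own:

```python
# Hostnames you actually serve traffic from: your domain, subdomains, and
# trusted third parties such as a payment gateway. Example values only.
VALID_HOSTNAMES = {"yourwebsite.com", "www.yourwebsite.com", "shop.yourwebsite.com"}

def ghost_spam_hostnames(report_rows):
    """Given (hostname, sessions) rows from the Explore table, return the
    rows whose hostname could not have come from your own site."""
    return [(host, n) for host, n in report_rows if host not in VALID_HOSTNAMES]

rows = [
    ("yourwebsite.com", 9500),
    ("www.yourwebsite.com", 820),
    ("best-seo-offer.example", 340),  # classic ghost-spam hostname
]
```

Any row returned by `ghost_spam_hostnames(rows)` represents hits that were sent straight to Google's servers without ever touching your site.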
Steps to Filter Bot Traffic from Your GA4 Reports
Now that you’ve found the bot traffic, it's time to get rid of it. Here’s how to clean up your GA4 data.
Start with GA4's Automated Filtering
The good news is that GA4 automatically filters most known bots and spiders for you. This functionality is enabled by default (and cannot be turned off) and uses a combination of Google's own research and the International Spiders & Bots List maintained by the Interactive Advertising Bureau (IAB) to identify and exclude common automated traffic. However, this system isn't perfect, and sophisticated or brand-new spambots can still slip through - which is why manual filtering is still necessary.
Create Custom Data Filters for Persistent Bots
For the spam that bypasses the automatic filters, you can create your own custom data filters. Note that GA4's data filters can only act on the traffic_type event parameter - so the approach is to tag the unwanted traffic first (for example, by IP range) and then exclude everything carrying that tag. Let's say you identified a lot of bot traffic coming from IP addresses belonging to a specific provider like Amazon Web Services.
- Go to Admin (the gear icon in the bottom left).
- Under the 'Property' column, click on Data Settings > Data Filters.
- Click Create Filter.
- Choose the Internal Traffic filter type. (GA4 offers only two filter types - Internal Traffic and Developer Traffic - and both work by excluding events tagged with a specific traffic_type parameter value.)
- Give your filter a clear name, like "Exclude AWS Bot Traffic".
- Set the filter operation to Exclude.
- For the parameter value, enter the traffic_type value assigned to this traffic. The default value is "internal"; if you tag the bot IP ranges with a custom value (such as "aws_bots") when defining the rule under Admin > Data Streams > Configure tag settings > Define internal traffic, enter that value here instead.
Note: Data filters do not apply retroactively. They will only filter traffic from the moment they are activated, so your historical data will remain unchanged.
Exclude Your Own Team's Traffic
Finally, remember to exclude traffic from your own team. While not malicious, your internal activity - testing pages, checking content, and demonstrating features - can easily inflate your metrics and skew your user behavior data. GA4 has a built-in tool for this.
- Navigate to Admin > Data Streams and click on your website's data stream.
- Under 'Google tag,' click on Configure tag settings.
- Click Show all, then select Define internal traffic.
- Click Create and give your rule a name (e.g., "Office IP"). Fill in the IP addresses or IP ranges of your offices, home networks, and agencies.
- Once you save this rule, GA4 will now flag any traffic from these IPs with a traffic_type parameter of "internal".
- To activate the exclusion, go back to Admin > Data Settings > Data Filters. You should see a pre-made filter named "Internal Traffic." Its state will be 'Testing'. Click the three dots on the right, select Activate filter, and confirm.
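Conceptually, the 'Define internal traffic' rule is just IP-range matching: any event from a listed IP or CIDR range gets the traffic_type parameter, and the data filter then drops it. This sketch mirrors that logic with Python's standard ipaddress module, using documentation-reserved example ranges in place of your real office IPs:

```python
import ipaddress

# Example networks only (documentation-reserved ranges) -- replace with your
# own office, home, and agency IPs. GA4's rule accepts single IPs and CIDR
# ranges; this mirrors that matching locally.
INTERNAL_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),    # office network
    ipaddress.ip_network("198.51.100.42/32"),  # remote worker's static IP
]

def traffic_type(ip: str) -> str:
    """Return the traffic_type value GA4's rule would assign to this IP."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in INTERNAL_RANGES):
        return "internal"
    return "external"
```

Sessions tagged "internal" are exactly the ones the activated Internal Traffic data filter will exclude from your reports.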
Final Thoughts
Keeping your Google Analytics data clean is not a one-time task, but an ongoing process. Regularly monitoring your traffic sources, engagement metrics, and technical data for suspicious activity is essential for accurate reporting. By using GA4's built-in features and creating custom filters, you can ensure the insights you gather reflect real user behavior, allowing you to make smarter decisions for your business.
We know that manually cleaning up your data and trying to stitch together reports across Google Analytics, your ad platforms, and your CRM can be a massive drain on your time. At Graphed, we built our platform to eliminate this friction entirely. By connecting your sources in seconds, you can use simple, plain-English commands to create real-time dashboards that automatically pull clean data. Instead of spending hours digging for bots, you can just ask for the insights you need and get back to growing your business.