How to Use Regex in Looker Studio
Regular expressions, or regex, are a powerful way to find and manage patterns in your text data, and using them in Looker Studio can transform your static reports into dynamic analysis tools. Knowing how to use regex allows you to clean inconsistent data, create custom groupings, and build highly specific filters that aren’t possible with a standard dropdown. This tutorial will walk you through the practical regex functions in Looker Studio and provide step-by-step examples for you to follow.
What is Regex and Why Bother Using it in Looker Studio?
Think of regex as a super-powered find-and-replace tool. Instead of searching for simple text like "Facebook," you can search for a pattern. For example, you could write a single regex pattern to find "Facebook," "fb," "Facebook Ads," and "facebook.com" all at once.
In Looker Studio, this is incredibly useful for several common reporting challenges:
- Cleaning Messy Data: Your team might name things differently. One person uses "FB_Campaign," another uses "facebook-campaigns," and a third just uses "_facebook." Regex lets you group all of them under a single, clean "Facebook" channel.
- Categorizing Content: You can automatically group your web pages into categories like "Blog," "Products," or "Services" just by looking for patterns in the URL, like
/blog/or/products/. - Extracting Specific Information: Want to pull out just the coupon code from an order name string? Or find all landing pages that used a specific template name in the URL? Regex can extract just the text you need into a new dimension.
- Building Advanced Filters: Create sophisticated filters that allow users to select data based on multiple conditions at once, like finding all users from "New York" OR "California" with a single rule.
The Three Core Regex Functions in Looker Studio
Looker Studio provides three main functions for working with regular expressions. Mastering these three will allow you to handle nearly any text manipulation task.
1. REGEXP_MATCH: The Yes/No Test
This function checks if your text matches a given regex pattern and returns either true or false. It's perfect for creating boolean (true/false) fields, which are especially powerful for building filters and CASE statements.
How to Use It:
The syntax is REGEXP_MATCH(text, pattern).
Imagine you want to identify all campaigns related to your brand. Your campaign names might include "brand," "Brand," or "BRAND."
Create a calculated field with this formula:
REGEXP_MATCH(Campaign, '(?i)brand')Campaignis your dimension containing campaign names.'(?i)brand'is the regex pattern. The(?i)is a common flag that makes the pattern case-insensitive, so it matches "brand", "Brand", etc.
This formula will return true for campaign names like "US-Google-Brand-Search" and "FB_Brand_Awareness" and false for everything else.
2. REGEXP_EXTRACT: Pulling Out Specific Text
This is arguably the most powerful function of the three. It lets you pull out, or extract, a piece of text that matches a part of your regex pattern. This is how you create new dimensions from existing ones.
How to Use It:
The syntax is REGEXP_EXTRACT(text, pattern).
Let's say you want to classify your website content by its top-level directory. You have URLs in your Page Path dimension like /blog/10-marketing-tips/ and /products/new-widget/. You just want to extract "blog" or "products".
Use a calculated field with this formula:
REGEXP_EXTRACT(Page Path, '^/([^/]+)/')Page Pathis your dimension.'^/([^/]+)/'is the pattern. It might look complex, but let's break it down:
This formula will produce a new field containing "blog," "products," "services," etc., so you can easily view your metrics by content section.
3. REGEXP_REPLACE: Find and Replace Text
This function searches for text matching a pattern and replaces it with new text you provide. It's the ultimate clean-up tool.
How to Use It:
The syntax is REGEXP_REPLACE(text, pattern, replacement_text).
Imagine your GA4 data has referral traffic from various subdomains of the same site, like t.co, mobile.twitter.com, and twitter.com. You want to group them all as "Twitter."
Apply this formula in a calculated field:
REGEXP_REPLACE(Session source, '.*twitter.*|t\\.co', 'Twitter')Session sourceis your source dimension.'.*twitter.*|t\.co'is the pattern. The.*on either side of "twitter" means "any character, zero or more times" — essentially, if "twitter" appears anywhere in the source, it will match. The|acts as an "OR," so it also looks fort.co. Notice the double backslash\\: to match a literal dot, you have to "escape" it.'Twitter'is the text that all matching sources will be replaced with.
Must-Know Regex Characters (Metacharacters)
To write your own regex patterns, you need to know a few essential "metacharacters" — special characters that have a specific function.
.(Dot): Matches any single character except a newline.*(Asterisk): Matches the preceding character zero or more times.+(Plus sign): Matches the preceding character one or more times.?(Question mark): Makes the preceding character optional (matches it zero or one time).^(Caret): Matches the beginning of the string of text.$(Dollar sign): Matches the end of the string.()(Parentheses): Creates a capturing group for extraction withREGEXP_EXTRACT.[](Square Brackets): Matches any single character listed inside the brackets. For example,[abc]matches an 'a', a 'b', or a 'c'.|(Pipe): Acts as an OR operator.Google|Facebookmatches either "Google" or "Facebook".- **
\(Backslash):** "Escapes" a metacharacter, allowing you to match it literally. To search for a real period, you'd use\..
Practical Walkthroughs in Looker Studio
Let's apply these concepts to solve some common reporting problems.
Example 1: Consolidating Inconsistent Campaign Names
Your team uses different naming conventions, polluting your marketing channel reports. You have names like fb_retargeting, Facebook_TOF, google_Brand_Search, and GADS_Shopping. The goal is to clean this up into consistent channel names: "Facebook & Instagram", "Google Ads", etc.
- On your Looker Studio report canvas, click on your chart, and in the properties panel on the right, click Add a Dimension.
- Select Create field at the bottom of the list.
- Give your new field a name, like "Marketing Channel Grouping."
- Enter the following formula, which combines a
CASEstatement withREGEXP_MATCH:
CASE
WHEN REGEXP_MATCH(Campaign, '(?i)(facebook|fb|ig|instagram)') THEN 'Facebook & Instagram'
WHEN REGEXP_MATCH(Campaign, '(?i)(google|gads|adwords)') THEN 'Google Ads'
WHEN REGEXP_MATCH(Campaign, '(?i)linkedin') THEN 'LinkedIn Ads'
ELSE 'Other'
ENDClick "Apply." Now you can drag this new calculated field into your tables and charts to see clean, aggregated data grouped by the channel you've just defined.
Example 2: Analyzing Performance by Subdomain from Google Analytics Data
Let's say your business runs on multiple subdomains: app.yoursite.com for your main application, blog.yoursite.com for content, and store.yoursite.com for e-commerce. You want to see traffic and conversions broken down by these subdomains.
- Ensure you have the
Hostnamedimension available from your GA4 data source. - Click Add a Field in your data source or directly on a chart.
- Name the field "Subdomain."
- Use this
REGEXP_EXTRACTformula:
REGEXP_EXTRACT(Hostname, '^([^.]+)')This pattern looks for the beginning of the string (^) and extracts (()) one or more characters that are not a dot ([^.]+). The formula will create a new dimension containing only "app," "blog," and "store," perfect for analyzing performance across different parts of your site.
Timesaving Tips & Common Troubleshooting
Getting your regex just right can sometimes be tricky. Here are a few tips to avoid common issues.
- Test Your Patterns Before Using Them: Free online tools like Regex101.com are life-savers. You can paste in your sample data (like a list of campaign names or URLs) and adjust your regex pattern in real-time to see what it matches. This is much faster than editing a formula in Looker Studio over and over.
- Always Account for Case Sensitivity: Data is almost never consistently capitalized. Adding the case-insensitive flag
(?i)at the start of your regex pattern (e.g.,'(?i)google') will save you a lot of headaches. - Looker Studio Requires a Double Backslash for Escaping: This is a frequent point of confusion. In most regex testers, you use a single backslash
\to escape a special character. In Looker Studio formulas, you must use a double backslash\\. To matcha.b, your pattern in Looker Studio should be'a\\.b'.
Final Thoughts
Learning to use REGEXP_MATCH, REGEXP_EXTRACT, and REGEXP_REPLACE in Looker Studio is a small time investment that pays off enormously in reporting clarity and flexibility. It lets you automate data cleaning and create custom dimensions that align with how your business actually thinks about its performance metrics.
Manually cleaning data and wrestling with complicated formulas is precisely the kind of reporting friction that slows businesses down. Instead of creating complex regex rules, we believe you should be able to simply describe what you need. That's why we built Graphed to be your AI data analyst. You can just ask a question in plain English, like, "Group all campaigns containing 'fb' or 'facebook' together and show me their performance," and our tool will handle the data grouping, analysis, and visualization for you.
Related Articles
How to Enable Data Analysis in Excel
Enable Excel's hidden data analysis tools with our step-by-step guide. Uncover trends, make forecasts, and turn raw numbers into actionable insights today!
What SEO Tools Work with Google Analytics?
Discover which SEO tools integrate seamlessly with Google Analytics to provide a comprehensive view of your site's performance. Optimize your SEO strategy now!
Looker Studio vs Metabase: Which BI Tool Actually Fits Your Team?
Looker Studio and Metabase both help you turn raw data into dashboards, but they take completely different approaches. This guide breaks down where each tool fits, what they are good at, and which one matches your actual workflow.