Regular Expressions in Google Analytics 4

Regular Expressions in Google Analytics 4: What You Need to Know

Regular expressions, commonly known as RegEx or RegExp, are a powerful tool for searching and manipulating text data. Put simply, a regular expression is a sequence of characters that forms a search pattern in any given text (string). Regular expressions can be used to match and manipulate text data in numerous programming languages and tools, including Google Analytics 4.

The core idea behind regular expressions is to define a search pattern that matches a specific set of characters or words in a text string. As long as the string follows some sort of pattern, regex is robust enough to capture that pattern and return a specific part of the string.

Some common uses of regular expressions include:

  • Validating input: You can use regular expressions to make sure that user input matches a particular pattern, such as an email address, phone number, or password. Example: An email address usually takes the form “name@domain.tld”. To match any email address that conforms to this pattern, I may use the regular expression pattern (\w\.?)+@[\w\.-]+\.\w{2,}.
  • Searching and replacing: Very often I use the power of RegEx to find and replace specific text within a document, web page or files in Sublime Text. For example, the regex (\d{3})-(\d{2})-(\d{4}) matches any string in the format of a social security number: three digits, a hyphen, two digits, another hyphen, and four digits.
  • Data extraction: You can use a regex to extract specific data from a larger body of text, such as a log file or database. For example, in a database containing a vast amount of textual data, you can employ regular expressions to extract only the relevant data sets for your analysis or project, thus saving time and effort compared to manually sorting through the entire data set.
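
To make the three uses above more concrete, here is a minimal Python sketch. It reuses the two patterns quoted in the list; the function names, the masking format and the "status=" log format are my own illustrative assumptions and not something GA4 provides.

```python
import re

# Patterns quoted in the list above.
EMAIL = re.compile(r"(\w\.?)+@[\w\.-]+\.\w{2,}")
SSN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def is_email(text: str) -> bool:
    """Validating input: the whole string must look like an email address."""
    return EMAIL.fullmatch(text) is not None


def mask_ssn(text: str) -> str:
    """Searching and replacing: keep only the last four digits (group 3)."""
    return SSN.sub(r"***-**-\3", text)


def extract_status_codes(log: str) -> list:
    """Data extraction: collect three-digit codes from a hypothetical log format."""
    return re.findall(r"status=(\d{3})", log)


print(is_email("jane.doe@example.com"))
print(mask_ssn("SSN on file: 123-45-6789"))
print(extract_status_codes("GET /a status=200\nGET /b status=404"))
```

The email pattern is deliberately loose, so treat it as a quick sanity check rather than a full validator.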

Regular expressions allow for the use of quantifiers, which specify how many times a pattern should be matched. For example, the asterisk (*) quantifier means “zero or more” while the plus sign (+) means “one or more”.
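
A quick way to see the difference between the two quantifiers is to try them on the same strings. The sketch below uses Python's re module; the helper name matches and the sample patterns are only illustrative.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# "o*" means zero or more "o" characters.
print(matches(r"go*gle", "ggle"))     # True (zero "o")
print(matches(r"go*gle", "google"))   # True (two "o")

# "o+" means one or more "o" characters.
print(matches(r"go+gle", "ggle"))     # False (at least one "o" is required)
print(matches(r"go+gle", "gooogle"))  # True
```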

Using regular expressions in Google Analytics 4 (GA4)

In Google Analytics 4 (GA4), regular expression support is not as widely available as it was in Universal Analytics. However, there are still places where regular expressions can be used. In this article, we’ll explore where they can be used in Google Analytics 4 and highlight some nuances that you should be aware of.

Exclude referral traffic in GA4 using Regex 

One place where regular expressions can be used in Google Analytics 4 is to define unwanted referrals. To exclude specific referral traffic from your GA4 reports, follow these steps:

  1. Go to the admin panel of your Google Analytics 4 account.
  2. Select your Data stream.
  3. Select your Web Data Stream and then click on “Configure tag settings.”
  4. Inside the tag settings click on “Show all” and then click “List the unwanted referrals”. 
Exclude referral traffic in GA4

If you want to exclude multiple domains, you can either add multiple conditions using the ‘Add condition’ button, or you can have one condition matching different domains by switching to “Referral domain matches RegEx” and entering your condition.

Before doing that, remember that regular expressions in Google Analytics 4 look for exact matches. In other words, the default regex behaviour in GA4 is a “full match”: the data must match the pattern you provide exactly.

For instance, if you use the pattern “Canada”, it will only match values that are exactly “Canada”. If you want to perform a partial match, you can use metacharacters in your regular expression. For example, the pattern “Canada.*” will match any value that starts with “Canada” and is followed by anything (or nothing) else.
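
If you want to try this outside of GA4, the full-match behaviour can be approximated with Python's re.fullmatch. This is only a sketch: ga4_full_match is a hypothetical helper of mine, and GA4's own regex engine may differ from Python's in some details.

```python
import re


def ga4_full_match(pattern: str, value: str) -> bool:
    """Approximate a GA4 'matches regex' condition: the whole value must match."""
    return re.fullmatch(pattern, value) is not None


print(ga4_full_match(r"Canada", "Canada"))               # True
print(ga4_full_match(r"Canada", "Canada East"))          # False, extra text after the pattern
print(ga4_full_match(r"Canada.*", "Canada East"))        # True
print(ga4_full_match(r"Canada.*", "Western Canada"))     # False, extra text before the pattern
print(ga4_full_match(r".*Canada.*", "Western Canada"))   # True
```

The last line shows that a real "contains" match needs a wildcard on both sides of the word.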

Going back to defining unwanted referrals: if you want to exclude several domains that share the same subdomain, such as www (for example, www.facebook.com and www.youtube.com), you can use a pipe (|) to separate the domains. Keep in mind that an unescaped dot (.) acts as a wildcard that matches any single character.

www.facebook.com|www.youtube.com
Exclude referral traffic in GA4 using Regex

If you want to match the actual character “.” and not use it as a wildcard, enter the backslash (\) before the dot. 

www\.facebook\.com|www\.youtube\.com
Exclude referral traffic in GA4 using Regex

If you write your condition like the example above, the regular expression will look for www.facebook.com or www.youtube.com. On the other hand, if you write the condition as facebook\.com|youtube\.com, the regular expression will not match the www subdomains, because regular expressions in GA4 look for a “full match”.

Additionally, if you do want to include all subdomains, you can use the regular expression .* (dot asterisk). This expression means “match zero or more of any character”, so in our case the regular expression will also cover the www subdomains.

So if you are looking to include all subdomains (partial match), you can write something like this:

.*(facebook\.com|youtube\.com)

In this case, the referring domain must end with facebook.com or youtube.com but before them, there could be a subdomain like www or something else. 
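
The referral patterns from this section can be compared side by side with a small Python sketch. It assumes that re.fullmatch is a fair stand-in for GA4's full match; is_excluded and the sample domains are hypothetical, and SUBDOMAINS_ONLY is a stricter variant of my own that is not part of the steps above.

```python
import re


def is_excluded(pattern: str, referrer_domain: str) -> bool:
    """Approximate 'Referral domain matches RegEx' with a full match."""
    return re.fullmatch(pattern, referrer_domain) is not None


UNESCAPED = r"www.facebook.com|www.youtube.com"      # dots act as wildcards
ESCAPED = r"www\.facebook\.com|www\.youtube\.com"    # dots are literal
ANY_PREFIX = r".*(facebook\.com|youtube\.com)"       # anything may come first
# Assumption: my own tighter variant, any prefix must end with a literal dot.
SUBDOMAINS_ONLY = r"(.*\.)?(facebook\.com|youtube\.com)"

print(is_excluded(UNESCAPED, "wwwxfacebook.com"))       # True, "." matched "x"
print(is_excluded(ESCAPED, "wwwxfacebook.com"))         # False
print(is_excluded(ESCAPED, "m.facebook.com"))           # False, only www is listed
print(is_excluded(ANY_PREFIX, "m.facebook.com"))        # True
print(is_excluded(ANY_PREFIX, "notfacebook.com"))       # True, probably not intended
print(is_excluded(SUBDOMAINS_ONLY, "notfacebook.com"))  # False
```

The leading .* is convenient, but as the fifth line shows, it also lets through unrelated domains that merely end with the same text.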

Exclude multi language referral traffic in GA4 using Regex 

When dealing with a website that has multiple languages, the use of regular expressions is almost unavoidable. This is because the same third-party provider may use different domains for each language. 

For example, if you have a multilingual e-commerce site and you want to exclude PayPal transactions from your Google Analytics 4 property, you may need to use regular expressions to match all the possible domains that PayPal may use for each language.

paypal\.(com|[a-z]{2})
Exclude multi language referral traffic in GA4 using Regex

This regular expression will match any domain that starts with “paypal.” followed by either “com” or any two lowercase letters, which covers two-letter country-code domains (e.g., “es” for Spain, “de” for Germany, etc.).

Another way to write the regular expression to match PayPal domains for multiple languages is:

paypal\.(com|es|en)
Exclude multi language referral traffic in GA4 using Regex

That regular expression would match the domain names “paypal.com”, “paypal.es”, and “paypal.en”.
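
Here is a small Python sketch that compares the two PayPal patterns. As before, re.fullmatch only approximates GA4's full match, and the helper name and sample domains are illustrative assumptions.

```python
import re


def matches_domain(pattern: str, domain: str) -> bool:
    """Approximate a GA4 full-match condition on the referral domain."""
    return re.fullmatch(pattern, domain) is not None


TWO_LETTERS = r"paypal\.(com|[a-z]{2})"   # "com" or any two lowercase letters
EXPLICIT = r"paypal\.(com|es|en)"         # only the listed endings

print(matches_domain(TWO_LETTERS, "paypal.com"))       # True
print(matches_domain(TWO_LETTERS, "paypal.de"))        # True
print(matches_domain(TWO_LETTERS, "paypal.co.uk"))     # False, ".uk" is left over
print(matches_domain(TWO_LETTERS, "www.paypal.com"))   # False, "www." is not covered
print(matches_domain(EXPLICIT, "paypal.es"))           # True
print(matches_domain(EXPLICIT, "paypal.de"))           # False, "de" is not listed
```

If referrals arrive with a subdomain, the .* prefix from the previous section can be added in front of either pattern.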

Exclude internal traffic in GA4 using Regex 

When setting up a new Google Analytics 4 property, one of the first things you should do is set up filters that exclude internal and developer traffic. Failing to properly exclude internal traffic from your tracking can result in skewed data that includes your own website visits or app sessions, as well as those of your team. This can lead to unreliable data that should not be used to make important decisions.

To exclude internal traffic from your GA4 reports, follow these steps:

  1. Go to the admin panel of your Google Analytics 4 account.
  2. Select your Data stream, then select your Web Data Stream.
  3. Click on “Configure tag settings.”
  4. Inside the tag settings click on “Show all” and then click “Define internal traffic”.
  5. Finally, click “Create” under Internal traffic rules and then select “IP address matches regular expression”.

Here, similar to the previous example of excluding referral traffic, you can use a single regular expression to cover several IP addresses instead of adding multiple conditions.

192.168.123.(132|134|136)
Exclude internal traffic in GA4 using Regex

This regular expression will match IP addresses 192.168.123.132, 192.168.123.134, and 192.168.123.136.

Of course, you can also write the same regular expression with the dots escaped, so that each dot matches only a literal “.” instead of acting as a wildcard:

192\.168\.123\.(132|134|136)
Exclude internal traffic in Google Analytics 4
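
To see what escaping the dots changes, here is a short Python sketch that runs both versions of the IP pattern. re.fullmatch is used as an approximation of GA4's matching, and is_internal and the sample values are hypothetical.

```python
import re


def is_internal(pattern: str, ip_address: str) -> bool:
    """Approximate 'IP address matches regular expression' with a full match."""
    return re.fullmatch(pattern, ip_address) is not None


UNESCAPED = r"192.168.123.(132|134|136)"     # each dot matches any character
ESCAPED = r"192\.168\.123\.(132|134|136)"    # each dot matches only "."

print(is_internal(UNESCAPED, "192.168.123.134"))   # True
print(is_internal(ESCAPED, "192.168.123.134"))     # True
print(is_internal(ESCAPED, "192.168.123.133"))     # False, 133 is not listed
print(is_internal(UNESCAPED, "192-168-123-134"))   # True, the wildcard dots allow it
print(is_internal(ESCAPED, "192-168-123-134"))     # False
```

With real IP addresses the two versions are unlikely to behave differently, but the escaped one says exactly what you mean.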

Use RegEx to create audiences and segments in GA4 explore reports

In explorations, regular expressions can be used to create filters. For example, you can create a filter where the event name matches a regular expression. 

Explorations within Google Analytics 4 refer to a set of advanced methodologies that go beyond regular reports and enable you to delve deeper into your customers’ behaviour, thereby revealing valuable insights.

Let’s start by creating a simple report where we use Event name as a dimension and Event count as a metric.

Free Form Report in GA4

Now imagine that we need a report that concentrates only on page_view and first_visit. In this case, we can create a filter that “matches regex”:

page_view|first_visit
Use RegEx to create audiences and segments in GA4 explore reports

After clicking ‘Apply’ the regex above will result in displaying only page_view and first_visit.

Use RegEx to create audiences and segments in GA4 explore reports

If you’re looking for partial matches, use the “.*” regular expression to match anything. Remember that regular expressions in Google Analytics 4 are looking for exact matches unless you use the wildcard feature.

(page|first).*
Use RegEx to create audiences and segments in GA4 explore reports
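
The effect of the two event-name filters can be sketched in Python. The list of event names below is sample data I made up for illustration, and re.fullmatch is again only an approximation of the “matches regex” filter.

```python
import re

# Sample event names for illustration only; your property will have its own.
EVENT_NAMES = ["page_view", "first_visit", "first_open", "session_start", "scroll"]


def filter_events(pattern: str, names: list) -> list:
    """Keep only the event names that fully match the pattern."""
    return [name for name in names if re.fullmatch(pattern, name)]


# Exact alternatives: only the two listed events are kept.
print(filter_events(r"page_view|first_visit", EVENT_NAMES))

# Prefix plus wildcard: every event that starts with "page" or "first".
print(filter_events(r"(page|first).*", EVENT_NAMES))
```

Note that the wildcard version also keeps first_open, so it is broader than the exact list.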

Use RegEx to create segments in GA4 explore reports

Another place to use regular expressions in the GA4 Exploration section is when creating segments.

GA4 explorations segments

Go to Segments and click on the plus icon, then select a segment type. Now, select the dimension. For example, if you want to analyse sessions coming from a particular country, first click on the Sessions segment, then “Geography” followed by “Country ID.” In the filter, switch to “matches regex” and enter the country ID.

GB|US|CA
Use RegEx to create segments in GA4 explore reports

Of course, you can also use the ‘OR’ button but I personally find it easier to configure the filter using RegEx. 

Build Custom Events in GA4 Using RegEx

A very good way to utilise regular expressions in Google Analytics 4 is to create new, custom events based on the data you send, which can then be used for conversions.

Did you know you can create custom events in GA4 without using Google Tag Manager?

To create a custom event in GA4, you should do the following steps:

  1. Go to the admin panel of your Google Analytics 4 account.
  2. Select Events, then select Create Event.
  3. Inside the Create Event section, click the “Create” button.
Build Custom Events in GA4

Following the logic above, if you want to create a custom event whenever someone visits a specific page on your website, for example my Google Analytics 4 category page at https://omisido.com/category/google-analytics-4/, you can use the GA4 page_view event and set page_location to match the following regular expression:

https://omisido\.com\/category\/google-analytics-4\/

By doing this, GA4 will generate a new event every time someone visits this page, allowing me to track and analyse this specific activity separately.
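
Before saving the rule, it can help to test the pattern against a few URLs. The Python sketch below assumes that the full-match behaviour described earlier also applies to this condition; fires_custom_event and the utm_source query string are hypothetical.

```python
import re

# Pattern from the custom event condition above.
PAGE_PATTERN = r"https://omisido\.com\/category\/google-analytics-4\/"


def fires_custom_event(page_location: str, pattern: str = PAGE_PATTERN) -> bool:
    """Approximate the page_location 'matches regex' condition with a full match."""
    return re.fullmatch(pattern, page_location) is not None


base_url = "https://omisido.com/category/google-analytics-4/"
tagged_url = base_url + "?utm_source=newsletter"  # hypothetical campaign link

print(fires_custom_event(base_url))                          # True
print(fires_custom_event(tagged_url))                        # False, the query string is extra
print(fires_custom_event(tagged_url, PAGE_PATTERN + r".*"))  # True with a trailing wildcard
```

If your pages are often opened with query parameters, a trailing .* is worth considering so those visits are not missed.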

Use Regular Expressions in Google Analytics 4 to build custom events and conversions

Note: To validate that your custom event implementation is working correctly use the DebugView in Google Analytics 4. DebugView is a feature in Google Analytics 4 that allows you to view real-time events and parameters being sent to Google Analytics from your website or app. 

As you can see from the picture below, the custom event I created earlier, ‘ga4_categry_page_visits’, is working as expected. To enable DebugView in GA4, I am using the GA debugger Chrome extension.

DebugView in Google Analytics 4

When you enable DebugView, you can see events and parameters such as page views, clicks, sessions, and custom events being sent to Google Analytics in real-time. This can help you verify that your tracking is correctly configured and ensure that you are collecting the data you need to make informed decisions.

Conclusion

Regular expressions can be a powerful tool in Google Analytics 4, but they come with some nuances that you should be aware of. Use regular expressions to define unwanted referrals and internal traffic, to create filters in explorations, and to modify or create events in the admin panel. Remember that regular expressions in Google Analytics 4 are looking for exact matches unless you use the wildcard feature.
