Regular expressions, commonly known as RegEx or RegExp, are a powerful tool for searching and manipulating text data. The best way to explain what a regular expression is may be the following statement: a regular expression is a sequence of characters that forms a search pattern in any given text (string). Apart from being difficult to describe (lol), regular expressions can be used to match and manipulate text data in numerous programming languages and tools, including Google Analytics 4.
The core idea behind regular expressions is to define a search pattern that matches a specific set of characters or words in a text string. As long as the string follows some sort of pattern, regex is robust enough to capture it and return a specific part of the string.
Some common uses of regular expressions include:
- Validating input: You can use regular expressions to make sure that user input matches a particular pattern, such as an email address, phone number, or password. Example: An email address usually takes the form of “name@example.com”. To match any email address that conforms to this pattern, I may use the regular expression (\w\.?)+@[\w\.-]+\.\w{2,}.
- Searching and replacing: Very often I use the power of RegEx to find and replace specific text within a document, web page or files in Sublime Text. For example, the regex (\d{3})-(\d{2})-(\d{4}) matches any string in the format of a social security number: three digits, then two digits, then four digits, separated by hyphens.
- Data extraction: You can use a regex to extract specific data from a larger body of text, such as a log file or database. For example, in a database containing a vast amount of textual data, you can employ regular expressions to extract only the data relevant to your analysis or project, which saves time and effort compared to manually sorting through the entire data set.
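The three uses above can be sketched in Python with the standard `re` module. This is only an illustration: the function names, the sample strings and the `status=` log format are made up for the example, and Python's regex engine is not the one GA4 uses.

```python
import re

# Patterns quoted in the list above.
EMAIL_PATTERN = r"(\w\.?)+@[\w\.-]+\.\w{2,}"
SSN_PATTERN = r"(\d{3})-(\d{2})-(\d{4})"


def is_valid_email(value: str) -> bool:
    """Validating input: the whole value must look like an email address."""
    return re.fullmatch(EMAIL_PATTERN, value) is not None


def mask_ssn(text: str) -> str:
    """Searching and replacing: hide the first two groups, keep the last four digits."""
    return re.sub(SSN_PATTERN, r"XXX-XX-\3", text)


def extract_status_codes(log: str) -> list[str]:
    """Data extraction: pull three-digit codes out of a hypothetical log format."""
    return re.findall(r"status=(\d{3})", log)


print(is_valid_email("jane.doe@example.com"))
print(mask_ssn("SSN on file: 123-45-6789"))
print(extract_status_codes("GET /home status=200\nGET /missing status=404"))
```

The email pattern is deliberately loose; it checks the general shape of an address, not whether the address exists.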
Regular expressions allow for the use of quantifiers, which specify how many times a pattern should be matched. For example, the asterisk (*) quantifier means “zero or more”, while the plus sign (+) means “one or more”.
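A quick way to see the difference between the two quantifiers is to try them on a few short strings. The sketch below uses Python's `re` module; the patterns `ab*` and `ab+` are just illustrative choices.

```python
import re

# "ab*" means an "a" followed by zero or more "b" characters.
zero_or_more = re.compile(r"ab*")
# "ab+" means an "a" followed by one or more "b" characters.
one_or_more = re.compile(r"ab+")

for candidate in ["a", "ab", "abbb"]:
    star = zero_or_more.fullmatch(candidate) is not None
    plus = one_or_more.fullmatch(candidate) is not None
    print(f"{candidate!r}: ab* -> {star}, ab+ -> {plus}")
```

Only the bare "a" separates the two: `ab*` accepts it, while `ab+` needs at least one "b".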
Using regular expressions in Google Analytics 4 (GA4)
In Google Analytics 4 (GA4), regular expression support is not as widely available as it was in Universal Analytics. However, there are still places where they can be used. In this article, we’ll explore where regular expressions can be used in Google Analytics 4 and highlight some nuances that you should be aware of.
Exclude referral traffic in GA4 using Regex
One place where regular expressions can be used in Google Analytics 4 is to define unwanted referrals. To exclude specific referral traffic from your GA4 reports, you should do the following steps:
- Go to the admin panel of your Google Analytics 4 account.
- Select “Data streams”.
- Select your web data stream and then click on “Configure tag settings”.
- Inside the tag settings, click on “Show all” and then click “List unwanted referrals”.
If you want to exclude multiple domains, you can either add multiple conditions using the ‘Add condition’ button, or you can have one condition matching different domains by switching to “Referral domain matches RegEx” and entering your condition.
Before doing that, remember that regular expressions in Google Analytics 4 look for exact matches. In other words, the default regex behaviour in GA4 is a “full match”: the data must match the pattern you provide exactly.
For instance, if you use the pattern “Canada”, it will only match values that are exactly “Canada”. If you want to perform a partial match, you can use metacharacters in your regular expression. For example, the pattern “Canada.*” will match any value that starts with “Canada” and ends with anything (or nothing) else.
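The full-match behaviour described above can be imitated in Python with `re.fullmatch`. Treat this as a rough stand-in only: the helper name `ga4_style_match` is hypothetical, and Python's engine is not the one running inside GA4.

```python
import re


def ga4_style_match(pattern: str, value: str) -> bool:
    # Assumption: re.fullmatch approximates the "full match" rule,
    # i.e. the pattern has to cover the entire value.
    return re.fullmatch(pattern, value) is not None


print(ga4_style_match("Canada", "Canada"))         # exact value
print(ga4_style_match("Canada", "Canada East"))    # extra text after the pattern
print(ga4_style_match("Canada.*", "Canada East"))  # ".*" absorbs the rest
print(ga4_style_match("Canada.*", "Canada"))       # ".*" can also match nothing
```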
Going back to defining unwanted referrals: if you want to exclude several domains that share the same subdomain, for example www.facebook.com and www.youtube.com, you can use a pipe (|) to separate them. Keep in mind that an unescaped dot (.) acts as a wildcard that matches any single character.
www.facebook.com|www.youtube.com
If you want to match the actual character “.” and not use it as a wildcard, enter a backslash (\) before the dot.
www\.facebook\.com|www\.youtube\.com
If you write your condition like the example above, the regular expression will look for www.facebook.com or www.youtube.com. On the other hand, if you write the condition as facebook\.com|youtube\.com, the regular expression will not include the www subdomains, because regular expressions in GA4 look for a “full match”.
Additionally, if you do want to include all subdomains, you can use the regular expression .* (dot asterisk). This expression means “match zero or more of any character”. In our case, it means the regular expression will also include the www subdomains.
So if you are looking to include all subdomains (partial match), you can write something like this:
.*(facebook\.com|youtube\.com)
In this case, the referring domain must end with facebook.com or youtube.com, but before them there could be a subdomain like www or something else.
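To compare the three referral patterns side by side, here is a small Python sketch. It assumes `re.fullmatch` is a fair approximation of GA4's full-match rule; the helper `is_unwanted` and the test domains are made up for the example.

```python
import re

EXACT_HOSTS = r"www\.facebook\.com|www\.youtube\.com"
NO_SUBDOMAIN = r"facebook\.com|youtube\.com"
ANY_SUBDOMAIN = r".*(facebook\.com|youtube\.com)"


def is_unwanted(pattern: str, referrer_domain: str) -> bool:
    # Assumption: the whole referrer domain must match the pattern.
    return re.fullmatch(pattern, referrer_domain) is not None


for domain in ["www.youtube.com", "facebook.com", "m.facebook.com", "notfacebook.com"]:
    print(
        domain,
        is_unwanted(EXACT_HOSTS, domain),
        is_unwanted(NO_SUBDOMAIN, domain),
        is_unwanted(ANY_SUBDOMAIN, domain),
    )
```

One side effect worth knowing: because .* matches any characters, the last pattern also accepts a domain such as notfacebook.com. If that matters, a stricter prefix like (.*\.)? in place of .* would require a dot before the domain name.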
Exclude multi language referral traffic in GA4 using Regex
When dealing with a website that has multiple languages, the use of regular expressions is almost unavoidable. This is because the same third-party provider may use different domains for each language.
For example, if you have a multilingual e-commerce site and you want to exclude PayPal transactions from your Google Analytics 4 property, you may need to use regular expressions to match all the possible domains that PayPal may use for each language.
paypal\.(com|[a-z]{2})
This regular expression will match any domain that starts with “paypal.” followed by either “com” or any two-letter lowercase code (e.g., “es” for the Spanish site, “de” for the German site, etc.).
Another way to write the regular expression to match PayPal domains for multiple languages is:
paypal\.(com|es|en)
That regular expression would match the domain names “paypal.com”, “paypal.es”, and “paypal.en”.
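The difference between the broad and the narrow PayPal pattern is easy to check in Python. As before, `re.fullmatch` is used as an assumed stand-in for GA4's full-match behaviour, and the list of domains is only sample data.

```python
import re

BROAD = r"paypal\.(com|[a-z]{2})"
NARROW = r"paypal\.(com|es|en)"

for domain in ["paypal.com", "paypal.es", "paypal.de", "paypal.co.uk", "www.paypal.com"]:
    broad_hit = re.fullmatch(BROAD, domain) is not None
    narrow_hit = re.fullmatch(NARROW, domain) is not None
    print(f"{domain}: broad={broad_hit}, narrow={narrow_hit}")
```

Under a full match, neither pattern covers www.paypal.com or paypal.co.uk, so a leading .* or an extra alternative would be needed for those.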
Exclude internal traffic in GA4 using Regex
When setting up a new Google Analytics 4 property, one of the first things you should do is set up filters to exclude internal and developer traffic. Failing to exclude internal traffic can skew your data with your own website visits or app sessions, as well as those of your team, which makes the data unreliable for important decisions.
To exclude internal traffic from your GA4 reports, follow these steps:
- Go to the admin panel of your Google Analytics 4 account.
- Select “Data streams”, then select your web data stream.
- Click on “Configure tag settings”.
- Inside the tag settings, click on “Show all” and then click “Define internal traffic”.
- Finally, click on “Create” under Internal traffic rules and then select “IP address matches regular expression”.
Here, similar to the previous example of excluding referral traffic, you can use a regular expression to cover several IP addresses instead of adding multiple conditions.
192.168.123.(132|134|136)
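This regular expression will match IP addresses 192.168.123.132, 192.168.123.134, and 192.168.123.136. Because the dots are not escaped, each one also matches any other single character.
To match literal dots only, you can write the same regular expression with escaped dots: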
192\.168\.123\.(132|134|136)
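The practical difference between the two IP patterns shows up with inputs that are not real IP addresses. The Python sketch below uses made-up test strings and assumes `re.fullmatch` mirrors the full-match rule.

```python
import re

LOOSE = r"192.168.123.(132|134|136)"       # dots act as wildcards
STRICT = r"192\.168\.123\.(132|134|136)"   # dots must be literal dots

for value in ["192.168.123.134", "192.168.123.133", "192-168-123-132"]:
    loose_hit = re.fullmatch(LOOSE, value) is not None
    strict_hit = re.fullmatch(STRICT, value) is not None
    print(f"{value}: loose={loose_hit}, strict={strict_hit}")
```

Both versions accept the three intended addresses, but only the loose one lets the dash-separated string through.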
Use RegEx to create filters in GA4 explore reports
In explorations, regular expressions can be used to create filters. For example, you can create a filter where the event name matches a regular expression.
Explorations within Google Analytics 4 refer to a set of advanced methodologies that go beyond regular reports and enable you to delve deeper into your customers’ behaviour, thereby revealing valuable insights.
Let’s start by creating a simple report where we use Event name as a dimension and Event count as a metric.
Now imagine that we need a report that concentrates only on page_view and first_visit. In this case, we can create a filter that “matches regex”:
page_view|first_visit
After clicking ‘Apply’ the regex above will result in displaying only page_view and first_visit.
If you’re looking for partial matches, use the “.*” regular expression to match anything. Remember that regular expressions in Google Analytics 4 are looking for exact matches unless you use the wildcard feature.
(page|first).*
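To get a feel for what each filter keeps, here is a minimal Python sketch that applies both patterns to a made-up list of event names. The helper `filter_events` is hypothetical, and `re.fullmatch` is again an assumed approximation of how the exploration filter behaves.

```python
import re

EXACT_FILTER = r"page_view|first_visit"
PREFIX_FILTER = r"(page|first).*"

# Sample event names, chosen only for illustration.
event_names = ["page_view", "first_visit", "session_start", "scroll", "first_open"]


def filter_events(pattern: str, names: list[str]) -> list[str]:
    # Keep only the names that the pattern covers from start to end.
    return [name for name in names if re.fullmatch(pattern, name)]


print(filter_events(EXACT_FILTER, event_names))
print(filter_events(PREFIX_FILTER, event_names))
```

The second filter is broader: it keeps every event that starts with "page" or "first", so first_open appears alongside the two original events.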
Use RegEx to create segments in GA4 explore reports
Another place to use regular expressions in the GA4 Exploration section is when creating segments.
Go to Segments and click on the plus icon, then select a segment type. Now, select the dimension. For example, if you want to analyse sessions coming from a particular country, first click on the Sessions segment, then “Geography” followed by “Country ID.” In the filter, switch to “matches regex” and enter the country ID.
GB|US|CA
Of course, you can also use the ‘OR’ button but I personally find it easier to configure the filter using RegEx.
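The same idea, sketched in Python on a few made-up session rows. The `country_id` key and the row structure are assumptions for the example, not a GA4 export format.

```python
import re

COUNTRY_PATTERN = re.compile(r"GB|US|CA")

# Hypothetical session rows; only the country ID matters here.
sessions = [
    {"session": 1, "country_id": "GB"},
    {"session": 2, "country_id": "DE"},
    {"session": 3, "country_id": "CA"},
    {"session": 4, "country_id": "gb"},
]

# Keep the sessions whose country ID is fully matched by the pattern.
segment = [row["session"] for row in sessions if COUNTRY_PATTERN.fullmatch(row["country_id"])]
print(segment)
```

Python's matching is case-sensitive by default, so the lowercase "gb" row is left out here; GA4 may behave differently.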
Build Custom Events in GA4 Using RegEx
A very good way to utilise regular expressions in Google Analytics 4 is to create new, custom events based on the data you send, which can then be used for conversions.
To create a custom event in GA4, you should do the following steps:
- Go to the admin panel of your Google Analytics 4 account.
- Select Events, then select Create Event.
- Inside the Create Event section, click on the “Create” button.
Following the logic above, if you want to create a custom event whenever someone visits a specific page on your website, for example my Google Analytics 4 category page at https://omisido.com/category/google-analytics-4/, you can use the GA4 page_view event and set page_location to match the regular expression “https://omisido\.com\/category\/google-analytics-4\/”.
https://omisido\.com\/category\/google-analytics-4\/
By doing this, GA4 will generate a new event every time someone visits this page, allowing me to track and analyse this specific activity separately.
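Before saving the condition, it can help to test the pattern against a few URLs. The Python sketch below does that with `re.fullmatch`, on the assumption that GA4 applies the same full-match rule here; the helper name and the extra URLs are made up for the test.

```python
import re

CATEGORY_PAGE = r"https://omisido\.com\/category\/google-analytics-4\/"


def is_category_page(page_location: str) -> bool:
    # Assumption: page_location has to match the pattern from start to end.
    return re.fullmatch(CATEGORY_PAGE, page_location) is not None


print(is_category_page("https://omisido.com/category/google-analytics-4/"))
print(is_category_page("https://omisido.com/category/google-analytics-4/page/2/"))
print(is_category_page("https://omisido.com/category/google-analytics-4/?utm_source=newsletter"))
```

If paginated pages or URLs with query parameters should also trigger the event, appending .* to the pattern would cover them.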
Note: To validate that your custom event implementation is working correctly use the DebugView in Google Analytics 4. DebugView is a feature in Google Analytics 4 that allows you to view real-time events and parameters being sent to Google Analytics from your website or app.
As you can see from the picture below, the custom event I created earlier, ‘ga4_categry_page_visits’, is working as expected. To enable DebugView in GA4, I am using the GA debugger Chrome extension.
When you enable DebugView, you can see events and parameters such as page views, clicks, sessions, and custom events being sent to Google Analytics in real-time. This can help you verify that your tracking is correctly configured and ensure that you are collecting the data you need to make informed decisions.
Conclusion
Regular expressions can be a powerful tool in Google Analytics 4, but they come with some nuances that you should be aware of. Use regular expressions to define unwanted referrals, internal traffic, create filters in explorations, and modify or create events in the admin panel. Remember that regular expressions in Google Analytics 4 are looking for exact matches unless you use the wildcard feature.