Analysing Google Search Console Data with BigQuery

Analysing Google Search Console Data with BigQuery

In today’s digital landscape, data is king. Google Search Console (GSC) serves as a treasure trove of data, offering valuable insights into your website’s performance in the Google Search Engine Results Pages (SERPs). However, for in-depth analysis and uncovering hidden trends, integrating GSC data with BigQuery unlocks a whole new level of SEO (digital marketing) power.

Understanding Google Search Console

Google Search Console provides essential data about your website’s organic traffic and performance, including:

  • Search Analytics: Metrics like clicks, impressions, CTR (Click-Through Rate), and average position.
  • Index Coverage: Information about the indexing status of your website pages.
  • Mobile Usability: Issues affecting the mobile user experience.
  • Manual Actions: Notifications about manual penalties from Google.

While most GSC’s dashboards are useful, they lack advanced querying capabilities and may become challenging when handling large amounts of data. This is where exporting data to BigQuery becomes extremely beneficial.

The power of BigQuery

BigQuery, Google’s cloud-based data warehouse, provides a scalable and cost-effective platform for analysing large-scale datasets. By connecting Google Search Console with BigQuery, you can:

  • Unleash Granular Analysis: GSC offers a limited timeframe for data storage. BigQuery eliminates this restriction, allowing you to analyse historical trends spanning years, providing a more comprehensive understanding of your website’s organic search performance over time.
  • Craft Custom Reports: GSC’s reporting functionalities are powerful, but BigQuery reigns supreme for truly tailored insights. You can write complex SQL queries to create custom reports that answer your specific SEO questions. Imagine dissecting click-through rates (CTR) for specific device types or analysing keyword performance across different landing pages.
  • Combine Data Sources: SEO doesn’t exist in a vacuum. BigQuery allows you to easily integrate GSC data with other valuable sources like Google Analytics 4 (GA4) and Google Ads. This holistic view empowers you to understand how organic search traffic interacts with other marketing channels, providing a deeper insight into your overall SEO and digital marketing strategy.

Setting up Google Search Console and BigQuery integration

To start analysing Search Console data in BigQuery, you need to set up a data export. This process involves several steps:

  1. Enable BigQuery API: In the Google Cloud Console, ensure that the BigQuery API is enabled.
  2. Create a BigQuery Project: If you don’t already have one, create a new project in BigQuery.
  3. Link Google Search Console to BigQuery:
    • In the GSC dashboard, navigate to Settings > Associations > Add Association.
    • Choose BigQuery and follow the prompts to authenticate and link your BigQuery project.
  4. Configure Data Export: Define the export parameters, such as the data range and frequency (daily exports are common).

Once set up, GSC will automatically export your data to BigQuery based on the configured schedule.

Analysing Search Console data with BigQuery

With your data flowing into BigQuery, you can start performing advanced analyses. Here are some common use cases and queries:

Identifying top-performing pages

SELECT
  url,
  search_type,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
  CAST(AVG(sum_position) AS INT64) AS average_position
FROM
   `your_project.your_dataset.url_impression` -- Use url_impression table
WHERE
  data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
  AND CURRENT_DATE()
GROUP BY
  url,
  search_type
ORDER BY
  clicks DESC
LIMIT
  100;

In this query:

  • CAST(AVG(sum_position) AS INT64) is used to convert the average position to an integer, effectively truncating any decimal places.
  • CURRENT_DATE() is used to get today’s date.
  • DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY) calculates the date 28 days ago from today.This ensures that the query always returns data for the most recent 28 days, regardless of when it’s run.

Query results

Identifying top-performing pages BigQuery example
Identifying top-performing pages BigQuery example

Note: Replace ‘your_project.your_dataset.url_impression’ with your url_impression table(!)

Device performance comparison

Comparing performance across devices (desktop, mobile, tablet) can reveal insights into user behaviour and device-specific issues.

SELECT
  device,
  SUM(clicks) AS total_clicks,
  SUM(impressions) AS total_impressions,
  CAST(AVG(sum_top_position) AS INT64) AS average_position
FROM
  `your_project.your_dataset.searchdata_site_impression`
WHERE
  data_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 28 DAY)
  AND CURRENT_DATE()
GROUP BY
  device
ORDER BY
  total_clicks DESC;

This query breaks down clicks, impressions, and average position by device type. By segmenting data by device, you can identify any discrepancies in performance between mobile and desktop users. This can inform decisions around mobile responsiveness and content optimisation for different screen sizes.

Query results

Device performance comparison BigQuery example
Device performance comparison BigQuery example

Identify queries containing your brand name

SELECT
  query,
  SUM(clicks) AS clicks,
  SUM(impressions) AS impressions
FROM
  `your_project.your_dataset.searchdata_url_impression`
WHERE
  query LIKE '%sido%'
GROUP BY
  query
ORDER BY
  clicks DESC

Query results

Identify queries containing your brand name BigQuery example
Identify queries containing your brand name BigQuery example

Find queries with high impressions but low clicks (potential for optimisation)

SELECT
  query,
  SUM(impressions) AS impressions,
  SUM(clicks) AS clicks,
FROM
  `your_project.your_dataset.searchdata_url_impression` -- Use url_impression table
WHERE
  data_date >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY
  query
HAVING
  impressions >= 500 -- Adjust thresholds as needed
ORDER BY
  impressions DESC
LIMIT
  20

This query identifies queries with a high number of impressions (over 500 in the last 30 days) but a low number of clicks. These could be queries where your title tags or meta descriptions aren’t compelling enough, or they might not be well-aligned with user intent.

Query results

Find queries with high impressions but low clicks BigQuery example
Find queries with high impressions but low clicks BigQuery example

Visualising the data with Google Looker Studio

While BigQuery is excellent for querying and analysing data, visualising the results is often the best way to communicate insights. Google Looker Studio can connect directly to BigQuery and help you build dynamic dashboards.

Here’s how to create a basic Looker Studio dashboard:

  1. Go to Data Studio and select “Create a Data Source.”
  2. Choose BigQuery as the source and select your dataset.
  3. Build visualisations like line charts for clicks over time, bar charts for CTR by device, or tables for query performance.

Automating and Scaling

Once you’ve built your queries, you can schedule them to run at regular intervals. BigQuery allows you to schedule queries that will automatically update your datasets or reports. This is useful for keeping your dashboards fresh with the latest data without manual intervention.

Conclusion

Using BigQuery to analyse Search Console data opens up endless possibilities for deep analysis and actionable insights. With its ability to handle large datasets and run complex queries, you can uncover trends, opportunities, and areas for improvement much faster than using Google Search Console alone.

By following the steps outlined above, you’ll be well on your way to dissecting your Search Console data in BigQuery and using it to drive smarter, data-driven marketing decisions.

Analysing Search Console data with BigQuery
Analysing Search Console data with BigQuery
Events in Google Analytics 4

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.