
Solving one of the biggest issues in Google Analytics 4: Data Retention

We recently published a blog post highlighting some of the things we have learnt in the year since Google Analytics 4 was officially released. GA4 has a number of really great selling points. However, it is still missing some features, and we have one or two issues with its existing functionality. In our post, we focused on one of these, namely the relatively short retention period for some data within the system.

Shorter retention period in GA4

As mentioned, the issue we have revolves around the shorter retention period of specific data within GA4. This does not affect the aggregated data you see in the standard reports. However, it does affect any event-level and user-level data that you might want to analyze in the Explorations section of GA4 or in a reporting suite like Google Data Studio.

By default, the data retention period is set to only 2 months. For some demographic data, like age, gender, and interests, this is also the maximum retention period. For event and other user data you can pick between the 2-month default and a maximum of 14 months.

This means that if you want to perform an analysis of data older than 14 months you may not be able to do this with GA4. An example here could be the preparation for your annual Summer Sale. You may wish to go back and look at the performance of your advertising for the sale over the last few years. This would not really be possible beyond the basic reports that are supplied in the GA4 UI. This is not ideal as there may be vital insights to be gained from comparing the last few years of campaigning data.

Another example here may be that you suspect that different age groups are more active on your site at different times of the year. For example, those in their early 20s may be visiting your travel site more as the tertiary school year is coming to an end. Knowing this may help you prioritize which travel packages you show on the homepage of your site. With GA4 you would not be able to do any more analysis on this trend beyond what’s in the standard reports. Even looking at the end of the previous school semester to get some inkling may not be possible, as the data will only be available for the last 2 months.

Of course, there are good reasons to delete any data that you do not need and that no longer serves a legitimate business interest, as this is a provision of GDPR and similar legislation. However, GA4 makes this 14-month window the maximum, and in many instances data still serves a legitimate purpose beyond that point. To comply with the various laws and to protect users’ privacy, make sure you have their permission to collect the data and that your privacy policy clearly lays out why you are collecting their data, how long you are storing it, and what is being done with it.

How to increase the retention period?

There are two mitigating steps you can take to ensure you have the GA4 data available to you to analyze. The first is fairly straightforward: change your data retention period in GA4 from 2 months to 14 months.

You can do this by following these steps:

  1. Navigate to your GA4 property.
  2. Click on Admin at the bottom of the left-hand menu bar.
  3. In the Property column, click on Data Settings.
  4. Select Data Retention from the new menu.
  5. Change the Event Data Retention dropdown from 2 Months to 14 Months.
  6. Keep the box below the dropdown checked. This ensures that the retention period for user data resets each time the user visits. For example, if a user visits your site a month after their first visit, their user-specific data will only be removed 14 months after the second visit, not the first.
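
To make the effect of step 6 concrete, here is a minimal Python sketch of the date arithmetic. It is only an illustration: the function names are our own, and it assumes that expiry falls exactly a whole number of calendar months after the anchoring visit, which is a simplification of however GA4 schedules deletion internally.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day
    to the last day of the target month (e.g. 31 Dec + 2 months -> 29 Feb)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def user_data_expiry(first_visit: date, last_visit: date,
                     retention_months: int = 14,
                     reset_on_new_activity: bool = True) -> date:
    """Approximate date after which user-level data is no longer retained.

    Assumption: with the reset box checked the clock restarts at the most
    recent visit; with it unchecked the clock runs from the first visit.
    """
    anchor = last_visit if reset_on_new_activity else first_visit
    return add_months(anchor, retention_months)


first = date(2023, 1, 15)
second = date(2023, 2, 15)  # the user returns one month later

print(user_data_expiry(first, second))                               # 2024-04-15
print(user_data_expiry(first, second, reset_on_new_activity=False))  # 2024-03-15
```

With the box checked, the returning visit pushes the expiry out by a month; with it unchecked, the date stays tied to the first visit.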

Connect Google BigQuery

The second step is more complicated, but it allows you to keep all your event data beyond its 14-month expiry period. It takes advantage of a feature in GA4 that was previously only available in Google Analytics 360, the premium enterprise version of Google Analytics: an automated link with Google BigQuery. BigQuery is Google’s cloud-based data warehousing system. It allows you to store and analyze large quantities of data.

The setup and management of BigQuery and how to analyze and retrieve data is beyond the scope of this post but we can take you through some of the advantages and what to expect.

BigQuery is a fairly technical offering and involves navigating the large and at times confusing Google Cloud Platform, so for the most part some technical expertise is required to get it set up and running. More specific knowledge is required to extract data from it using SQL. But if you have someone with the requisite knowledge, it is a very robust and effective system to use.

BigQuery is not a free offering, and you will pay for putting data into the system, for the amount you have stored, and for extracting data from the system. Each of these three charges depends on a number of factors, including how you are putting the data in, how you are taking it out, and on which Google servers you are storing it. The number of website visitors you have, how much of their activity you are tracking, how quickly you need the data to be available, and how much of it you are extracting and how often will all figure into the cost of the system.
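
As a rough way to reason about those three charges, the sketch below adds them up for one month. The rates are made-up placeholders, not Google’s actual prices, and the class and function names are our own; check the current BigQuery pricing for your chosen region before relying on any figure.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Placeholder unit prices. These are NOT real BigQuery prices."""
    ingest_per_gb: float         # cost of putting data in
    storage_per_gb_month: float  # cost of keeping data stored
    query_per_tb: float          # cost of extracting (querying) data


def estimate_monthly_cost(rates: Rates, ingested_gb: float,
                          stored_gb: float, queried_tb: float) -> float:
    """Sum the three cost components for a single month."""
    return (ingested_gb * rates.ingest_per_gb
            + stored_gb * rates.storage_per_gb_month
            + queried_tb * rates.query_per_tb)


# Hypothetical numbers, purely for illustration.
example_rates = Rates(ingest_per_gb=0.05, storage_per_gb_month=0.02, query_per_tb=5.0)
cost = estimate_monthly_cost(example_rates, ingested_gb=10, stored_gb=200, queried_tb=0.5)
print(f"Estimated monthly cost: {cost:.2f}")
```

Keeping the three components separate makes it easy to see which one dominates as your traffic, your stored history, or your reporting frequency grows.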

The data you have stored in BigQuery can be connected to Google Data Studio. This means you can analyze the data in a comfortable environment similar to other reports you may have set up there. You can even use BigQuery as the primary datasource for analyzing GA4 data, as it will always have, at worst, data up to the previous completed day, and all of your event data is available. The BigQuery datasource is not identical to the GA4 one, so you will need to adapt any reports you already have to the new source. Connecting directly to BigQuery rather than GA4 for your most recent data means that, as time goes by, you will always be able to analyze your historical data alongside your new data, even after older data has expired in GA4 itself.
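
To give a feel for what querying the export looks like, the Python sketch below builds a SQL statement that counts events and users per day over a date range older than GA4’s own retention window. It assumes the standard GA4 export layout of daily `events_YYYYMMDD` tables; the project and dataset names are hypothetical, and the helper function is our own. The resulting text could be pasted into the BigQuery console or used as a custom query in a reporting tool.

```python
from datetime import date


def build_event_count_query(project: str, dataset: str,
                            start: date, end: date) -> str:
    """Build a SQL query over the daily GA4 export tables.

    project and dataset are assumed to be trusted values that you control;
    they are interpolated directly into the SQL text.
    """
    if start > end:
        raise ValueError("start must not be after end")
    return f"""
SELECT
  event_date,
  event_name,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `{project}.{dataset}.events_*`
WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
GROUP BY event_date, event_name
ORDER BY event_date, event_count DESC
""".strip()


# Hypothetical project and dataset names; replace with your own.
query = build_event_count_query(
    project="my-project",
    dataset="analytics_123456789",
    start=date(2022, 6, 1),
    end=date(2022, 8, 31),
)
print(query)
```

Filtering on `_TABLE_SUFFIX` limits the query to the daily tables inside the date range, which also keeps down the amount of data scanned.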

Lastly, BigQuery can assist to a certain degree with ensuring you are compliant with GDPR and other legislation by allowing you to specify where your data is stored. Google has data stores all around the world, and you select exactly which one you want your BigQuery data to be stored in. This means that, taking GDPR or the Swiss DPA into account, you are able to store your data in Europe and not have to worry about it being stored in a country that doesn’t have equivalent data protections. Several locations throughout Europe, including Switzerland, can be selected, each one with a slightly different pricing structure.

Let us know if you have questions regarding the setup of Google’s BigQuery. We’re happy to help.
