By default, the data retention period in Google Analytics 4 (GA4) is set to only 2 months. For some demographic data (age, gender, and interests), 2 months is also the maximum retention period, whatever setting you choose. The following summary covers the details of data retention in GA4.
Understanding Data Retention in GA4
First of all, data retention settings do not affect the standard GA4 reports, which are based on aggregated data (the Audience Overview, for example). They do affect exploration reports, which are widely used because the standard reports often do not suffice. The GA4 API is not affected by the data retention settings either, so data retrieved through the API is available without these limitations. If you work in GA4 only and do not use tools that draw on the GA4 API, such as Looker Studio, your exploration reports will not show data beyond the limits explained in the next section.
Data Retention Options in GA4
For event and user data, you can choose between the 2-month default and a maximum of 14 months. If you are using GA360, you can also retain event data for 26, 38 or 50 months. Once the period you select has passed, the corresponding data is no longer available in your exploration reports.
How to increase the retention period?
There are two steps you can take to ensure your GA4 data remains available for analysis. The first is fairly straightforward: change your data retention period in GA4 from 2 months to 14 months.
You can do this by following these steps:
- Navigate to your GA4 property.
- Click on Admin at the bottom of the left-hand menu bar.
- In the Property column, click on Data Settings.
- Select Data Retention from the new menu.
- Change the Event Data Retention and the User Data Retention dropdowns from 2 Months to 14 Months.
- Leave the option below the dropdowns (Reset user data on new activity) enabled so that the retention period for user data resets each time the user visits. If a user returns to your site a month after their first visit, their user-specific data will then be removed 14 months after the second visit, not the first.
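The reset behaviour described in the last step can be illustrated with a small calculation. This is only a sketch: the helper names `add_months` and `user_data_expiry` are made up for this example, and counting whole calendar months is an assumption, since GA4's exact cut-off logic is not covered here.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month
    (for example, 31 December + 2 months gives 28 or 29 February).
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def user_data_expiry(visits: list[date], retention_months: int,
                     reset_on_new_activity: bool) -> date:
    """Approximate the date on which a user's data would be removed.

    With the reset option enabled, the retention period is counted from
    the most recent visit; otherwise it is counted from the first visit.
    """
    if not visits:
        raise ValueError("at least one visit is required")
    anchor = max(visits) if reset_on_new_activity else min(visits)
    return add_months(anchor, retention_months)


# A user visits on 15 January 2024 and returns one month later.
first_visit = date(2024, 1, 15)
second_visit = date(2024, 2, 15)

with_reset = user_data_expiry([first_visit, second_visit], 14, True)
without_reset = user_data_expiry([first_visit, second_visit], 14, False)

print(with_reset)     # 2025-04-15
print(without_reset)  # 2025-03-15
```

With the option enabled, the returning visit pushes the removal date back by one month in this example.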
Connect Google BigQuery
The second step is more complicated, but it lets you keep all your event data beyond the 14-month expiry period. It takes advantage of a GA4 feature that was previously available only in Google Analytics 360, the premium enterprise version of Google Analytics: an automated link with Google BigQuery. BigQuery is Google’s cloud-based data warehousing system, which allows you to store and analyze large quantities of data.
The setup and management of BigQuery, and how to analyze and retrieve data from it, are beyond the scope of this post, but we can take you through some of the advantages and what to expect.
BigQuery is a fairly technical offering and involves navigating the large and at times confusing Google Cloud Platform, so some technical expertise is usually required to get it set up and running. More specific knowledge is required to extract data from it using SQL. But if you have someone with the requisite knowledge, it is a very robust and effective system to use.
BigQuery is not a free offering: you pay for putting data into the system, for the amount you have stored, and for extracting data from it. Each of these three costs depends on a number of factors, including how you load the data, how you take it out, and on which Google servers you store it. The number of website visitors you have, how much of their activity you track, how quickly you need the data to be available, and how much of it you extract and how often all figure into the cost of the system.
The data you store in BigQuery can be connected to Looker Studio (formerly Google Data Studio). This means you can analyze the data in a comfortable environment similar to other reports you may have set up there. You can even use BigQuery as the primary data source for analyzing GA4 data, as it will always have, at worst, data up to the previous completed day, and all of your event data is available. The BigQuery data source is not identical to the GA4 one, so you will need to adapt any reports you already have to the new source. Connecting directly to BigQuery rather than GA4 for your most recent data means that, as time goes by, you will always be able to analyze your historical data alongside your new data, even after older data has expired in GA4 itself.
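To give a feel for what querying the exported data involves, here is a minimal sketch that builds a SQL query covering both older and recent data. The GA4 export writes one table per day (`events_YYYYMMDD`), which the `events_*` wildcard and `_TABLE_SUFFIX` filter span. The function name, the project ID `my-project` and the dataset ID `analytics_123456789` are placeholders invented for this example, and actually running the query requires a BigQuery client, which is not shown.

```python
from datetime import date


def build_daily_events_query(project_id: str, dataset_id: str,
                             start: date, end: date) -> str:
    """Build a query that counts events and users per day and event name.

    The date range may reach back further than the GA4 retention period,
    because the exported daily tables stay in BigQuery until you delete them.
    """
    if start > end:
        raise ValueError("start must not be after end")
    # Wildcard over the daily export tables events_YYYYMMDD.
    table = f"`{project_id}.{dataset_id}.events_*`"
    return (
        "SELECT\n"
        "  event_date,\n"
        "  event_name,\n"
        "  COUNT(*) AS event_count,\n"
        "  COUNT(DISTINCT user_pseudo_id) AS users\n"
        f"FROM {table}\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'\n"
        "GROUP BY event_date, event_name\n"
        "ORDER BY event_date, event_count DESC"
    )


# Placeholder identifiers; replace them with your own project and dataset.
query = build_daily_events_query(
    "my-project", "analytics_123456789",
    date(2023, 1, 1), date(2024, 6, 30),
)
print(query)
```

Restricting the query with `_TABLE_SUFFIX` limits it to the daily tables in the chosen range, which matters because the amount of data you extract figures into the cost.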
Lastly, BigQuery can help to a certain degree with GDPR and other data protection legislation by allowing you to specify where your data is stored. Google has data stores all around the world, and you select exactly which one your BigQuery instance is located in. This means that, taking GDPR or the Swiss DPA into account, you can store your data in Europe and not have to worry about it being stored in a country that does not have equivalent data protections. Several locations throughout Europe, including Switzerland, can be selected, each with a slightly different pricing structure.
Let us know if you have questions regarding the setup of Google’s BigQuery. We’re happy to support you.