When was the last time you had your Google Analytics implementation checked?
Do you trust your data? When was the last time your website changed, and do you know how this impacted the data you are looking at in Analytics?
Web analytics data is powerful but only if the data is accurate. We frequently run health checks on new and existing clients’ Google Analytics (GA) and Google Tag Manager (GTM) implementations to ensure that the data has integrity – meaning it can be trusted.
This guide will take you through the stages we go through in conducting a GA health check, giving you an understanding of all the elements you need to keep an eye on to ensure your GA setup is giving you the best quality data.
The guide covers tracking code checks, property-level and view-level settings, data-level investigations, custom tracking such as Goals and Events, and GDPR compliance.
You’ll need a reasonable understanding of Google Analytics to get the most out of this guide, but even if you’re just starting to experiment with your Analytics setup, it will deepen your understanding of the tool and should help you identify issues and opportunities with your setup.
If you’re not sure about any of the terms or features mentioned in this guide, Google provides a useful glossary to help you navigate Analytics and Tag Manager.
Tracking Code Checks
Our Google Analytics health check always begins with looking at the tracking code and how it has been implemented.
Google provides very specific instructions on where each type of tracking code needs to sit in your page code. You’re best off sticking to these guidelines to give your implementation the best possible starting point for data integrity.
For example, the Google Tag Manager code consists of two snippets. The main GTM container code needs to go as high up in the <head> section of your page as possible, so that tags can start loading early in the page load. The second, <noscript> snippet goes immediately after the opening <body> tag. We often find GTM containers implemented at the bottom of the page code instead. This can mean the container doesn’t load when users navigate through pages quickly, so data from those sessions is not gathered accurately.
The same applies to GA implementations that do not use Google Tag Manager.
If you have Google Tag Manager in place, it’s important to check whether the dataLayer is being used. We do this to help us start to build a picture of whether we can trust the data.
A data layer is an object that you can add to your page code that contains all the information you want to pass to Google Tag Manager.
As a rule of thumb, Event tracking that uses the dataLayer tends to be a lot more robust than Event tracking that relies on other mechanisms, such as a click text trigger.
Next, it’s important to ensure GA / GTM have been implemented on every page of the website – essential for properly tracking and attributing all activity. We conduct this check on websites of all shapes and sizes, but if you have a small website, you can do this for free using www.gachecker.com.
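If you have a crawl or export of your pages' HTML, a quick scripted check can confirm both points above: the container is present on every page, and the main snippet sits in the <head>. The following is a minimal Python sketch that works on raw HTML strings. The container ID and pages are hypothetical. Simple string matching will also miss containers injected in other ways, for example at runtime by a CMS plugin, so treat a miss as something to investigate rather than proof.

```python
def check_gtm_placement(html: str, container_id: str) -> dict:
    """Report whether a page includes a GTM container and where its two snippets sit."""
    lower = html.lower()
    head_close = lower.find("</head>")
    body_open = lower.find("<body")
    # The main snippet requests gtm.js; the second is a <noscript> iframe loading ns.html.
    script_pos = lower.find("googletagmanager.com/gtm.js")
    noscript_pos = lower.find("googletagmanager.com/ns.html")
    has_container = container_id.lower() in lower and script_pos != -1
    return {
        "has_container": has_container,
        # The main snippet should appear before </head>.
        "script_in_head": has_container and head_close != -1 and script_pos < head_close,
        # The noscript snippet should appear after the opening <body> tag.
        "noscript_in_body": noscript_pos != -1 and body_open != -1 and noscript_pos > body_open,
    }


def pages_missing_container(pages: dict, container_id: str) -> list:
    """Return the URLs whose HTML does not include the container at all."""
    return [url for url, html in pages.items()
            if not check_gtm_placement(html, container_id)["has_container"]]


# Hypothetical container ID and pages, for illustration only.
pages = {
    "/": '<html><head><script src="https://www.googletagmanager.com/gtm.js?id=GTM-ABC1234">'
         "</script></head><body></body></html>",
    "/contact/": "<html><head></head><body><p>No container here</p></body></html>",
}
print(pages_missing_container(pages, "GTM-ABC1234"))  # ['/contact/']
```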
Many of the data collection features in GA are set at property level and therefore affect all views within that property.
The free health check template contains additional information on each of the settings, but the below are the most important checks.
How many GA properties are in place and are good naming conventions being used to help you understand what is in each property? At this stage we typically identify some top line ways in which the structure can be improved, such as whether redundant properties can be deleted, rollup properties created, and naming conventions improved.
In the age of GDPR, it’s even more important to regularly review who has got access to what data in GA. Most users should only need ‘Read & Analyze’ access; we recommend keeping the number of users with full ‘Admin’ access to the bare minimum to eliminate the possibility of unwanted or unnecessary tinkering with the setup.
For accountability, each user should have their own individual login. Entire teams sharing the same login (firstname.lastname@example.org) is not ideal, as changes cannot be traced back to the individual who made them.
Under GDPR, the collection of user data for the purpose of remarketing requires explicit consent. Demographic data in Google Analytics can be very powerful when segmenting your data but it will only be collected if ‘Advertising Reporting Features’ are enabled, too.
If you are seeing demographic data in GA, your website must have a consent mechanism that asks users to actively opt in to data collection for the purpose of remarketing. Otherwise, you may be in breach of GDPR, which can be costly. Always consult a specialist if you are unsure.
More recently, Google introduced Google Signals (BETA). This collects additional data about your users which allows you to better understand cross-device behaviour for users who are logged in to other Google services on their device, such as Chrome, Gmail or YouTube. This is a powerful feature but you need to make sure that you are GDPR compliant if you enable it.
Data retention is a reasonably new setting that allows you to decide how long GA keeps user-level data. By default, Google sets this to 26 months. While this may be enough for many, you should consider how important historic data is to your organisation.
If you aren’t sure what you’ll need and won’t need in the future, set this to ‘do not automatically expire’, as we do with many client accounts.
UserID (cross-device tracking)
If this is enabled and UserIDs are being collected, your website needs to have a consent mechanism in place, and users need to confirm they are happy to be tracked in this manner. If it is enabled and your site does not have a consent mechanism to capture users’ consent, we would recommend switching off UserID until that has been sorted.
For UserID to work, you need a separate UserID view set up to stitch cross-device sessions together. If that’s not in place but the feature is enabled, chances are it’s not working anyway.
Referral Exclusion List
This should contain your website’s domain, any payment provider domains you are using (such as PayPal or WorldPay) as well as any other domains you own and that may be part of the user’s journey.
When users move between different domains and subdomains, there is a risk of losing sight of the entire journey in your data. You know you have an issue if you find self-referrals in your Referrals report: sessions that appear to originate from your own website. To fix this, you will need to set up cross-domain tracking and ensure the domains are included in this list.
However, do not use this list to try and exclude bot or spam traffic – this traffic won’t be excluded but will instead appear as ‘direct’ traffic in your data, skewing your figures.
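To find domains your Referral Exclusion List should be covering, you can export the sources from your Referrals report and check them against your own domains and your payment providers' domains. Below is a minimal Python sketch. Both domain lists are hypothetical placeholders to replace with your own.

```python
# Hypothetical domain lists: replace with your own domains and payment providers.
OWN_DOMAINS = {"freshegg.co.uk", "freshegg.com"}
PAYMENT_DOMAINS = {"paypal.com", "worldpay.com"}


def matches_domain(source: str, domains: set) -> bool:
    """True if source is one of the domains or a subdomain of one."""
    source = source.lower()
    return any(source == d or source.endswith("." + d) for d in domains)


def referral_exclusion_gaps(referral_sources: list) -> list:
    """Referral sources that belong on the exclusion list but still appear as referrals."""
    wanted = OWN_DOMAINS | PAYMENT_DOMAINS
    return [s for s in referral_sources if matches_domain(s, wanted)]


print(referral_exclusion_gaps(["blog.freshegg.co.uk", "www.paypal.com", "news.example.org"]))
# ['blog.freshegg.co.uk', 'www.paypal.com']
```

Matching on the domain suffix with a leading dot avoids false positives such as a lookalike domain that merely ends in the same letters.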
Ensure your GA is linked to any other Google products you use, such as Search Console, your Google Ads accounts or Campaign Manager. Not linking a Google Ads account, for example, may lead to campaign data being misattributed and no cost data being pulled through.
A good way to check if unlinked accounts are sending traffic to your site is to check the Acquisition > Google Ads > Campaigns report for campaigns with sessions but no clicks / cost data. If you find a gross mismatch between the two metrics, there is a linking issue. If you find an account with higher clicks than sessions, it is likely that the Ads account is also running campaigns for another website. Best practice is to use one account for one website.
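If you export that Campaigns report, the two checks described above can be scripted. Below is a minimal Python sketch. The row structure is our own, and the clicks-to-sessions threshold is an illustrative assumption, not a Google rule.

```python
from dataclasses import dataclass


@dataclass
class CampaignRow:
    campaign: str
    sessions: int
    clicks: int


def flag_ads_linking_issues(rows, max_clicks_per_session=2.0):
    """Flag Google Ads campaigns whose sessions and clicks don't line up.

    max_clicks_per_session is an illustrative threshold, not a Google rule.
    """
    issues = {}
    for row in rows:
        if row.clicks == 0 and row.sessions > 0:
            # Sessions without click/cost data suggest an unlinked Ads account.
            issues[row.campaign] = "sessions but no clicks: check account linking"
        elif row.clicks > row.sessions * max_clicks_per_session:
            # Far more clicks than sessions: the account may also advertise another site.
            issues[row.campaign] = "clicks far exceed sessions: shared Ads account?"
    return issues


rows = [
    CampaignRow("Brand", sessions=120, clicks=110),
    CampaignRow("Unlinked", sessions=300, clicks=0),
    CampaignRow("Other site", sessions=50, clicks=400),
]
for campaign, issue in flag_ads_linking_issues(rows).items():
    print(campaign, "->", issue)
```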
Linking Google Search Console to your GA account populates important reports in the Acquisition > Search Console section of GA. This adds impressions and clicks information which provides click-through rates for your landing pages, a key metric for on-page optimisation. To link Analytics with Search Console, you need to have admin access to both, as well as Tag Manager so you can verify ownership of the site. This can require some tenacity but is well worth doing.
View-level checks – the basics
The first port of call at view level is to ensure that the setup is ‘clean’, which means clearly labelled reporting, backup and test views.
It also means making sure there is no clutter of filtered views for things that can be achieved within a single GA view, such as channel views like ‘Organic traffic only’ and similar segmentation.
Exceptions to this rule are views created to give third parties access to a limited area of the site and possibly segmentation by geographical markets, depending on your setup.
The first thing we look for in the settings is a clear naming convention, so that anyone accessing the data knows exactly what they are looking at and which view to use for their purpose. For most sites we work with, a simple format like this works well:
- [website domain] [view purpose]
- www.freshegg.com [reporting]
- www.freshegg.com [backup / raw]
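If you manage many views, a short script can flag names that drift from this convention. Below is a minimal Python sketch. The regular expression is our own reading of the "[website domain] [view purpose]" format, so adjust it if your domains or purposes look different.

```python
import re

# "[website domain] [view purpose]", e.g. "www.freshegg.com [reporting]".
VIEW_NAME = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+ \[[^\]]+\]$", re.IGNORECASE)


def check_view_names(names):
    """Split view names into those that follow the convention and those that don't."""
    valid, invalid = [], []
    for name in names:
        (valid if VIEW_NAME.match(name) else invalid).append(name)
    return valid, invalid


valid, invalid = check_view_names([
    "www.freshegg.com [reporting]",
    "www.freshegg.com [backup / raw]",
    "All Web Site Data",
])
print(invalid)  # ['All Web Site Data']
```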
Website URL – this needs to be accurate as it drives the preview feature within GA. If you are using full URLs in GA, you should leave this field empty.
Time zone – set this to your time zone and ensure it matches time zones of advertising platforms such as AdWords. If it doesn’t, campaign data will be skewed. If you’re in the UK, use the (GMT +1:00) London setting instead of the (no daylight saving) one set by default.
Exclude URL Query Parameters – if your website uses a lot of query strings (the string of characters at the end of a URL that starts with ‘?’), analysing your content reports can become hard work. This feature allows you to remove query parameters from URLs in GA and aggregate the data for those pages. To find out which query parameters your website uses, head over to the Behaviour > Site Content > All Pages report and search for ‘?’ to list every page URL that contains one.
Our own website, for example, uses ‘page’ and ‘view’, so we simply add these to the field in the settings. Job done.
This is a change we only apply to reporting views, not to a raw data / backup view where we like to keep the data unchanged.
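With a long All Pages export, counting parameter names by hand gets tedious. The following minimal Python sketch tallies every query parameter name found in a list of exported page paths, using only the standard library. The sample paths are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def query_parameter_counts(page_paths):
    """Count how many page paths carry each query parameter name."""
    counts = Counter()
    for path in page_paths:
        query = urlsplit(path).query
        # keep_blank_values so that "?view=" still counts as a 'view' parameter
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


pages = ["/blog/?page=2", "/blog/?page=3&view=grid", "/services/", "/contact/?view=list"]
counts = query_parameter_counts(pages)
print(sorted(counts.items()))  # [('page', 2), ('view', 2)]
```

The parameter names you decide to strip can then be entered in the settings field as a comma-separated list.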
Currency displayed as – sets the currency your GA reports in. This should match the settings in your AdWords account to ensure any cost and ROI data is accurate. If your website processes transactions in various currencies, GA will convert these into the currency selected here for reporting purposes, provided your ecommerce tracking correctly specifies the local currency of each transaction.
To check if this is working as expected, compare your order revenue for a sample of transactions in different currencies with what’s reported in GA.
Bot filtering – this box should be ticked in all reporting views. It excludes traffic from bots and spiders Google Analytics recognises. It is not a silver bullet for all bots and spam but it’s a good starting point. If you see any other bot traffic coming through, you’ll need to exclude this manually using filters.
Site search – if your website has a site search, we highly recommend tracking the use of it. It can provide valuable insight into what users are looking for but cannot find using the navigation. If this is already enabled, check the site search report under Behaviour > Site Search to see if it’s working as expected. If not, you will need to find the search query parameter your site uses and enter this in the settings. We recommend stripping the query parameter from the URL to clean up the pages reports.
If your site search does not use a search query parameter, this feature will require some advanced setup. Your best bet is probably to track it as an Event, but the exact setup requirements depend on how the search function on your site works.
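Where your search does use a query parameter, the idea behind the site search setting looks roughly like this: pull the search term out of the URL and, if you choose to strip it, remove the parameter from the page path. The following is a Python sketch of that logic, not GA's own code. It assumes a hypothetical parameter name of ‘q’.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_site_search(path, search_param="q"):
    """Return (path with the search parameter stripped, search term or None)."""
    parts = urlsplit(path)
    params = parse_qsl(parts.query, keep_blank_values=True)
    # First value of the search parameter, decoded ("red+shoes" -> "red shoes").
    term = next((value for name, value in params if name == search_param), None)
    kept = [(name, value) for name, value in params if name != search_param]
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
    return clean, term


print(split_site_search("/search/?q=red+shoes&page=2"))  # ('/search/?page=2', 'red shoes')
```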
View filters allow you to manipulate or clean up the data in each of your Google Analytics views. There are some basic filter options, such as ‘Include’ and ‘Exclude’, a ‘Lowercase’ filter option which tackles issues with mixed casing as well as some more advanced filter options.
Warning: filters are applied as data flows into your view. Any data excluded or changed by a filter cannot be recovered or changed back, so test every filter thoroughly in a [Test] view before applying it to your reporting view.
A standard list of filters we would expect to see, or apply, in a new GA account is:
- Exclude internal traffic - identify all IPs of your internal traffic, third-party agencies working on your behalf etc. and exclude this traffic from your reporting view.
- Include valid hostnames only – include all relevant hostnames, including payment portals and subdomains. This ensures that no unwanted data flows into your GA and keeps out a lot of bots and spam.
- Lowercase filters – set lowercase filters for campaign dimensions, Event category, action and label and, if you have issues with mixed casing in your URLs, also for the request URL.
When creating new filters, you can use the ‘Verify’ option to see what the impact on your data would be. Once created, you can apply a filter to as many views as you like. Just remember: it’s the same filter so if you change it in one view, it will change for all of them!
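It can help to think through what the three standard filters do to individual hits before you build them. The Python sketch below mimics them in order: exclude internal IPs, include valid hostnames only, then lowercase selected fields. It is an illustration, not how GA processes data internally. The IP range, hostname pattern and field names are all hypothetical placeholders.

```python
import ipaddress
import re

# Hypothetical values: replace with your own office/agency ranges and hostnames.
INTERNAL_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
VALID_HOSTNAME = re.compile(r"^(www\.)?example\.com$|^pay\.example-payments\.com$")
LOWERCASE_FIELDS = ("campaign", "event_category", "event_action", "event_label")


def apply_view_filters(hits):
    """Apply exclude, include and lowercase steps in order; dropped hits are gone for good."""
    kept = []
    for hit in hits:
        ip = ipaddress.ip_address(hit["ip"])
        if any(ip in net for net in INTERNAL_NETWORKS):
            continue  # Exclude internal traffic
        if not VALID_HOSTNAME.match(hit["hostname"]):
            continue  # Include valid hostnames only
        cleaned = dict(hit)
        for field in LOWERCASE_FIELDS:
            if field in cleaned:
                cleaned[field] = cleaned[field].lower()  # Lowercase filter
        kept.append(cleaned)
    return kept


hits = [
    {"ip": "203.0.113.7", "hostname": "www.example.com", "campaign": "Spring"},   # office IP
    {"ip": "198.51.100.1", "hostname": "spam.example.net", "campaign": "x"},      # rogue host
    {"ip": "198.51.100.2", "hostname": "www.example.com", "campaign": "Spring_Sale"},
]
print(apply_view_filters(hits))  # only the last hit survives, with campaign 'spring_sale'
```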
Annotations are one of the most under-used features of Google Analytics. They allow you to add short comments to the timeline to highlight any changes that may impact the data. Have you made changes that may affect the data, such as adding filters or changing Goal settings? Annotate them, so that in a year’s time you still remember why the numbers changed.
The same applies to changes to the website, offline marketing campaigns or changes to campaign tagging – in short, anything to do with your website! We try to train our developers, CRO team and marketing teams to annotate all changes in GA themselves. Annotations make developers’ lives easier and can save a lot of time.
You can find and amend Google Analytics’ default channel settings in Channel Settings > Channel Groupings. By default, these settings sort incoming traffic into Direct, Organic Search, Social, Email, Referral, Paid Search and a few others. These groupings work reasonably well but there is a good chance that some of the traffic to your site is not being captured correctly by these rules.
Visit the Acquisition > All Traffic > Channels report and look for a channel called (Other). Here you will find anything that has not been attributed to one of the default channels. Amend the channel rules to ensure all traffic is assigned to a channel, but be aware that you are changing this for good. A safer way to test how changes affect the channel mix is to set up a custom channel grouping.
More advanced use of this feature is to break channels down more granularly, e.g. Paid Social, Paid Search Brand vs Non-Brand, Email Automated vs Campaign Emails and so on. It really depends on how you define channels in your business.
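Channel groupings are ordered rules where the first match wins, which makes it easy to sketch a more granular scheme before building it in GA. The rules below are a hypothetical, simplified Python example. The medium values and the ‘brand’ and ‘auto’ campaign prefixes are our own assumptions, not GA's default definitions.

```python
# Hypothetical, simplified rules evaluated top to bottom; the first match wins.
CHANNEL_RULES = [
    ("Paid Social", lambda src, med, camp: med in {"paid-social", "paidsocial"}),
    ("Paid Search Brand", lambda src, med, camp: med == "cpc" and camp.startswith("brand")),
    ("Paid Search Non-Brand", lambda src, med, camp: med == "cpc"),
    ("Organic Search", lambda src, med, camp: med == "organic"),
    ("Email Automated", lambda src, med, camp: med == "email" and camp.startswith("auto")),
    ("Email Campaign", lambda src, med, camp: med == "email"),
    ("Referral", lambda src, med, camp: med == "referral"),
    ("Direct", lambda src, med, camp: src == "(direct)" and med in {"(none)", "(not set)"}),
]


def assign_channel(source, medium, campaign=""):
    """Return the first channel whose rule matches, or (Other) if none do."""
    src, med, camp = source.lower(), medium.lower(), campaign.lower()
    for channel, matches in CHANNEL_RULES:
        if matches(src, med, camp):
            return channel
    return "(Other)"


print(assign_channel("google", "cpc", "Brand_UK"))  # Paid Search Brand
print(assign_channel("partner", "affiliate"))       # (Other)
```

Because order matters, the more specific rules (Brand, Automated) must sit above their catch-all siblings.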
The custom alerts in Google Analytics are certainly not the strongest feature of an otherwise great platform, but they are still worth using. Setting up lots of alerts on small changes, such as ‘a 20% drop in Goal conversion rate/traffic’, can result in high volumes of alert emails. You can end up in a ‘boy who cried wolf’ situation, where you stop checking every alert because they are so often false alarms.
We recommend starting with alerts for dramatic changes, such as a 100% change (up or down) in traffic, in Goal completions (one alert per Goal, so you know if one breaks) and in transactions. Then gradually build them out if you find that you need more. Once a custom alert has been set up, it can easily be applied to several views on the same GA login.
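The arithmetic behind a "% change" alert condition is simple, and it is worth being clear about: a 100% drop means the metric fell to zero, while a 100% rise means it doubled. Below is a minimal Python sketch of that check. The function names are our own, and GA alerts themselves are configured in the interface, not in code.

```python
def percent_change(previous, current):
    """Percentage change from previous to current (positive = increase)."""
    if previous == 0:
        # No baseline: any traffic at all counts as an unbounded increase.
        return float("inf") if current > 0 else 0.0
    return (current - previous) * 100 / previous


def should_alert(previous, current, threshold_pct=100.0):
    """True when the change, up or down, is at least threshold_pct."""
    return abs(percent_change(previous, current)) >= threshold_pct


print(should_alert(1000, 0))     # True: a Goal that stopped recording entirely
print(should_alert(1000, 800))   # False: a 20% dip is not worth an email
```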
We have looked at the tracking code and the Google Analytics account structure as well as top line property and view settings.
At this stage, we may have found and fixed a few issues but that alone still isn’t enough to give us confidence in the accuracy of the data.
This section is by far the most difficult to ‘template’ as quite often it starts with a gut feeling that something doesn’t add up which then leads to uncovering a problem with the data. However, if you’re spending enough time in Analytics, you should start to sense when things don’t seem quite right.
Here are a few starting points for data investigations we do as part of a GA health check.
Data limits and sampling
In the free version of Google Analytics, which many websites use, you can send up to 10 million hits per month to GA. Beyond that, there is no guarantee that your data will be processed. It may even be sampled before processing, which dramatically reduces its reliability.
There is no way of telling whether this is happening, so it’s best to try to keep below the limit if you want accurate reporting, finding ways to reduce your hit count if necessary. You can review how many hits your website is sending in the Property Settings.
The amount of data sent to GA on a monthly / daily basis also impacts data sampling experienced within the user interface. As part of a health check, we check where the sampling threshold is and if typical everyday use of GA is likely to encounter sampled data.
If we find this to be the case, we flag it to the client. We can often bring the overall hit count down by making a few small changes.
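To see how close you are to the limit mid-month, you can project a full month from a sample of daily hit counts. Below is a minimal Python sketch. It assumes hit volume is roughly steady across the month, which seasonal or campaign-driven sites may not be.

```python
import calendar

MONTHLY_HIT_LIMIT = 10_000_000  # free GA limit quoted above


def projected_monthly_hits(daily_hits, year, month):
    """Project a month's hits from a sample of daily hit counts."""
    days_in_month = calendar.monthrange(year, month)[1]
    average = sum(daily_hits) / len(daily_hits)
    return round(average * days_in_month)


def hit_budget_report(daily_hits, year, month, limit=MONTHLY_HIT_LIMIT):
    """Compare the projection with the monthly limit."""
    projected = projected_monthly_hits(daily_hits, year, month)
    return {
        "projected": projected,
        "share_of_limit": projected / limit,
        "over_limit": projected > limit,
    }


report = hit_budget_report([350_000] * 7, 2019, 1)
print(report["projected"], report["over_limit"])  # 10850000 True
```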
Channel observations and traffic source attribution
We’ve already looked at channel settings under our view-level checks, so the next step is to take a closer look at how sessions have been sorted into those channels and whether that makes sense. Finding issues here can often unearth bigger issues with the tracking code implementation. A few useful starting points are:
- Check the (Other) channel for misattributed traffic sources
- Review the Referrals channel. Self-referrals typically point to issues with the tracking code implementation, which can be challenging to identify and fix. Referrals from email servers suggest untagged email campaigns; this traffic should appear in the Email channel instead.
It is also not unusual to find organic search traffic from lesser-known search engines in here. Use the Organic Search Sources feature in Admin > Property Settings to fix this.
Referrals from payment providers showing here? There’s a good chance the converting user journey is broken. A new session starts when the user returns from the payment page, so conversions may be double-counted and not correctly attributed to the channel the user originally came from.
- Check that campaigns are being tagged effectively - Are consistent naming conventions, spelling and capitalisation being used? Are paid social and organic social clearly distinguishable? Navigate to Acquisition > Campaigns > All Campaigns and apply a secondary dimension of Source/Medium for a good overview of how campaigns have been tagged.
Our campaign tagging tool can help you keep tagging consistent. Search and replace filters can also be very useful, for example when tidying up three different spellings of ‘LinkedIn’ as a Source.
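To find those competing spellings in the first place, you can group exported Source (or Campaign) values by a normalised key. Below is a minimal Python sketch. The normalisation rule is our own assumption: ignore case and anything that is not a letter or digit.

```python
from collections import defaultdict


def inconsistent_spellings(values):
    """Group values that differ only by case, spaces, hyphens or underscores."""
    groups = defaultdict(set)
    for value in values:
        key = "".join(ch for ch in value.lower() if ch.isalnum())
        groups[key].add(value)
    # Only report keys that have more than one spelling in the data.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


sources = ["LinkedIn", "linkedin", "Linked-In", "facebook", "twitter", "Twitter"]
print(inconsistent_spellings(sources))
```

Each group this returns is a candidate for a search and replace filter that maps the variants onto one agreed spelling.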
Bot or Spam traffic
Navigate to Audience > Technology > Network and review the list of Internet Service Providers (ISPs) sending traffic to your site. This is the best place to identify non-human traffic by looking for ISPs with an exceptionally high bounce rate, an average of one page per session, a very high % of new users, and zero conversions. A great little tool to help you with this is the Google Chrome extension, DaVinci.
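If you export the Network report, the four warning signs above can be combined into a simple shortlist. Below is a minimal Python sketch. The thresholds and the minimum-sessions cut-off are hypothetical starting points, so tune them to your own traffic before excluding anything.

```python
from dataclasses import dataclass


@dataclass
class NetworkRow:
    isp: str
    sessions: int
    bounce_rate: float        # 0-1
    pages_per_session: float
    new_users_pct: float      # 0-1
    conversions: int


def suspected_bots(rows, min_sessions=50, bounce_min=0.95, pages_max=1.05, new_min=0.95):
    """Return ISPs showing all four warning signs; thresholds are hypothetical."""
    return [
        r.isp for r in rows
        if r.sessions >= min_sessions       # ignore ISPs too small to judge
        and r.bounce_rate >= bounce_min
        and r.pages_per_session <= pages_max
        and r.new_users_pct >= new_min
        and r.conversions == 0
    ]


rows = [
    NetworkRow("example hosting llc", 900, 0.99, 1.0, 0.98, 0),
    NetworkRow("british telecommunications plc", 5000, 0.45, 3.2, 0.60, 120),
    NetworkRow("tiny isp", 10, 1.0, 1.0, 1.0, 0),
]
print(suspected_bots(rows))  # ['example hosting llc']
```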
Check out our blog post How to Deal With Bot Traffic in Your Google Analytics for more detail about tackling bots and spam.
While you are already in Audience > Technology > Network, change the primary dimension to Hostname to see which websites are sending data to your GA. These should be your domain, e.g. www.freshegg.co.uk, or translate.googleusercontent.com and webcache.googleusercontent.com, which show you when someone has used Google’s translate service on your site or when someone has selected the ‘cached’ link on your organic search results.
If more than a few percent of total traffic comes from domains that are not your website or the googleusercontent domains, you need to put a hostname filter in place. Otherwise your data is being diluted and all your key conversion and engagement metrics are being skewed.
You can also find out how to do this in our How to Exclude Rogue Traffic From Your Google Analytics blog.
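To put a number on "more than a few percent", you can export sessions by hostname and calculate the share coming from unexpected hosts. Below is a minimal Python sketch. The expected-host pattern uses the domains named above as an example, and the sample figures are hypothetical.

```python
import re

# Example pattern: your own domain plus Google's translate and cache hosts.
EXPECTED_HOSTS = re.compile(
    r"^(www\.)?freshegg\.co\.uk$|^(translate|webcache)\.googleusercontent\.com$"
)


def unexpected_host_share(sessions_by_host):
    """Fraction of sessions whose hostname does not match EXPECTED_HOSTS."""
    total = sum(sessions_by_host.values())
    rogue = sum(n for host, n in sessions_by_host.items() if not EXPECTED_HOSTS.match(host))
    return rogue / total if total else 0.0


share = unexpected_host_share({
    "www.freshegg.co.uk": 9000,
    "translate.googleusercontent.com": 100,
    "free-traffic.example": 900,
})
print(f"{share:.1%}")  # 9.0%
```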
The risk with following templates and check lists like this one is always that something important or obvious that wasn’t on the list is missed. This step is where you need to step back and think about everything you have seen and learned about your Google Analytics implementation. Did it all make sense? Did any numbers stand out to you or seem unrealistic? Be curious: when something doesn’t look right, follow it up.
- Does the bounce rate seem high or suspiciously low?
- What is your rate of new vs. returning users?
- Does the average number of pages per session seem plausible?
- Review the exit pages: are these pages from which you would expect users to leave the site?
Direct traffic is a frequent source of issues, because traffic that should be attributed to other sources often ends up hidden in the Direct category. Look at the landing pages of Direct traffic; these should be top-level pages a user would bookmark or type directly into the browser. If many of them are deep pages with long URLs, chances are something isn’t right.
The health check template provides a few more ideas for what else you could and should be looking at.
Use custom segments to slice and dice the data – are these strange direct visits coming from a specific device category? Here is a blog post we published on dark social traffic which may be useful if the above rings true.
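To shortlist suspicious Direct landing pages from an export, you can flag anything deeper than a couple of folders or carrying a query string. Below is a minimal Python sketch. The depth cut-off of two path segments is an assumption, so pick whatever counts as "top-level" on your site.

```python
from urllib.parse import urlsplit


def deep_direct_landing_pages(landing_pages, max_depth=2):
    """Return landing pages deeper than max_depth path segments or carrying a query string."""
    flagged = []
    for page in landing_pages:
        parts = urlsplit(page)
        depth = len([segment for segment in parts.path.split("/") if segment])
        if depth > max_depth or parts.query:
            flagged.append(page)
    return flagged


pages = ["/", "/services/", "/blog/2019/05/ga-health-check/", "/checkout/confirm?order=123"]
print(deep_direct_landing_pages(pages))
# ['/blog/2019/05/ga-health-check/', '/checkout/confirm?order=123']
```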
Custom tracking and configurations
We have cast our watchful eye over the default settings and features of Google Analytics. Next, we take a look at the custom data that is being collected and reported.
Every GA view should have Goals set up to track whether users are fulfilling the purpose of the website. In its most simple form, that is a transaction but there are many other conversion types available such as lead generation, newsletter subscription, account registration, job application, booking a table, logging a complaint, enrolling on a course, or even a phone call.
Are the key conversions on your site being tracked? If you’re not sure, check out our post How to Create a Measurement Plan and Why You Really Need One, which will help you identify your business objectives and KPIs and translate them into data points you can measure in GA with the help of GTM.
If all primary KPIs are being tracked, you need to check if they are tracking correctly. Broken Goals are simple to spot because they have no conversions. We usually approach this by stepping through the user journey and observing what data is being sent to Google Analytics using the developer console and a plugin like DataSlayer or WASP.
You should also compare GA data against alternative data sources, such as CMS data, to ensure the numbers are matching. As a rule of thumb, we consider < 3% variance acceptable.
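The variance check is simple arithmetic, but scripting it keeps comparisons consistent across Goals. The same comparison works for the multi-currency revenue check described earlier. Below is a minimal Python sketch. It treats the CMS or back-office figure as the reference and uses the 3% rule of thumb as the default tolerance.

```python
def variance_pct(ga_value, source_value):
    """Absolute difference of GA from the reference source, as a percentage of the reference."""
    if source_value == 0:
        raise ValueError("reference value must be non-zero")
    return abs(ga_value - source_value) * 100 / source_value


def within_tolerance(ga_value, source_value, tolerance_pct=3.0):
    """True when GA is within tolerance_pct of the reference figure."""
    return variance_pct(ga_value, source_value) < tolerance_pct


# e.g. 1,020 Goal completions in GA vs 1,000 form submissions in the CMS
print(variance_pct(1020, 1000), within_tolerance(1020, 1000))  # 2.0 True
```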
Pretty much any user interaction on your website can be tracked as an Event. This can be done with the help of a developer or, often, directly via Google Tag Manager. The measurement planning blog post mentioned above provides some useful guidance on when to track something as a Goal and when to use Event tracking.
Verifying the accuracy of Event tracking can be a time-consuming job, especially if a lot of interactions are being tracked. When health checking Event tracking, we typically use the same process described for Goal tracking above:
- Identify what each of the Events means
- Step through those user journeys to verify whether the numbers can be trusted.
Event tracking can be the source of extensive data collection. If your site is close to breaching Google Analytics’ hit limits, you should consider removing some of the highest-volume Events. Scroll tracking is one of the usual suspects, as it tends to send a large amount of data with every pageview. Before removing an important Event entirely, ask yourself how you are currently using that data and whether you could live without it. There is usually more than one solution, so do get in touch if you get stuck.
Take a look at the full report under Behaviour > Events > Top Events to ensure meaningful naming conventions are being applied for category, action and label on each Event. To find out where on your website an Event is occurring, you can add a secondary dimension of ‘Page’.
GDPR compliance for Google Analytics
Last but by no means least, while we cannot certify GDPR compliance, we can check a few areas of your implementation we know have been affected by the new GDPR legislation. Here is how you do those checks yourself.
Personally Identifiable Information (PII)
It has always been against Google’s terms of service to send any PII to Google Analytics. This includes, but is not limited to, emails, names, post codes, addresses etc.
PII most frequently appears in the query string of the URL and can therefore be found in the Behaviour > Site Content > All pages report. Start by searching for @ to flush out any emails that might be collected. Use the same principle to find other instances of PII.
Another place worth checking for PII is the Events report. Is it possible that the contact form is recording user inputs in the URL and sending them to GA? This is an important one to get right as Google reserve the right to delete your entire account if they find PII in one of the properties or views.
If you do find instances of PII in your data, don’t panic. First, identify the source of the personal data and stop it from being collected in GA. There is a neat way of doing this via Google Tag Manager if fixing it at source requires a lot of development or your developers have too much going on. If you’re not confident doing this yourself, get in touch.
GDPR considers IP addresses to be personal data, so compliance means anonymising them before Google Analytics stores them.
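To flush out PII in an exported list of page paths, and to see what a redaction step does to them, the logic can be sketched as follows. This is Python written purely to illustrate the idea, not GTM code. Both patterns are simplified assumptions: the email pattern is loose and the postcode pattern only covers common UK formats. Paths are URL-decoded first, because an ‘@’ often arrives encoded as %40.

```python
import re
from urllib.parse import unquote_plus

# Simplified patterns; real PII detection needs broader, locale-specific rules.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b", re.IGNORECASE),
}


def find_pii(page_paths):
    """Return (path, kind) pairs for every path that looks like it contains PII."""
    found = []
    for path in page_paths:
        decoded = unquote_plus(path)  # "%40" -> "@", "+" -> " "
        for kind, pattern in PII_PATTERNS.items():
            if pattern.search(decoded):
                found.append((path, kind))
    return found


def redact_pii(path):
    """Replace anything matching the patterns with a placeholder (returns the decoded path)."""
    decoded = unquote_plus(path)
    for kind, pattern in PII_PATTERNS.items():
        decoded = pattern.sub(f"[REDACTED_{kind.upper()}]", decoded)
    return decoded


paths = ["/thank-you/?email=jane.doe%40example.com", "/stores/?postcode=BN11+1AA", "/blog/?page=2"]
print(find_pii(paths))
print(redact_pii(paths[0]))  # /thank-you/?email=[REDACTED_EMAIL]
```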
Doing this just requires changing a simple setting in Google Tag Manager (or a single extra line in the standard tracking code) – Google have released a handy guide to IP anonymisation.
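Google documents IP anonymisation as setting the last octet of IPv4 addresses, and the last 80 bits of IPv6 addresses, to zero. In analytics.js you switch it on with `ga('set', 'anonymizeIp', true)`; in GTM you set the `anonymizeIp` field to true on your Universal Analytics tag. Google then applies the masking itself. The Python sketch below only illustrates what the masking does to an address. It is not something you need to run.

```python
import ipaddress


def anonymize_ip(address):
    """Zero the last IPv4 octet (keep /24) or the last 80 IPv6 bits (keep /48)."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(anonymize_ip("203.0.113.57"))              # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:12:1:2:3:4"))  # 2001:db8:abcd::
```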
Advertising Reporting Features
This is a setting in the GA property settings that collects behavioural user data and shares it with Google’s DoubleClick service. In return, DoubleClick provides demographic data on your visitors. After 15th May 2018 this feature can only be used with the explicit consent of the visitor.
If you have an existing consent management solution on your site, you can hook this feature into that and set it via GTM. If you are not currently asking users for their consent to collect that type of data, you have to switch the feature off. This means you will no longer receive demographic data in Google Analytics.
You can find the setting under Admin > Property > Tracking Info > Data Collection.
The same applies to the more recent feature Google Signals (BETA) which superseded the above settings. If you are uncertain about whether or not your consent management covers you, consult a specialist.
UserID is an advanced GA feature that allows you to track users across multiple visits and devices. It requires users to self-authenticate (login) and needs some development work to set up. Our current interpretation of the GDPR legislation is that this is no longer allowed without explicit consent.
It should be reasonably straightforward to obtain this consent in the login process but if users don’t give explicit consent, you cannot use this feature, and you should disable the data collection of the UserID and delete any UserID data views if necessary.
That’s all, folks
So that’s how we do a GA health check. There’s quite a lot of information to digest, so here’s a quick recap of the steps we’ve gone through:
- Check your tracking code is implemented and tracking correctly
- Conduct property-level checks: property structures, users, data retention and tracking
- Conduct view-level checks: naming conventions, filters, annotations, channel settings
- Check at the data level: sampling, source attribution, bots and spam
- Make sure you’re getting accurate data from Goal and Event tracking
- Check that your setup complies with GDPR requirements
What to do now
Get checking! If you have any questions or want to know more about health checks or any of our services, send us a message or call us on 01903 334602.
Want to learn more about analytics and reporting? Have a look at our blog, where you’ll find guides to some great GA features as well as all the posts referenced in this guide.
Download our FREE Google Analytics Health Check Template