How to Exclude Rogue Traffic From Your Google Analytics – a Step-by-Step Guide

Written by Intern - 16 Feb 2015

If rogue traffic is being sent to your Google Analytics (GA) property, a hostname inclusion filter is precisely what you need. By ‘rogue’ traffic we mean traffic that should not be included in your reporting stats because it is not real traffic to your websites. Rogue traffic problems may be caused by:

  • Test server traffic sending data to the same GA property
  • Accidentally using the same tracking code on a different website you own
  • Someone maliciously hijacking your property ID and sending false data from unrelated websites

This type of traffic can be excluded from your data with a simple filter, but rather than using an exclude filter to eliminate unwanted traffic sources, we propose setting up an include filter that only includes traffic from the hostnames you have defined as being legitimate.

The aim of an include hostname filter is to ensure that the reports contain data from the correct hostname only. This is more efficient and future-proof than trying to exclude any incorrect sources as they appear.

What is a hostname?

The hostname is the domain your website is running on. In the two examples below, the hostnames are www.example.com and abc.example.co.uk:

www.example.com/shopping

http://abc.example.co.uk/test
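
To make the distinction concrete, here is a minimal Python sketch that pulls the hostname out of a URL using the standard urllib.parse module. The helper name hostname_of is ours, purely for illustration; it is not something GA provides.

```python
from urllib.parse import urlsplit


def hostname_of(url: str) -> str:
    """Return the hostname part of a URL, or an empty string if there is none."""
    # urlsplit only recognises the host when the URL has a scheme or starts
    # with "//", so prefix bare addresses such as "www.example.com/shopping".
    if "://" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


for url in ["www.example.com/shopping", "http://abc.example.co.uk/test"]:
    print(url, "->", hostname_of(url))
```

Everything after the first forward slash (/shopping, /test) is the page path, which the hostname filter ignores.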

What do you need before setting up a hostname filter?

  • Confirm the website domains with your developer
  • A test view to monitor the effects of the filter, before transferring to your main reporting view
  • Knowledge of simple regular expression writing or someone who can help you write a regular expression

Top Tip: Ensure you have a raw backup view in place that captures all unfiltered data before applying any filters to your main reporting view.

Do my GA reports contain data from unwanted sources?

The quickest way to find out whether your GA reports are contaminated with data from test servers and unrelated websites is the hostname report, found under Audience -> Technology -> Network, with the primary dimension set to ‘Hostname’.

[Screenshot: hostname report in Google Analytics]

If you haven’t applied a hostname filter yet, you’re likely to see traffic from development servers or completely separate websites.

In our screenshot above we have rogue traffic coming from:

  • Test servers (freshegg-uk.freshegg.lom)
  • Traffic from our Australian website which we don’t want to include in this view (www.fresheggdigital.com.au)
  • Traffic from a completely unrelated site (www.flowersbylouise.net)

Traffic from googleusercontent

You might receive traffic from translate.googleusercontent.com and webcache.googleusercontent.com. These sources should be included: they appear when someone uses the Google Translate service on your site or selects the ‘cached’ option on your organic search result. If you see similar services from other search engines in your hostname report, these should be included too, for example yandex.translate.

Creating the regular expression pattern

  • Use an expression tester while creating your expression
  • Use a syntax cheatsheet
  • Remember to escape any metacharacters: put a backslash (\) in front of a full stop (.) or a forward slash (/)
  • Use a vertical bar (|) to separate the hostnames you want to include
  • You don’t have to include the start of line (^) or end of line ($) anchors in GA

For example:

We can identify freshegg.co.uk and freshegg.com as valid traffic sources, as well as traffic from googleusercontent sources.

Our regular expression would look like this: freshegg\.(co\.uk|com)|googleusercontent

  • We’ve escaped any dot characters with a backslash
  • We’ve separated the googleusercontent case with a | operator
  • We’ve created an option for .co.uk or .com with parentheses, (), to contain the options
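
As a quick sanity check, the pattern can be run against the hostnames mentioned in this guide. The following is a minimal Python sketch, not a GA tool: Python's re module is not the engine GA uses, but for a simple pattern like this one we would expect the same behaviour. re.search is used because the pattern has no anchors and may match anywhere in the hostname.

```python
import re

# Pattern from the example above, without ^ or $ anchors.
VALID_HOSTNAMES = re.compile(r"freshegg\.(co\.uk|com)|googleusercontent")

# Hostnames mentioned in this guide.
hostnames = [
    "freshegg.co.uk",
    "freshegg.com",
    "translate.googleusercontent.com",
    "webcache.googleusercontent.com",
    "freshegg-uk.freshegg.lom",
    "www.fresheggdigital.com.au",
    "www.flowersbylouise.net",
]

# search() matches anywhere in the string, like an unanchored filter pattern.
kept = [h for h in hostnames if VALID_HOSTNAMES.search(h)]
dropped = [h for h in hostnames if not VALID_HOSTNAMES.search(h)]

print("kept:", kept)
print("dropped:", dropped)
```

The first four hostnames are kept and the last three are dropped. One thing to watch with an unanchored pattern: a hypothetical hostname such as www.freshegg.com.au would also be kept, because it contains freshegg.com. If that matters for your domains, consider adding an end of line ($) anchor to the relevant option.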

Have a go at creating yours in the expression tester. Ensure all your domains match your expression before setting up the filter.

You can test this filter by pasting it into the filter box in your hostnames report. The report should now show only the hostnames you want to include.

To double check that no valid hostnames have been excluded by this filter, click on ‘advanced’ and set your filter pattern as an ‘exclude’ filter. If the report still features a hostname you did not mean to exclude, you will need to review your filter pattern and test again.
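
The same double check can be rehearsed offline before touching GA. The snippet below is a minimal Python sketch, assuming you keep a list of the hostnames you know to be valid. The function name wrongly_excluded and the deliberately incomplete second pattern are ours, for illustration only.

```python
import re


def wrongly_excluded(pattern: str, valid_hostnames: list[str]) -> list[str]:
    """Return the known-valid hostnames that the include pattern would drop."""
    regex = re.compile(pattern)
    return [h for h in valid_hostnames if not regex.search(h)]


# Hostnames this guide treats as legitimate.
valid = ["freshegg.co.uk", "freshegg.com", "translate.googleusercontent.com"]

# The full pattern keeps every valid hostname, so the result is an empty list.
print(wrongly_excluded(r"freshegg\.(co\.uk|com)|googleusercontent", valid))

# A pattern that forgets the .com option drops freshegg.com.
print(wrongly_excluded(r"freshegg\.co\.uk|googleusercontent", valid))
```

An empty list means no valid hostname would be lost. Anything else names the hostnames your pattern still needs to cover.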

Top Tip: Test your filter thoroughly; data excluded from your reporting view with a hostname filter cannot be recovered at a later date. Only apply it if you have a raw backup view set up.

Setting up the filter in your test view

Naming convention

The hostname filter is specific to each website you are tracking. For this reason we recommend the following naming convention to ensure the filter is not carried across to incorrect properties:

Include Only Valid Hostnames – [Your Website Name]

Filter settings

Filter type: Custom include

Filter field: Hostname

Filter pattern: Your hostname regular expression

Testing

Test your hostname filter thoroughly in the test view. Follow these steps before applying the filter to your main reporting view:

  • Check real-time reports to spot any immediate issues
  • Wait two to five days depending on your traffic volume
  • Compare the test view data with your main reporting view

Top Tip: Never apply any filters to a reporting view on a Friday, just as you would never deploy website changes on a Friday: nobody will be there to deal with the consequences on Saturday if something goes wrong.

What should be tested?

Once you have tested your regex and applied it to the test view, collect at least a few days’ worth of data. If your site is a low traffic site, do a full seven-day cycle to ensure you have enough data to compare. Then, compare all the main metrics and reports between your test view and your main reporting view.

You should:

  • Compare the entries in the hostname report to ensure a valid hostname has not accidentally been excluded
  • In the ‘Audience Overview’ compare the ‘Sessions’, ‘Users’ and ‘Pageviews’ metrics and look out for any large differences
  • Check your conversion reports – you may have accidentally excluded a payment provider or quotation engine
  • Check your ecommerce tracking
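
To make "large differences" easier to spot, the comparison can be reduced to a percentage change per metric. The following is a minimal Python sketch with made-up numbers. The function name percent_change and the 10 per cent threshold are our assumptions, not GA recommendations, so pick a cut-off that suits your traffic.

```python
def percent_change(main: dict[str, float], test: dict[str, float]) -> dict[str, float]:
    """Percentage change of each metric in the test view relative to the main view."""
    return {
        metric: (test[metric] - main_value) / main_value * 100
        for metric, main_value in main.items()
        if main_value  # skip metrics that are zero in the main view
    }


# Made-up numbers for illustration only; use the figures from your own views.
main_view = {"Sessions": 1000, "Users": 800, "Pageviews": 3000}
test_view = {"Sessions": 950, "Users": 770, "Pageviews": 2400}

THRESHOLD = 10  # per cent; an arbitrary cut-off chosen for this example

for metric, change in percent_change(main_view, test_view).items():
    flag = "  <-- investigate" if abs(change) > THRESHOLD else ""
    print(f"{metric}: {change:+.1f}%{flag}")
```

With these example figures only Pageviews is flagged, at a drop of 20 per cent. A drop of that size would be worth tracing back to the hostname report to see which hostname accounts for it.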

Apply to the main reporting view

Once you have completed testing and confirmed your filter is working, transfer it to the main reporting view. And don’t forget to record this change so any changes in the data from that date can easily be traced back to the filter you applied.

Do you trust your Google Analytics data? Can you get all the information from GA that you need to make important decisions on advertising, conversion optimisation and content creation?

Get in touch to talk to us about the challenges you face and how we can help you to get the most out of Google Analytics.