How accurate are your web traffic reports?

Recently a client raised concerns that their Webtrends traffic reports were showing traffic well beyond what Google Analytics was reporting. I have good faith in Google Analytics; it would be hard for their reports to be wrong, I thought. Fair enough, there are cases where the tracking script may not run, but that shouldn’t cause the kinds of gaps we were seeing. Webtrends was reporting almost 100% more page views than Google Analytics and roughly 30% more unique visitors.

I had to see for myself which tool was wrong, so I took a sample of the server logs from Feb 1st to run a comparison.

Spreading the analysis over a longer period would give more accurate results, but this one day had plenty of traffic to work with and would give a good estimate.

I wrote a short Ruby script to go through the logs and count requests based on very basic criteria.

For February 1st:
Webtrends reports 16195 page views / 2030 unique visitors.
Google Analytics reports 8547 page views / 1415 unique visitors.

The server log starts out with 98473 hits

  • Counting every IP address in the log only once yields 2255 unique visitors.
  • Counting only 200 response codes drops that to 1746 visitors.
  • Discounting bots brings that down to 1629 visitors.
  • Counting only requests using GET and POST yields 1589 visitors.

Up to this point we’re still seeing 57133 hits. To get page views and to narrow down unique visitors further, I removed any direct requests for images, stylesheets, JavaScript includes, etc.

This brings us down to a more realistic 9316 page views and 1441 visitors.

Finally, I removed hits originating from within the client’s premises.

That puts traffic for February 1st at 9185 page views and 1440 unique visitors.
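The filtering steps above can be sketched roughly as follows. This is a minimal Python illustration, not the original Ruby script (which isn't shown here). It assumes the logs are in the common "combined" log format, and the asset suffixes, the internal address prefix and the sample log lines are all made up for the example. Only the bot keywords are taken from the logs discussed in this post, and only a few of them are used.

```python
import re

# Assumption: Apache/Nginx "combined" log format. The real log format is not stated.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOT_KEYWORDS = ("crawler", "spider", "bot", "feedfetcher", "wget")   # small subset
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")     # hypothetical
INTERNAL_PREFIXES = ("10.",)                                         # hypothetical office range


def count_traffic(lines):
    """Return (page_views, unique_visitors), treating each IP as one visitor."""
    page_views = 0
    visitors = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue                                  # unparseable line
        if m["status"] != "200":
            continue                                  # only successful responses
        if m["method"] not in ("GET", "POST"):
            continue                                  # only GET and POST
        agent = m["agent"].lower()
        if any(keyword in agent for keyword in BOT_KEYWORDS):
            continue                                  # discount bots
        path = m["path"].split("?", 1)[0].lower()
        if path.endswith(ASSET_SUFFIXES):
            continue                                  # images, stylesheets, scripts
        if m["ip"].startswith(INTERNAL_PREFIXES):
            continue                                  # hits from the client's premises
        page_views += 1
        visitors.add(m["ip"])
    return page_views, len(visitors)


# Made-up sample lines, purely for illustration.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Feb:10:00:00] "GET /about/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows)"',
    '203.0.113.5 - - [01/Feb:10:00:01] "GET /wp-content/style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows)"',
    '198.51.100.7 - - [01/Feb:10:01:00] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.9 - - [01/Feb:10:02:00] "GET /missing HTTP/1.1" 404 300 "-" "Mozilla/5.0 (Macintosh)"',
    '10.0.0.4 - - [01/Feb:10:03:00] "GET / HTTP/1.1" 200 4000 "-" "Mozilla/5.0 (Windows)"',
    '203.0.113.9 - - [01/Feb:10:04:00] "GET /contact/ HTTP/1.1" 200 2100 "-" "Mozilla/5.0 (Macintosh)"',
]

print(count_traffic(SAMPLE_LINES))
```

Each filter is a separate `continue` so that you can comment one out and see how much it moves the totals, which is essentially how the step-by-step figures above were arrived at.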

What does this mean?

Compared to the figures derived from my analysis of the server logs for February 1st…

In terms of unique visitors:
Google Analytics under-reported by 1.7% (-25).
Webtrends over-reported by nearly 41% (+590).

In terms of page views:
Google Analytics under-reported by roughly 6.9% (-638).
Webtrends over-reported by just over 76% (+7010).
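For anyone who wants to check the arithmetic, each percentage is the difference between a tool's figure and the log-derived figure, divided by the log-derived figure. A small Python sketch using the numbers quoted in this post (the function name `percent_diff` is just an illustrative choice):

```python
def percent_diff(reported, baseline):
    """Signed difference of a reported figure from the baseline, in percent."""
    return (reported - baseline) / baseline * 100


# Figures derived from the server logs for February 1st.
LOG_VISITORS, LOG_PAGE_VIEWS = 1440, 9185

REPORTED = [
    ("Google Analytics", 1415, 8547),
    ("Webtrends", 2030, 16195),
]

for tool, visitors, page_views in REPORTED:
    print(
        f"{tool}: visitors {percent_diff(visitors, LOG_VISITORS):+.1f}%, "
        f"page views {percent_diff(page_views, LOG_PAGE_VIEWS):+.1f}%"
    )
```

A negative result means the tool under-reported relative to the logs, and a positive one means it over-reported.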

Conclusion

Google Analytics appears to be the more accurate of the two, and I would be happy to trust its figures. A difference of 1.7% on unique visitors is completely tolerable. It’s interesting that page views are off by 6.9%, considering it was estimated last year that roughly 6% of browsers have JavaScript turned off, which renders Google’s tracking ineffective.

Webtrends is vastly over-reporting traffic. It is likely that it is not filtering out invalid traffic correctly. There is a lot of room for improvement.

Recommendations

  1. Your traffic analysis software should be configured to ignore subrequests for images, stylesheets, JavaScript includes, etc. On a WordPress site, those URLs might include the following:

    /wp-content, favicon.ico, /wp-includes/, /wp-admin, /wp-login, robots.txt, /xmlrpc.php, /wp-cron.php

  2. Only count requests with a GET or POST method, and ignore requests that do not have a response code of 200.
  3. Ignore requests from as many bot user agents as you can. The following is a sample of bots that I found in the logs and subsequently ignored:

    crawler, spider, bot, WordPress, XML-RPC, Java, Python, Feedfetcher, Feedreader, Feedburner, Bloglines, Netvibes, RSSOwL, NewsGator, Veoh, wget, Mediapartners-Google

  4. Alternatively, you may try to set which user agents to accept instead of which to block. I used the following keywords to target the vast majority of valid user agents for my tests. The list may need to be extended to include some marginal browsers:

    mozilla, MSIE, slimbrowser, opera, blackberry, aol, clearware

  5. Google Analytics may be made more accurate by upgrading to Google’s new tracking script (ga.js) and placing the tracking script just after the BODY tag is opened. Hopefully this will catch more page views, as it is likely that some are missed when users immediately click on a link on the homepage before the tracking script has loaded.

    This will also enable us to add tracking for file downloads, which could not be done with Google’s previous tracker (urchin.js).
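As a rough illustration of recommendations 1 to 4, here is a Python sketch of a single check that decides whether a request counts as a page view. The path fragments and user agent keywords are the ones listed above. The function name, its parameters and the case-insensitive substring matching are my own assumptions and not how Webtrends or any other tool is actually configured.

```python
EXCLUDED_PATHS = (
    "/wp-content", "favicon.ico", "/wp-includes/", "/wp-admin",
    "/wp-login", "robots.txt", "/xmlrpc.php", "/wp-cron.php",
)

BLOCKED_AGENTS = (
    "crawler", "spider", "bot", "WordPress", "XML-RPC", "Java", "Python",
    "Feedfetcher", "Feedreader", "Feedburner", "Bloglines", "Netvibes",
    "RSSOwL", "NewsGator", "Veoh", "wget", "Mediapartners-Google",
)

ALLOWED_AGENTS = (
    "mozilla", "MSIE", "slimbrowser", "opera", "blackberry", "aol", "clearware",
)


def is_page_view(method, status, path, user_agent, use_allowlist=False):
    """Hypothetical helper: True if the request should count as a page view."""
    if method not in ("GET", "POST") or status != 200:
        return False
    if any(fragment in path for fragment in EXCLUDED_PATHS):
        return False
    agent = user_agent.lower()
    # Assumption: keywords are matched as case-insensitive substrings.
    if any(keyword.lower() in agent for keyword in BLOCKED_AGENTS):
        return False
    if use_allowlist:
        return any(keyword.lower() in agent for keyword in ALLOWED_AGENTS)
    return True


print(is_page_view("GET", 200, "/about/", "Mozilla/5.0 (Windows)"))
print(is_page_view("GET", 200, "/", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
```

Note that this sketch applies the block list even when the accept list is switched on. Many bots, Googlebot included, put "Mozilla" in their user agent string, so an accept list on its own would let them through.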

Results

After applying these conditions to Webtrends, it now reports 9526 page views / 1274 unique visitors for Feb 1st.
That puts page views within 3.7% of the figure I arrived at.
Unique visitors seem to have dropped off for some reason and are now under-reported by 11.5%.

Finally

I think it’s safe to say that traffic analysis is not an exact science. Everyone seems to have their own take on what traffic should be counted. Don’t rely on just one method of tracking, as you may end up under-reporting or over-reporting your traffic. Either is bad for business. For the time being, I still have faith in Google Analytics.