How accurate are your web traffic reports?
February 13th, 2008 | Design & Development, Technology
Recently a client raised concerns that their Webtrends traffic reports were showing traffic well beyond what Google Analytics was reporting. I have good faith in Google Analytics, and I thought it would be hard for its reports to be wrong. Granted, there are cases where the tracking script may not run, but that shouldn’t cause the kinds of gaps we were seeing. Webtrends was reporting almost 100% more page views than Google Analytics and roughly 30% more unique visitors.
I had to see for myself which tool was wrong so I took a sample of the server logs from Feb 1st to run a comparison.
More accurate results could be obtained by spreading the analysis over a longer period, but this one day had plenty of traffic to work with and would give a good estimate.
I wrote a short Ruby script to go through the logs and count requests based on very basic criteria.
For February 1st
Webtrends reports 16195 page views / 2030 unique visitors
Google Analytics is reporting 8547 page views / 1415 unique visitors
The server log starts out with 98473 hits
- Counting every IP address in the log only once yields 2255 unique visitors.
- Counting only 200 response codes drops that to 1746 visitors.
- Discounting bots brings that down to 1629 visitors.
- Counting only requests using GET and POST yields 1589 visitors.
Up to this point we’re still seeing 57133 hits. To get page views, and to narrow down unique visitors further, I removed any direct requests for images, stylesheets, JavaScript includes etc.
This brings us down to a more realistic 9316 page views and 1441 visitors.
Finally, I removed hits originating from within the client’s premises.
Traffic for February 1st is derived to be 9185 page views and 1440 unique visitors.
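The Ruby script itself isn’t reproduced here. As a rough illustration only, the filtering steps above could be sketched in Python along these lines. The sketch assumes the server writes the Apache-style "combined" log format (the post doesn’t say which format was used), and the bot keywords, asset suffixes and internal IP prefix are placeholder values, not the ones behind the figures above. As in the analysis, a distinct IP address is treated as a unique visitor, and whatever survives the last filters is treated as a page view.

```python
import re

# Assumed log layout: Apache/Nginx "combined" format.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOT_WORDS = ("bot", "crawler", "spider")                  # illustrative subset
ASSET_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg", ".ico")  # illustrative
INTERNAL_PREFIXES = ("10.0.0.",)                          # hypothetical office range


def parse(lines):
    """Yield a dict of fields for each line matching the assumed format."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()


def is_bot(hit):
    agent = hit["agent"].lower()
    return any(word in agent for word in BOT_WORDS)


def is_asset(hit):
    # Ignore any query string before checking the file suffix.
    return hit["path"].split("?")[0].lower().endswith(ASSET_SUFFIXES)


def summarise(hits):
    """Return (number of hits, number of distinct IP addresses)."""
    return len(hits), len({hit["ip"] for hit in hits})


def funnel(lines):
    """Apply each filter in turn and record the counts after every step."""
    steps = [
        ("status 200 only", lambda h: h["status"] == "200"),
        ("no bots", lambda h: not is_bot(h)),
        ("GET/POST only", lambda h: h["method"] in ("GET", "POST")),
        ("no static assets", lambda h: not is_asset(h)),
        ("no internal traffic", lambda h: not h["ip"].startswith(INTERNAL_PREFIXES)),
    ]
    hits = list(parse(lines))
    report = [("all hits",) + summarise(hits)]
    for label, keep in steps:
        hits = [h for h in hits if keep(h)]
        report.append((label,) + summarise(hits))
    return report
```

Calling `funnel(open("access.log"))` (the file name is hypothetical) returns one `(step, hits, unique IPs)` tuple per stage, which makes it easy to see how much each filter removes.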
What does this mean?
Compared to the figures derived from my analysis of the server logs for February 1st…
In terms of unique visitors:
Google Analytics has under-reported by 1.7% (-25).
Webtrends has over-reported by nearly 41% (+590)
In terms of page views:
Google Analytics under-reported by roughly 6.9% (-638)
Webtrends over-reported by just over 76% (+7010)
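For anyone wanting to check the arithmetic, each percentage is the tool’s figure minus the log-derived figure, divided by the log-derived figure. A small Python sketch (the function name `relative_error` is just an illustrative choice) reproduces the numbers:

```python
def relative_error(reported, baseline):
    """Return the signed difference and that difference as a % of the baseline."""
    diff = reported - baseline
    return diff, 100.0 * diff / baseline


# Baseline: figures derived from the server logs for February 1st.
baseline_views, baseline_visitors = 9185, 1440

tools = [
    ("Google Analytics", 8547, 1415),
    ("Webtrends", 16195, 2030),
]

for name, views, visitors in tools:
    view_diff, view_pct = relative_error(views, baseline_views)
    visitor_diff, visitor_pct = relative_error(visitors, baseline_visitors)
    print(f"{name}: page views {view_diff:+d} ({view_pct:+.1f}%), "
          f"unique visitors {visitor_diff:+d} ({visitor_pct:+.1f}%)")

# Google Analytics: page views -638 (-6.9%), unique visitors -25 (-1.7%)
# Webtrends: page views +7010 (+76.3%), unique visitors +590 (+41.0%)
```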
Conclusion
Google Analytics appears to be the more accurate, and I would be happy to trust its figures. A difference of 1.7% on unique visitors is completely tolerable. It’s interesting that page views are off by 6.9%, considering it was estimated last year that roughly 6% of browsers have JavaScript turned off, rendering Google’s tracking ineffective.
Webtrends is vastly over-reporting traffic. It is likely that it is not filtering out invalid traffic correctly. There is a lot of room for improvement.
Recommendations
- Your traffic analysis software should be configured to ignore subrequests for images, stylesheets, JavaScript includes etc. On a WordPress site, those URLs might include the following:
/wp-content, favicon.ico, /wp-includes/, /wp-admin, /wp-login, robots.txt, /xmlrpc.php, /wp-cron.php
- Only count requests with a GET or POST method. It should also ignore requests that do not have a response code of 200.
- Ignore requests from as many bot user agents as you can. The following is a sample of bots that I found in the logs and subsequently ignored:
crawler, spider, bot, Wordpress, XML-RPC, Java, Python, Feedfetcher, Feedreader, Feedburner, Bloglines, Netvibes, RSSOwL, NewsGator, Veoh, wget, Mediapartners-Google
- Alternatively, you may try to set which user agents to accept instead of which to block. I used the following keywords to target the vast majority of valid user agents for my tests; the list may need to be extended to include some marginal browsers:
mozilla, MSIE, slimbrowser, opera, blackberry, aol, clearware
- Google Analytics may be made more accurate by upgrading to Google’s new tracking script (ga.js) and placing the tracking script just after the BODY tag is opened. Hopefully this will catch more page views, as it is likely that some are missed when users immediately click on a link on the homepage before it has loaded the tracking script.
This will also enable us to add tracking for file downloads, which could not be done with Google’s previous tracker (urchin.js).
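To make the filtering recommendations concrete, here is a minimal Python sketch of how the keyword lists above might be applied to a single request. The keywords are the ones listed in the recommendations; the function names and the use of case-insensitive substring matching are assumptions for illustration, and your analytics package will have its own way of expressing the same rules.

```python
# Keyword lists taken from the recommendations, lower-cased for matching.
EXCLUDED_PATHS = ("/wp-content", "favicon.ico", "/wp-includes/", "/wp-admin",
                  "/wp-login", "robots.txt", "/xmlrpc.php", "/wp-cron.php")
BOT_KEYWORDS = ("crawler", "spider", "bot", "wordpress", "xml-rpc", "java",
                "python", "feedfetcher", "feedreader", "feedburner",
                "bloglines", "netvibes", "rssowl", "newsgator", "veoh",
                "wget", "mediapartners-google")
BROWSER_KEYWORDS = ("mozilla", "msie", "slimbrowser", "opera", "blackberry",
                    "aol", "clearware")


def contains_any(text, keywords):
    """Case-insensitive substring match (an assumed matching rule)."""
    text = text.lower()
    return any(keyword in text for keyword in keywords)


def is_page_view(method, path, status, agent, allow_list=False):
    """Decide whether one request should count as a page view.

    allow_list=False blocks known bot keywords; allow_list=True instead
    accepts only user agents containing a known browser keyword.
    """
    if method not in ("GET", "POST") or status != 200:
        return False
    if contains_any(path, EXCLUDED_PATHS):
        return False
    if allow_list:
        return contains_any(agent, BROWSER_KEYWORDS)
    return not contains_any(agent, BOT_KEYWORDS)
```

One thing the sketch makes visible: a user agent such as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)` passes the accept list because it contains "mozilla", but is caught by the block list because it contains "bot". If you go the accept-list route, it may be worth applying the block list as well.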
Results
After applying these conditions to Webtrends it now reports 9526 page views / 1274 unique visitors for Feb 1st.
That puts page views within 3.7% of the figure I arrived at.
Unique visitors seem to have dropped off for some reason and are now under-reported by 11.5%.
Finally
I think it’s safe to say that traffic analysis is not an exact science. Everyone seems to have their own take on what traffic should be counted. Don’t rely on just one method of tracking, as you may end up under-reporting or over-reporting your traffic. Either is bad for business. For the time being I still have faith in Google Analytics.
February 13th, 2008 at 4:55 pm
Interesting study, but I’d like to point out a few things and best practices for web analytics.
When using products like WebTrends that analyze web server log files, you need to use filters to remove all non-human traffic. That includes search engine bots and various up-time utilities. JavaScript-based tracking like Google Analytics or WebTrends SDC doesn’t capture or report on this traffic. (Note: analysis of bot traffic is also important from an SEO perspective.)
If your site has various downloads like PDFs, Word documents, Excel documents, video etc. that can be accessed directly (i.e. not via a registration form), search engines will index these items and they may show up in search results. Also, site visitors might be e-mailed links to these items, meaning users can go directly to them without generating a page view. Traditional log files will capture these important downloads while JavaScript-based tracking won’t. This can explain some of the reasons for the higher visitor counts in WebTrends.
One report I like to look at in WebTrends to see if this is happening is the Page Views per visit report. Look for visits with ZERO page views. (Are people using your images on their site?)
By the way, WebTrends does not count calls to css, js, jpg, gif, ico, png etc. files as page views by default. Check your list of file extensions that have been defined as page views to make sure.
As to visit and visitor counts: Google Analytics uses a cookie to identify and track users. Unless you’ve programmed your site to set and record (in the log file) cookies and configured WebTrends to use this cookie for visitor and visit tracking, it will default to using a combination of IP address, OS and browser to determine unique visitors. (WebTrends does offer a utility for integrating persistent trackable cookies into your site.) This can cause weird data to appear, especially if your site is visited by lots of AOL users or large organizations that use a single IP address (proxy server) and have standardized their users’ systems.
I’ve done various comparisons between these two products and have generally found that Google Analytics is missing about 10-15% of all traffic. There are various reasons for this beyond the JavaScript issue already discussed in the post. Recently, when comparing organic search engine based traffic (must be human since they searched on a term), Google was missing almost 2/3 of the searches for a given client.
I’m not here to slam Google Analytics; it is a good product and it is free. It has several features that I find extremely helpful that WebTrends doesn’t provide. Remember that best practice in web analytics is to look not at the absolute numbers but at the trends. When doing this, you’ll find both products are equally accurate.
Note: I do not work for WebTrends or any other web analytics company. My firm does work with a variety of web analytics tools on behalf of our clients, including WebTrends and Google Analytics.
February 13th, 2008 at 5:12 pm
Johnny,
I’m puzzled by your results, as we have significant processes to ensure accuracy of our tracking. As it is unclear from the above whether you use WebTrends software or our On Demand service, I’d like to have a chance to review the details, and hopefully learn from it.
Please feel free to contact me.
Sincerely,
Xavier Le Hericy
February 13th, 2008 at 5:38 pm
@Xavier
The client is using the Webtrends software I believe. I’m not sure which version. I had simply been given some reports churned out by the software.
@Alan
Thanks for that. It would seem to me that the client was not aware of the extent to which filters needed to be created in order to get an accurate representation of their traffic from their logs.
For the purpose of this test I ignored downloadable PDFs etc., as tracking via Google Analytics was not in place for those.
On the whole I expected to see a larger difference between the results I obtained and the results from Google. I had read about and expected, as you said, a difference of 10%-15%.
I take your point on watching the trends rather than the figures themselves as they should be common in both.