How Do Discrepancies Happen in Digital Stats? [Infographic]


Anyone who works in digital marketing will have come up against the same simple and annoying problem – finding two sets of stats that don’t quite line up.

“We had a great week! Look at my chart!” you’ll say.

“My spreadsheet says it was a terrible week, you’re fired” they’ll say. Sigh.

While discrepancies can sometimes be shrugged off, they can also be a real headache if your boss or an external company is telling you that their reality is in fact somewhat different from the reality that you are observing (and that, of course, their reality is more important).

There is always the chance that they are lying or just wrong (more on that later). There is also a (terrifying) chance that there might be something fundamentally wrong with your tracking – and you don’t know when it started or how to fix it.

The truth, however, is almost always something a lot more mundane. Different stats don’t line up because that is how the internet is built. It’s not just that they *don’t* line up, in most cases, it’s that they actually can’t be identical.

While there are many reasons why stats go awry, there are three main ones, which I’ll go through below. These should put your mind at ease, but just in case they don’t, I’ll end with a nice soothing infographic too.

 


 

Reason 1 – Recording Stats in Different Ways

As we all know, the internet is a lie. All those lovely cat pictures are secretly made up of letters and numbers – of code – HTML, CSS, JavaScript, whatever. And that code (like all text) needs to be read in a specific order – from top to bottom. This leads us to the simplest reason why discrepancies can happen on the internet – the order of the code.

When I worked on a site (a long time ago, to be fair) I remember being very frustrated that Google Analytics and Google AdSense would record different numbers of page views. They are made by the same company, for goodness sake! On top of that, my WordPress Jetpack stats were always a bit different too! I didn’t know who to believe, or which report to trust.

So I made a chart tracking the discrepancy over time, and while it wasn’t consistent every day the three lines did stay near enough to each other and (almost) always in the order Google AdSense > WordPress > Google Analytics.

 

This was the first version of the chart. Unfortunately the daily, weekly, and monthly versions I obsessively made seem to be lost to the ages.

 

After much rending of garments, and screaming at the Gods, I eventually worked out why. The AdSense code was in the header of every page, while the Google Analytics code was in the footer. I don’t know where the Jetpack code was, but as it was coming from a plugin I assumed it loaded after the page started loading.

While explaining to my boss how these differences in placement would affect the stats, I realised THIS MUST HAPPEN ALL THE TIME. For a million reasons, webpages sometimes don’t load entirely correctly. Sometimes people leave a page before it fully loads, sometimes a glitch appears, sometimes people click a link or refresh the page too quickly. When this happens determines how much of the code gets loaded.

Some code goes in the header, some goes in plugins, some goes on the page, some goes in the footer, and each gets cut off for some share of users. While it barely makes a difference for any single user, over time the discrepancy grows!

More than this – if the stats are physically being recorded on different servers, even more of a lag can develop. This makes it virtually impossible (or at least extremely unlikely) that two platforms will ever record the exact same stats.
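To make the mechanism concrete, here is a minimal sketch (hypothetical tag names, positions, and abandonment rate all invented for illustration) of how tags placed earlier in the page load always end up counting at least as many pageviews as tags placed later:

```python
import random

# Hypothetical sketch: three tracking tags fire at different points in the
# page load (0.0 = top of the page, 1.0 = fully loaded). If a visitor leaves
# before a tag's position loads, that tag never records the pageview.
TAG_POSITIONS = {"adsense_header": 0.1, "jetpack_plugin": 0.5, "analytics_footer": 0.9}

def simulate(visits=100_000, abandon_rate=0.05, seed=42):
    rng = random.Random(seed)
    counts = {tag: 0 for tag in TAG_POSITIONS}
    for _ in range(visits):
        # How far through the load this visitor got before leaving (if at all).
        progress = rng.random() if rng.random() < abandon_rate else 1.0
        for tag, position in TAG_POSITIONS.items():
            if progress >= position:
                counts[tag] += 1
    return counts

counts = simulate()
# The header tag always records at least as many views as the plugin tag,
# which records at least as many as the footer tag - matching the
# AdSense > Jetpack > Analytics ordering described above.
```

The exact gap between the lines depends on how often pages are abandoned mid-load, but the ordering by placement is stable, which is exactly what the chart showed.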

 

 

What Can You Do About It?

Like with all discrepancy issues, the solution is more about accepting it into your heart than fixing it.

For this one – choose a single platform to be your “point of truth” and stick to it. Every analytics program records things slightly differently anyway, so consistency is the thing to strive for. You generally don’t need to know exact numbers with web stats, but you do need to be able to spot trends, and consistency in reporting is the key to that.

Also, track all the platforms you use for reporting against each other over time (especially if they are reporting on the same metrics). You should expect the difference between them to stay roughly stable, so if one of your platforms dramatically changes then there might be something wrong with it.
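A tiny sketch of that tracking idea (the daily numbers and the tolerance threshold are made up for illustration): compute the ratio between two platforms each day, and flag any day where it drifts far from its usual level.

```python
# Hypothetical sketch: the ratio between two platforms' daily pageview counts
# should stay roughly stable; flag days where it drifts well outside that norm.
def flag_drift(platform_a, platform_b, tolerance=0.2):
    ratios = [a / b for a, b in zip(platform_a, platform_b)]
    baseline = sum(ratios) / len(ratios)
    return [i for i, r in enumerate(ratios) if abs(r - baseline) / baseline > tolerance]

adsense   = [1100, 1080, 1150, 1120, 1700]  # made-up daily pageviews
analytics = [1000, 1000, 1040, 1020, 1030]
flagged = flag_drift(adsense, analytics)
# Only day 4 is flagged: the ratio jumps from roughly 1.1 to roughly 1.65,
# suggesting something changed on one of the platforms that day.
```

The point isn’t the specific maths – any stable-ratio check will do – it’s that a sudden change in the *gap* between platforms is the signal worth investigating, not the gap itself.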

 

Reason 2 – Recording Stats for Different Purposes

Back when I worked in AdOps, it was in our terms and conditions that we would charge advertisers for the results of their advertising based on our own tracking. That wouldn’t stop advertisers trying to get out of paying because the results they recorded were much worse than the results we recorded. I assumed shenanigans to start with, but the answer is much simpler than that.

If you are running a CPA campaign, then conversions can be recorded in many different ways. Google Analytics by default attributes all conversions to the last thing that was clicked on. Facebook Ads takes responsibility for a conversion up to 7 days after someone clicks an ad, or 1 day after an ad is seen.

Yes you read that last bit right – not only are there post-click conversion windows, there are also post-impression conversion windows. If you think about it – it’s actually fair enough. If someone clicks on an ad, doesn’t buy immediately, but then goes back to the website in the next few days and buys something – didn’t the ad cause that sale? Similarly, if someone sees an ad then later makes a purchase, wouldn’t it be reasonable to assume that the ad had an effect?

 

 
Facebook is not alone in attributing conversions this way, and like everyone else they make the post-click window large and the post-impression conversion window small, as that seems fairest to everyone.

How does this affect discrepancies in stats then? Well imagine that you run ads on both Facebook and Twitter at the same time and someone clicks your ad on Twitter (but doesn’t purchase), then immediately sees your ad on Facebook (but doesn’t purchase), and then immediately gets an email and clicks the link and makes a purchase (immediately). Facebook, Twitter, and your Email Platform will all claim that conversion as their own.

This can be a real problem if you are using their analytics for your reporting. Adding it all up, you will see that 3 conversions happened today, when in reality there was only one sale.
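The double-counting above can be sketched in a few lines (platform names, timestamps, and windows are illustrative; the 7-day click / 1-day view windows are the Facebook defaults mentioned earlier):

```python
from datetime import datetime, timedelta

# Hypothetical sketch: each platform claims a conversion if the user clicked
# one of its ads within 7 days, or saw one within 1 day, of the purchase.
CLICK_WINDOW = timedelta(days=7)
VIEW_WINDOW = timedelta(days=1)

purchase = datetime(2024, 3, 10, 12, 0)
touchpoints = [
    ("Twitter",  "click", datetime(2024, 3, 10, 11, 0)),   # clicked, didn't buy
    ("Facebook", "view",  datetime(2024, 3, 10, 11, 30)),  # saw ad, didn't buy
    ("Email",    "click", datetime(2024, 3, 10, 11, 55)),  # clicked, then bought
]

claims = {
    platform
    for platform, kind, ts in touchpoints
    if purchase - ts <= (CLICK_WINDOW if kind == "click" else VIEW_WINDOW)
}
# All three platforms fall inside their windows, so all three claim
# the same single sale: summing their dashboards reports 3 conversions.
```

One sale, three claimed conversions – each platform is being internally consistent, they just aren’t measuring the same thing.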

This was what was happening back when I was in AdOps. Like Facebook Ads, Twitter Ads, email platforms etc, we were claiming full responsibility for every conversion that we contributed to. The problem was, so was every other ad network being advertised on, leaving advertisers to deduplicate the results and patiently explain to us all why we weren’t going to be paid nearly as much as we thought we were.

[Fun Fact – Twitter and Facebook et al don’t care about your attribution model. If you advertise with them, you still have to pay up for every conversion they think they caused, even if they are mistaken. To be fair though – their CPA campaigns are far cheaper to run, so you’ll still come out ahead in most cases.]

What can you do about it?

Google Analytics only cares about the last click by default, and so in the above example it would attribute the conversion to email. This is at least a little unfair to the ad platforms which assisted along the way, so Google Analytics has a report called Assisted Conversions to show you which other platforms helped with conversions (it only counts clicks, however).

You can also test different attribution models within GA by using their model comparison tool, and try to work out something which is fairest… however, I personally wouldn’t bother.

 

Assisted Conversions and Model Comparison Tool in Google Analytics

Find these in the Conversions side menu of Google Analytics

 

This is because no matter what you do, you won’t be able to add in view-through conversions from every channel. Even tools which allow you to add in some view-through conversions won’t be able to add them all in (I’m looking at you, Google Campaign Manager). This means you simply cannot get analytics which are entirely ‘fair’ to every platform. Instead you should just accept that the analytics for these platforms are useful for different things.

Any ad platform can only see the conversions it had a hand in – so use its analytics to improve performance on that platform. You might not agree with the total number of conversions it claims, but you will still be maximising your overall conversions if you listen to these reports.

For your own reporting, you need something that takes into account conversions from every source – so Google Analytics is your best bet. While last click isn’t perfect, applying a single model across all platforms is the closest you will get to attributing in a reasonable manner.

[Also FYI – 3rd party cookies are disappearing, so it’s not worth worrying about view-through conversions anymore, as soon they won’t be technically feasible.]

 

 

Reason 3 – Human Error

Check the timeframe of the report you are running. I can’t say this enough. CHECK THE TIMEFRAME.

Comparing two reports which use different timeframes is a problem which comes up far more than any other. When someone says “these stats are different”, your first reply should always be “what date range did you use?”. You will save yourself so much time this way.

So please, when checking stats, check the timeframe is the same for both reports first. If you are telling someone else how to check stats, make sure you show them how to change the date range before you do anything else, and again as the final thing you tell them. It comes up so often it is unbelievable.

Unfortunately, this timeframe issue can be compounded when a platform uses “local time” and the two sets of stats are from different timezones. An annoying number of platforms still default to an American timezone (EST), meaning that even if you looked at stats from the same date range, you would get something slightly different. The only way to identify this issue (if you don’t have access to the settings of both platforms) is to run reports broken down by hour and see if they are simply time-shifted.
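That hourly check can even be automated. A minimal sketch (the hourly numbers are invented, and real traffic is noisier than this): slide one series against the other and find the offset where they line up best.

```python
# Hypothetical sketch: if two platforms report the same traffic but one's clock
# is a few hours behind, the hourly series are near-identical, just shifted.
def best_shift(series_a, series_b, max_shift=12):
    """Return the shift (in hours) that makes series_b best match series_a."""
    def mismatch(shift):
        # zip truncates to the shorter series, so only overlapping hours count.
        return sum(abs(a - b) for a, b in zip(series_a[shift:], series_b))
    return min(range(max_shift + 1), key=mismatch)

hourly = [5, 8, 12, 30, 55, 70, 64, 40, 22, 10, 6, 4, 3, 2, 2, 5]
shifted = hourly[5:]  # the same traffic, reported by a clock 5 hours behind
offset = best_shift(hourly, shifted)
# offset comes out as 5 - the timezone gap between the two reports.
```

If the best offset is a clean whole number of hours and the mismatch there is near zero, you’ve almost certainly found a timezone problem rather than a tracking one.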

Of course, date range isn’t the only type of human error available to us mere mortals. Mistakes of all types still abound. The most common ones to check for first are that results are somehow being filtered, or have been edited in some way. If results come from some sort of manual calculation, there could be an error in the formula, meaning you should double-check the raw statistics.

What Can You Do About It?

Mostly just be understanding. Everyone makes mistakes sometimes.

 

 

Good Old Grimey, whatever happened to him?

 

It’s much easier to find mistakes if you get the data broken up in many different ways to see where the issues are happening.

Break down the stats into smaller chunks, and then turn them into graphs so you can visualize them. Going down to day by day or even hour by hour and making a timeline chart can show you if different time zones are in play (or find exactly when a discrepancy started). Getting a report broken up by URL, country, or date can often let you see where the problem is happening.

Whatever you do, don’t assume someone is stupid, or that everything is their fault. It could be you who is making the mistake, after all.

 

Infographic

How Digital Stat Discrepancies Happen