Crap numbers

In my last post on this blog, I wrote about meaningless and misleading infographics, where data is presented visually but the conclusions drawn from the presentation are either misleading or vacuous.
However, these infographics are simply a manifestation of a wider phenomenon – that of crap numbers.

Crap numbers are numbers that marketing people, politicians and journalists use to try to make a point but which, when you look at them more closely, don’t mean a damn thing.

This was brought home to me today when I attended a presentation on Amazon Cloud Services and the presenter produced a slide saying that the durability of Amazon’s S3 service was 99.999999999%. When I asked what this meant, he said that if I had 10,000 stored objects, I would lose one of them every 10,000,000 years.
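For what it's worth, the presenter's answer is at least arithmetically consistent with the slide. Here is a minimal sketch of the sum, assuming the figure means the chance of an object surviving one year and that objects are lost independently of each other. Both are my assumptions, not anything the presenter stated, and the variable names are just for illustration.

```python
from fractions import Fraction

# Durability as quoted on the slide: 99.999999999%.
# Assumption: this is the probability that one object survives one year.
durability = Fraction(99_999_999_999, 100_000_000_000)

# Chance of losing any single object in a year under that assumption.
annual_loss_probability = 1 - durability

# The presenter's example: 10,000 stored objects.
objects = 10_000

# Expected number of objects lost per year, assuming independent losses.
expected_losses_per_year = objects * annual_loss_probability

# Average number of years between losses.
years_per_expected_loss = 1 / expected_losses_per_year

print(years_per_expected_loss)  # 10000000
```

So the 10 million years is just the eleven nines restated in different units. It adds no evidence that the eleven nines are right.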

I then asked if Amazon had actually tested this and, of course, they hadn’t because they haven’t been around for 10 years, let alone 10 million. He then admitted it was a theoretical figure. The problem is that reliability theory doesn’t work with ‘rare events’ – things that hardly ever happen. There are not enough of these events to come to any statistical conclusions about them and we have no idea of how failures in different parts of a system may be related when these events occur.
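To get a feel for how much observation a claim like this would need, here is a rough sketch. It assumes every object-year is an independent trial with the same loss probability, which is exactly the kind of assumption that can't be trusted for rare events, so treat the result as a lower bound on the difficulty rather than a real test plan. The function name is made up for this example.

```python
import math

def failure_free_trials_needed(target_rate, confidence=0.95):
    """Object-years with zero losses needed before a loss rate above
    target_rate can be ruled out at the given confidence level.

    Assumes independent, identically distributed trials: solve
    (1 - target_rate) ** n = 1 - confidence for n.
    """
    return math.log(1 - confidence) / math.log1p(-target_rate)

# The loss rate implied by eleven nines of durability.
needed = failure_free_trials_needed(1e-11)

print(f"{needed:.2e} failure-free object-years")
```

That comes to about 300 billion object-years without a single loss, and even then the answer only holds if failures really are independent.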

Amazon S3 storage is pretty good and you are very unlikely to lose anything in S3 – but neither Amazon nor anyone else can put a trustworthy number on this.

The recent Scottish referendum was another wonderful source of crap numbers. I’ll give a couple of examples:

1. Estimates of the amount of oil remaining in North Sea reserves ranged from 8 billion barrels to 24 billion barrels. Politicians used these figures as if the North Sea was like a whisky bottle with some whisky left in it. The implication was that there is a fixed amount there and that, when there was disagreement about the amount, one side or the other was lying about it.

In fact, all of these numbers can be defended. The question is not ‘how much oil is left’ but ‘how much oil can be economically extracted’. This depends on both the world oil price, which is unpredictable, and the extraction technology, which tends to improve and fall in price over time. But nobody can be sure what will happen and it’s disgraceful that politicians did not admit this.

2. After the referendum, an opinion poll asked a sample of the population how they had voted. This showed that 71% of young voters voted ‘yes’ and 29% voted ‘no’. Politicians and commentators latched onto this figure and some ‘yes’ supporters disgracefully condemned older people for being too conservative and sabotaging Scottish independence.

However, when you look into the opinion poll in more detail, you find that there were 14 young voters in the poll. This means that 10 said they voted ‘yes’ and 4 said they voted ‘no’. But we have no idea if those sampled told the truth about how they voted, and the chance of a sample of 14 being representative of an electorate of hundreds of thousands is pretty small. So, in truth, we really have no idea how votes were distributed across age bands (another poll, equally unreliable, came to a different conclusion).
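To put a rough size on that uncertainty, here is a sketch that computes a 95% Wilson score interval for 10 ‘yes’ answers out of 14. The Wilson interval is my choice of method, not the pollster's, and it generously assumes a simple random sample of truthful respondents, neither of which we can take for granted here.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion.

    Assumes a simple random sample with honest answers.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width

# The poll's young voters: 10 said 'yes' out of 14.
low, high = wilson_interval(10, 14)

print(f"{low:.0%} to {high:.0%}")
```

Even on those generous assumptions, the interval runs from roughly 45% to 88%, so the poll is consistent with anything from a narrow ‘no’ majority among young voters to a landslide ‘yes’.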

To be fair, these marketing people, journalists and politicians do not always deliberately misuse numbers – their grasp of maths and statistics is usually pretty minimal and they don’t understand the limitations of data. But you should always be sceptical when people trying to convince you of something use numbers to make their case – more often than not, these are crap numbers.

