Why Doesn’t Big Data Always = Good Data?

The Data Scientists out there will sigh, feeling that they have heard this a thousand times before. However, it is human beings that are the issue. Numbers are just numbers; it is what we humans do with them that causes the problems.

Very quickly then: this is the correlation and causation argument writ large.

correlation causation.jpg
But it must be true???

Can you see the issue? On the face of it, it makes sense. I prefer the elegance of the original description: post hoc ergo propter hoc. Merely acquiring more and more data points, a bigger data set, and better hardware, software and human expertise to manipulate this data does not equal better results from the data.
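For anyone who likes to see this with numbers, here is a minimal sketch of the trap. The scenario is invented for illustration (the names `temperature`, `ice_cream` and `sunburn`, and all the coefficients, are my assumptions, not real data): two things that never influence each other still correlate strongly because a third factor drives both. Once that third factor is accounted for, the apparent link all but disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=0):
    """Hypothetical data: temperature drives both other series.

    Ice cream sales and sunburn cases never influence each other here.
    """
    rng = random.Random(seed)
    temperature = [rng.gauss(20, 5) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 3) for t in temperature]
    sunburn = [1.5 * t + rng.gauss(0, 3) for t in temperature]
    return temperature, ice_cream, sunburn


temperature, ice_cream, sunburn = simulate()

# Raw correlation: looks like ice cream "causes" sunburn.
raw = pearson(ice_cream, sunburn)

# Remove the effect of temperature from both series, then correlate again.
controlled = pearson(residuals(ice_cream, temperature),
                     residuals(sunburn, temperature))

print(f"raw correlation:        {raw:.2f}")
print(f"after controlling temp: {controlled:.2f}")
```

More rows would only make the raw correlation look more convincing. The size of the data set does nothing to tell you that temperature was the real driver.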

Big data is great and powerful when it is clean and accurate data. But pause and think: before plunging into the analysis and insight phase, the cleaning and tidying phase – the often skipped-past boring stuff – needs to be complete. The crazy outliers need to be identified, partial data from one source needs to be investigated, in the case of human surveys the ‘don’t know’ answers may need to be coded out, and so on.
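As a rough illustration of that boring stuff, here is a minimal sketch of a cleaning pass over some made-up survey records. Everything in it is an assumption for the sake of the example: the field names (`source`, `age`, `answer`), the sample records, and the choice of a median-based outlier rule (the modified z-score, with the commonly used rule-of-thumb cut-off of 3.5). A real project would pick its own rules with the Data Scientists who know the data.

```python
from collections import Counter
from statistics import median

# Hypothetical codes that count as a 'don't know' answer.
DONT_KNOW = {"don't know", "dk", ""}


def flag_outliers(values, threshold=3.5):
    """Indices of values whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), so one crazy
    value cannot hide itself by dragging the average towards it.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def clean(records):
    """Split records into usable rows, outliers and a missing-data report."""
    missing_by_source = Counter()
    dont_know = 0
    candidates = []
    for rec in records:
        if rec["answer"].strip().lower() in DONT_KNOW:
            dont_know += 1                          # coded out, but counted
        elif rec["age"] is None:
            missing_by_source[rec["source"]] += 1   # partial data to investigate
        else:
            candidates.append(rec)

    outlier_idx = set(flag_outliers([r["age"] for r in candidates]))
    return {
        "kept": [r for i, r in enumerate(candidates) if i not in outlier_idx],
        "outliers": [r for i, r in enumerate(candidates) if i in outlier_idx],
        "missing_by_source": dict(missing_by_source),
        "dont_know": dont_know,
    }


# Invented sample data, purely for illustration.
records = [
    {"source": "A", "age": 34, "answer": "yes"},
    {"source": "A", "age": 29, "answer": "no"},
    {"source": "B", "age": 41, "answer": "don't know"},
    {"source": "B", "age": None, "answer": "yes"},
    {"source": "A", "age": 350, "answer": "no"},
    {"source": "A", "age": 38, "answer": "yes"},
    {"source": "A", "age": 45, "answer": "no"},
]

report = clean(records)
print(len(report["kept"]), "rows kept")
print("outliers:", report["outliers"])
print("missing by source:", report["missing_by_source"])
```

Note that the sketch reports the outliers and the gaps rather than silently deleting them. Whether an age of 350 is a typo or a sign that one source is broken is a question for a human, not the script.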

There are a variety of ways to allow the Data Scientists to do this, but the heart of the matter is that if they are not given the time, tools and budget to do this then you are back to the junk in, junk out scenario that affects everything to do with computers.

As humans we are programmed in ways that really hamper us. This is especially true when we are operating outside of our field of expertise or are very out of date regarding a subject matter area. Our brains crave clarity and simplicity, and we avoid the unknown, as that is where danger may lie. We want to make as smooth and risk-free a transit through life as possible. Because of this, the best and the brightest can suddenly become very credulous and succumb to deep-seated fear and prejudice. This propensity feeds the behaviour of those who are told something, seize upon it and then happily transmit it to others as fact. The recipients believe it, often more so when it is passed to them by a person or source in whom additional credibility is invested.

I was struck yesterday when listening to an episode of The Infinite Monkey Cage – a science program on BBC Radio 4 – where anthropologists and evolutionary biologists were tearing their hair out at the traction an image we are all familiar with has gained. The evolution of man from ape to upright walking man is apparently a terribly inaccurate and misleading image. Apparently, it first appeared in a French school textbook back in the Fifties, resonated (which shows the power of a credible source and a good image) so much that it stuck and has been reproduced millions of times over. I had no idea how inaccurate it was and like to think that I am not very credulous. It goes to show the power of something that has been ever-present though. Few people except the experts challenge it, even now.

human-evolution-670
The iconic, contested and wholly inaccurate image

Bringing this to business: I feel for the person or team at Apple that had to brief Tim Cook and co that the earnings forecast had to be dramatically trimmed because the previous cash-cow of the iPhone was no longer selling as quickly. I appreciate I have the benefit of hindsight regarding the following remark: people were hanging onto their devices for longer and were railing against the so-called planned obsolescence that many believed was being built in. Couple that to the belief that the latest OS was designed to overwhelm older devices, and yet that without the latest OS the functionality was going to be limited henceforth, and you have something that really upsets consumers. If that is combined with the increase in the length of the service contracts we are all but forced to agree to by the network providers (here in the UK at any rate) in order to have the latest tech, subsidised by these growing contracts, I suspect this shouldn’t have been such new news.

We can see the clever PR operation swing into action. Apparently great PR relies so heavily on gut feelings and relationships that we overlook how incredible people are at computing very complex Big Data. Still far ahead of any computer. To wit: the entire slowdown has been pinned almost completely on the Chinese market. Something I find hard to swallow. I have no doubt it is a large component and very politically expedient given the way China is portrayed in the US these days. The messaging seems to play heavily on the deterioration of relations between the US and China. The PR teams are operating on very thick and contextual data, nothing more. The human brains are the computers here. Either way, it is, apparently, not the fault of Apple… *coughs politely*

blaming everyone else

On the other hand, perhaps they knew of this trend and the feelings that underpinned it because they had excellent Big Data, had combined it with the Thick Data approach and insights of Anthropologists, Sociologists and Political Scientists who specialise in these fields, so they could synthesise the findings into usable data, and the real issue wasn’t knowing this but when to let the markets know? Sadly, few large companies manage to meld their data very effectively and usually the larger they are the greater the disconnect between the boardroom and the customer, and the inadequacies of the information providers aren’t spotted soon enough.

What about the person responsible, or is there one? Challenging assumptions is often uncomfortable and often seen in an organisation as disruptive and potentially unwanted behaviour. A Chief Data Officer (CDO) ought to have both the support and power to ask the ‘who, what, when, where and why’ questions relentlessly. In fact, if they aren’t querying the data they are to use for gaining insight and helping the other leaders to make the best informed decisions, they are probably falling short in their role.

Why is data dangerous?

In the words of @RorySutherland: “The data made me do it” is the 21st Century equivalent of “I was only obeying orders”. The growing power and influence of Data Science touches everyone’s lives. Sutherland also remarks: “Markets are complex and there can be more than one right answer. People in business prefer the pretence of ‘definitive’ because if you can show you’ve done the ‘only right thing’ you have covered yourself in event of failure”. These are all attempts at Plausible Deniability, and they are weak.

For the record, plain old data is not dangerous; you are unlikely to be hit by an errant Spearman’s rho, or a rogue control variable that detached itself from an analysis. Data is just a record of the measurable values of something that has happened in the past. Digital exhaust, if you will. Like speed in a car, it is the inappropriate use of it that causes issues.

Doing the right thing often sees people becoming enslaved to Type 1 and Type 2 data, because they are the easy parts. You can hire experts, who can count well, use the software and understand how to tease out knowledge from the data points. What the majority can’t do well, or may even distort intentionally, is manage the presentation, context and language used when presenting their findings. This is the Type 3 data I talk about, that isn’t traditional data as we know it.

Type 3 data is the really dangerous stuff. The reason for this is our complete fallibility as human beings. This is nothing to be ashamed of; it is how we are made and conditioned. It is, in fact, entirely, boringly and ordinarily normal. I was recently told by a lawyer – I say this because she is pretty well-educated – that all statistics are a lie. She then cited the famous Mark Twain (nicked from Disraeli) saying, “There are lies, damn lies and statistics”, as if this were all the proof she required. Interestingly, when I challenged her on this and made a case for accurate uses of statistics, she refused to even acknowledge it. She was wedded to her belief and I must be wrong. Case closed.

I think immersion in courtroom rhetoric may have been getting the better of her. However, this goes to show just how dangerous we humans can be. Imagine being a client with a lawyer whose dogmatism may cause them to overlook, or be unable to question, relevant statistical evidence? All stemming from a strongly held view that all statistics are lies. Professor Bobby Duffy recently wrote an excellent book called The Perils of Perception and on p.100 he shows just how problematic this view can be.

My point is: if a person who is well-educated, and practising in a profession like law, can hold such a position, then it is not beyond any of us to do so, quite unwittingly. That remains the case until one is more familiar with the behavioural biases that we are all susceptible to, the way Type 1 and Type 2 data can be misrepresented (Type 3 data), and how that misrepresentation uses our in-built foibles to generate a reaction.

This is where someone who understands both of these areas, and can blend that knowledge into an expertise which is useful, can help you. When important decisions on strategy, direction and spending are conditional on interpreting data from others, you want to get it right first time. If not, you’ll be forced into, “The data made me do it”, and that rarely ends well.

burning money