Why is Data So Unreliable?

This post is about the information that is used by a supercomputer we all carry around with us (no, dummy, not your mobile phone. Not quite yet). It is about the idea that we all suffer from innumerable distortions and agendas. Even those who claim to have no agenda…well, that is the agenda. To pretend they are above it or haven’t got one is disingenuous and indicates a worrying lack of self-awareness.

This supercomputer is an organic machine, part of a larger organism with a wide range of sensors for input. It requires very careful maintenance, is easily damaged and easily corrupted, benefits from regularly being made to solve complex problems, has seemingly limitless storage capacity, and is carried around on your shoulders.

In our rush to worship at the altar of AI, data, quantum computers, and machine learning we step neatly over the weirdness and utter unpredictability that this supercomputer (for ease of typing it shall henceforth be known as the brain) brings.

[Optical illusion image: two straight lines, bent by the brain.]

Sure, you can reduce all the external inputs to measurable things: the number of photons that hit the retina, the loss caused by an aged lens, the sensitivity of our fingers, our sense of balance, the speed at which we solve problems, and so on. Despite possessing a brain in all its judgemental and unpredictable glory, we seem desperate to quantify and measure everything possible. For with the surety that comes from turning every conceivable bit of input into a number, surely it must be within our ken to calculate the output? And if we don’t get it right, we can re-examine the computational processing algorithms and refine them until we come as close to the output as the programmer(s) expect. You see, there will be some parameters set for an acceptable/realistic/likely outcome somewhere in the brief, and usually the aim is that the output matches the expectation.

This, I suppose, is what intrigues the scientists trying to create AI: the point where the brain starts to gallop ahead, when the capacity for random social variables and the filters they create comes in. The ability of the brain to make the weirdest associations from two apparently random bits of data always astonishes me.

For example: 30 years ago I shared a flat with Oliver Reed’s nephew, who apparently strove to exceed, or at least replicate, the lifestyle of his uncle. I was persuaded to take a tab of acid (LSD), which I tore in half because it scared me, but, hey, peer pressure. It had a startling effect on me, and I had to sit alone as it felt as if snooker balls of thoughts were cannoning into one another and going off at funny angles. To this day, if I see a snooker game I am reminded of that moment. I imagine it will be a very, very long time before there is a computer that can make those sorts of weird cognitive leaps.

Data can be as much social and experiential as pure numbers fed into STATA, R or SPSS and manipulated in various approved ways. The data coming from the brain’s sensors is reliably distorted through one or several social lenses. Are you rich, poor, foreign, insecure, angry, a victim of something, in a wheelchair, aspirational, impaired, or living with a neurological condition (I have had MS for 26 years)? Perhaps you are clinging to the notion that you are utterly impartial, free from any agenda, and therefore right? That is a powerful filter in itself, often producing feelings of self-righteous indignation that can’t always be adequately expressed in 280 characters.
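To make the lens point concrete with a toy, made-up example (the numbers and code are mine, not from any study): the same small dataset yields two quite different headlines depending on which summary statistic you choose to report.

```python
import statistics

# Hypothetical salaries at a small firm, with one outlier at the top.
salaries = [22_000, 24_000, 25_000, 26_000, 28_000, 30_000, 250_000]

# The mean is dragged up by the outlier; the median is not.
print(f"mean:   {statistics.mean(salaries):,.0f}")    # "average pay is generous"
print(f"median: {statistics.median(salaries):,.0f}")  # "typical pay is modest"
```

Both numbers are “true”; which one gets reported depends on the agenda of whoever is telling the story.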

[Image of a Trump tweet. It doesn’t stop this fellow from trying though.]

When you are designing research, analysing research, presenting research, or having research presented to you, try to remember that your brain and the recipient’s brain have different filters. They may seem externally similar, but at some point that same information will hit a unique filter, and the gap between intent and understanding soon becomes apparent.

The human desire to avoid cognitive dissonance is strong and, mismanaged, leads people to do terrible things to try and ‘fix’ it. We have a great ability to bend data to fit our pre-conceived notions of what feels right, or to fill voids with made-up data. We ALL do it. I believe the most we can do is open up the analysis to others who manifestly do not share a similar agenda: people who are as independent as possible and trained to look for inconsistency, whether in the accrual of the data or in the motives of those who handled it before you see or hear it.

So-called facts in newspapers are a good place to start asking why and how. In a commercial environment, people bandy around poor data and try to cover for it with force of personality or seniority. BBC Radio 4 has an excellent series (available for download) called Thought Cages that deals with the vagaries of the human brain in an amusing and engaging way.

We all lie and deceive all the time. It is in our nature. Sometimes you need a different variety of deceiver to look into your world and help you identify the deceits. I can help you, so contact me via LinkedIn.

 

How Do I Know…

…if I am getting the entire Data Story?

…if it was analysed properly?

…if I can trust the conclusions and recommendations?

Every executive who relies on decision-making data presented to them by other people shares these doubts. If you don’t know how to ask the correct questions, parse the information in the replies, and follow up with the right requests for more information, you will forever be at the mercy of others. My experience is that people with responsibility do not enjoy that situation.

Without an impartial assessment of the Data Story, they cannot satisfy themselves that the Data Story they are being told is the right one, and every big decision ends up being made with a greater element of faith than was intended.


There are two basic elements to achieving an accurate Data Story. The first is the human, and the second is the technical.

  1. Human

Everything may be tickety-boo: the best, most loyal people are giving you a perfect Data Story. If you know this to be true, then stop reading now. Life is great. On the other hand, if you ever wonder, then keep reading.

(Type 1, Type 2, and Type 3 data – a recap here – for clarity, I am writing about Type 2 and Type 3 data. Remember, Type 1 is the Mars Lander sort of stuff!)
  • “These results are from AI. It can do things we can’t.”

Whether the results are attributed to AI, which has spotted a very subtle pattern in a vast mass of data, or to a straight survey designed, run and analysed by humans, means nothing in and of itself.

Even if an AI tool uses the best and the brightest to program the algorithms it ‘thinks and learns’ with, the fact remains that people – with all their attendant beliefs, prejudices, biases, agendas etc – set the rules, at least to start. If the machine has indeed learned by trial and error, it was still programmed by people. Therein lies the weakness.

human AI blend

This weakness comes from the initial decision makers, precisely because they aren’t you or your Board. The Board is likely to have a much wider range of experience and carry more responsibility than the Data Science/IT/Marketing departments.

How often have you spent time with these people? Are they even in the same office as you? How old are they? What are their social and political biases? And so on. Unless you know this, how can you begin to understand anything about the initial algorithms that set the AI going? When were they written, what was the market like then, who wrote them, and in which country?

With all data collection and manipulation, it is crucial to have the fuller story: the background and understanding of those setting the questions, writing the algorithms, tweaking the machine learning and analysing the data; their managers; the instructions they have been given; and the emphasis this Data Story has received in the rest of the organisation before you see it. It is also the insight into the marketplace provided by the sort of Thick Data that Tricia Wang and other ethnographers have popularised.

My message to you is that data is so much more than numbers. Numbers alone can misrepresent the story greatly. We are social animals, and as long as there are people involved in the production, analysis and presentation of data, it doesn’t matter a jot how incredibly intelligent and fast the tools are. We are the weakness.


If you still struggle to believe this concept, think about electronic espionage. It is rarely a failure in something mechanical that causes catastrophic breaches of security; it is the relative ease with which people can be compromised into sharing information. The people are the weak link. In the very first days of hacking, a chap called Kevin Mitnick in the US spoke of social engineering as the means to an end. We are all inherently flawed, and these flaws are shaped and amplified by our social and work environments, so why couldn’t that affect the Data Story you get?

  2. Technical

  • “The data we have used is robust.”

I’ve heard that line trotted out many times. Gosh, where to start? It may be. Nonetheless, a lot can and does happen to the data before you see the pretty graph. Here are just a few things to consider before agreeing with that assertion:

What was/were the hypothesis/hypotheses being tested?

Why?

When was it collected?

By whom (in-house or bought in from a third-party)?

Qualitative, quantitative, or a blend?

What was the method of collection (face-to-face interviews, Internet, watching and ticking boxes, survey, correlational, experimental, ethnographic, narrative, phenomenological, case study – you get the idea, there are more…)?

How was the study designed?

Who designed it?

How large was the sample(s)?

How was the data edited before analysis (by whom, when, with what tools, any change logs, what questions were excluded and why)?

How was the data analysed (univariate, multivariate, logarithmic, what were the dummy variables and why, etc.)?

How is it being presented to me, and why this way (scales, chart types, colouring, size, accompanying text, etc.)?
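One concrete way the “how was it analysed” question bites is in how subgroups are pooled. The sketch below uses the classic kidney-stone figures often quoted to illustrate Simpson’s paradox (stock teaching numbers, nothing to do with any particular Data Story): treatment A succeeds more often in every subgroup, yet looks worse once the subgroups are aggregated.

```python
# (successes, patients) by stone size -- classic Simpson's paradox figures.
data = {
    "Treatment A": {"small": (81, 87), "large": (192, 263)},
    "Treatment B": {"small": (234, 270), "large": (55, 80)},
}

for treatment, groups in data.items():
    # Success rate within each subgroup.
    for size, (ok, n) in groups.items():
        print(f"{treatment} ({size} stones): {ok}/{n} = {ok / n:.1%}")
    # Success rate once the subgroups are pooled.
    ok_total = sum(ok for ok, _ in groups.values())
    n_total = sum(n for _, n in groups.values())
    print(f"{treatment} (overall): {ok_total}/{n_total} = {ok_total / n_total:.1%}")
```

Treatment A wins both subgroup comparisons (roughly 93% vs 87% and 73% vs 69%) but loses overall (78% vs 83%), because the harder cases were not evenly allocated between treatments. If the pretty graph only shows the pooled figure, you are being told a different story from the one in the raw data.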


And so on. This is just a taste of the complexity behind the pretty pictures shown to you as part of the Data Story. From these manicured reports you are expected to make serious decisions that can have serious consequences.

You must ask yourself if you are happy knowing that the Data Story you get may be intentionally curated or unintentionally mangled. I started this site and the consultancy because I am an independent sceptic. In this age of data-driven decision-making, you mustn’t forget: incorrect data can’t take responsibility for mistakes, but you will be held to account. This is not scaremongering; it is simply fact.

If you need a discreet, reliable and sceptical third party to ask these questions, drop me an email. I compile the answers, or understand and highlight the gaps. You make the decisions, albeit far better informed and with the ability to show that you didn’t take the proffered Data Story at face value, but asked an expert to help you understand it.