Why is Data So Unreliable?

This post is about the information that is used by a supercomputer we all carry around with us (no, dummy, not your mobile phone. Not quite yet). It is about the idea that we all suffer from innumerable distortions and agendas. Even those who claim to have no agenda…well, that is the agenda. To pretend they are above it or haven’t got one is disingenuous and indicates a worrying lack of self-awareness.

This supercomputer is an organic machine, part of a larger organism with a wide range of sensors for input. It requires very careful maintenance, is easily damaged and easily corrupted, benefits from being made to regularly solve complex problems, has seemingly limitless storage capacity and is carried around on your shoulders.

In our rush to worship at the altar of AI, data, quantum computers, and machine learning we step neatly over the weirdness and utter unpredictability that this supercomputer (for ease of typing it shall henceforth be known as the brain) brings.

[Image: optical illusion] Two straight lines, bent by the brain.

Sure, you can reduce all the external inputs to something measurable: the number of photons that hit the retina, the loss caused by an aged lens, the sensitivity of our fingers, our sense of balance, the speed at which we solve problems and so on. Despite possessing a brain in all its judgemental and unpredictable glory, we seem desperate to quantify and measure everything possible. With the surety that comes from turning every conceivable bit of input into a number, surely it must be within our ken to calculate the output? And if we don’t get it right, we can re-examine the processing algorithms and refine them until the output comes as close as possible to what the programmer(s) expect. You see, there will be some parameters set for an acceptable/realistic/likely outcome somewhere in the brief, and usually the aim is that the output matches the expectation.

The brain starts to gallop ahead when random social variables, and the filters they create, come into play. I suppose this is what intrigues the scientists trying to create AI. The ability of the brain to make the weirdest associations from two apparently random bits of data always astonishes me.

For example: 30 years ago I shared a flat with Oliver Reed’s nephew, who apparently strove to exceed or at least replicate the lifestyle of his uncle. I was persuaded to take a tab of acid (LSD), which I tore in half because it scared me but, hey, peer pressure. This had a startling effect on me and I had to sit alone, as it felt as if snooker balls of thoughts were cannoning into one another and going off at funny angles. To this day, if I see a snooker game I am reminded of that moment. I imagine it will be a very, very long time before there is a computer that can make that sort of weird cognitive leap.

Data can be as much social and experiential as pure numbers fed into STATA, R or SPSS and manipulated in various approved ways. The data coming from the brain’s sensors is reliably distorted through one or several social lenses. Are you rich, poor, foreign, insecure, angry, a victim of something, in a wheelchair, aspirational, impaired, living with a neurological condition (I have MS, and have had it for 26 years)? Perhaps you are clinging to the notion that you are utterly impartial and free from an agenda and thus right? That is a powerful filter, often producing feelings of self-righteous indignation that can’t always be adequately expressed in 280 characters.

[Image: Donald Trump on Twitter] It doesn’t stop this fellow from trying though.

When you are designing research, analysing research, presenting research or having research presented to you, try to remember that your brain and the recipient’s brain have different filters. They may seem externally similar but, at some point, that same information will hit a unique filter and the gap between intent and understanding soon becomes apparent.

The human desire to avoid cognitive dissonance is strong and, when mismanaged, leads people to do terrible things to try and ‘fix’ it. Thankfully, we have a great ability to bend data to fit our pre-conceived notions of what feels right, or to fill voids with made-up data. We ALL do it. I believe that the most we can do is to open up the analysis to others who manifestly do not share a similar agenda. They should be as independent as possible and trained to look for inconsistency, whether in the accrual of the data or in the motives of those who handled it before you saw or heard it.

So-called facts in newspapers are a good place to start asking why and how. In a commercial environment, people bandy around poor data and try to cover it with force of personality or seniority. BBC Radio 4 has an excellent series (available for download) called Thought Cages that deals with the vagaries of the human brain in an amusing and engaging way.

We all lie and deceive all the time. It is in our nature. Sometimes you need a different variety of deceiver to look into your world and help you identify the deceits. I can help you, so contact me via LinkedIn.

 

Red Flags & Sacred Cows

Here follows a cautionary tale. I name the culprit not because I have an axe to grind, or because the case is particularly unusual, but because it suits the example being made.

To repeat other posts on here: when someone starts quoting facts and figures at you and citing studies, it is entirely reasonable, and very sensible, to ask some probing questions. The figures are usually being used to sell you something, be that an idea, credibility, services that the provider of the figures can come and supply (at a price, naturally), or just support for their existing position on a topic.

This entire topic is made much more challenging when very emotive topics are being commented on. Race, Gender, Diversity and Inclusion are today’s Sacred Cows. These topics always seem to make many people uncomfortable, whilst trying to appear as if they are just fine with it. They often deal with this by ensuring that they say nothing, thereby keeping their head below the parapet. An unintended consequence is that lack of enquiry means that statements with regard to the Sacred Cow go unchallenged.

Twenty years ago there were few, if any, consultancies offering to help companies address issues that can arise as a result of various forms of discrimination. Many of those now operating seem to think that, because they position themselves as experts in the field, they are beyond reasonable criticism and examination. Please can someone help me understand why that elevates them beyond reasonable scrutiny and criticism?

A big problem with Sacred Cow topics is that any criticism of anything to do with them (in this case, the use or misuse of data) is treated as tantamount to trying to undermine their very raison d’etre. It isn’t at all; it is all about the data. Data doesn’t care about any of these issues. Conflating the two looks like a tactic to draw one’s eye away from the data and shame you into ceasing with the questions.

Where you should have a problem is when data is used to misrepresent issues. Whether intentionally or unintentionally, the mishandling of data can make problems appear very different from what they actually are. A simple example is in the analysis of raw data. If certain variables are not measured during collection and then controlled for during the analysis, the results can mislead. Sometimes data collected in a specific area produces results that are then remarked upon and treated as a general finding, with no qualifications added to them.
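
To show what failing to control for a variable can do, here is a minimal sketch in Python. Every number in it is made up purely for illustration, and the groups ("A", "B"), departments ("X", "Y") and function names are hypothetical. It compares success rates for two groups, first lumped together and then split by department.

```python
# Entirely made-up counts, chosen only to illustrate the effect.
# (department, group) -> (successes, applicants)
records = {
    ("X", "A"): (72, 90),
    ("X", "B"): (9, 10),
    ("Y", "A"): (2, 10),
    ("Y", "B"): (27, 90),
}


def department_rate(department, group):
    """Success rate for one group inside one department."""
    successes, applicants = records[(department, group)]
    return successes / applicants


def overall_rate(group):
    """Success rate for one group with the department ignored."""
    successes = sum(s for (dept, g), (s, n) in records.items() if g == group)
    applicants = sum(n for (dept, g), (s, n) in records.items() if g == group)
    return successes / applicants


for department in ("X", "Y"):
    for group in ("A", "B"):
        rate = department_rate(department, group)
        print(f"Department {department}, group {group}: {rate:.0%}")

for group in ("A", "B"):
    print(f"Overall, group {group}: {overall_rate(group):.0%}")
```

In this invented data, group B does better than group A inside each department, yet group A looks far better once the department is ignored, simply because most of group A applied to the department with the higher success rate. Anyone shown only the headline figure would draw the opposite conclusion from someone shown the breakdown.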

Back to the Red Flags though. The fact that it is a sensitive topic should not prevent you from asking about the provenance of the data. If someone clasps their hand to their mouth and asks how you could possibly question a respected pillar of the industry (sometimes an author, etc.), then remind them about speaking truth to power.

Recently, I saw a post on LinkedIn from one of the founders of Pearn Kandola LLP, which read:

“A third (32%) of people who have witnessed racism at work take no action, and a shocking two-fifths (39%) of those said that this was because they feared the consequences of doing so*. If our workplaces are to become genuine places of safety, it’s vital that the government acts quickly to curb the use of NDAs to hide instances of harassment, whether it be racist, sexist or otherwise. #RacismAtWork #UnconsciousBias

*According to our own research at Pearn Kandola LLP”

All well and good on the face of it. Nothing wrong with citing your own research, providing you can back it up. I was interested to learn more, so I asked whether the research was published, what the sample size was, where and when it was collected, etc. There has been no reply. Judging by the comments, the claim has been accepted without criticism or interrogation by many, a worrying indication of a lack of critical thinking. Another thing that should raise a little red flag in your mind when data is being reported is the use of words like ‘shocking’. I can only imagine this is to try and increase click-through. It detracts from the data and sounds more like a Daily Express ‘weather armageddon’ type headline.
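
On the sample-size question, here is a rough sketch in Python of why I asked. It uses the standard normal-approximation formula for the margin of error on a percentage from a simple random sample. The sample sizes are hypothetical, since the real one was never supplied, and I am assuming simple random sampling, which may not be how the survey was actually run. The function name `margin_of_error` is mine.

```python
import math


def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


# Hypothetical sample sizes: the real figure was never disclosed.
for n in (50, 200, 1000, 5000):
    moe = margin_of_error(0.39, n)
    print(f"n={n:>5}: 39% plus or minus {moe * 100:.1f} points")
```

Note that the 39% is a share of the third who took no action, so the sample that matters is that subgroup, not the whole survey. With a few dozen respondents, a figure like 39% could plausibly sit anywhere from the mid-twenties to the low fifties; with several thousand it narrows to a point or two. Without the sample size there is no way of telling which of those situations we are in.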

If the data is robust they ought to be delighted to publish it and open it up to examination. After all, if it is robust enough to underpin public claims that are made then there is no reason why it ought not to be open to examination by a third party.

To question data means that you are thinking. Whatever the topic, there should be no Sacred Cows, especially not the data.

AI, ML & DL – A Bluffer’s Guide

AI, ML and DL are our attempts to get machines to think and learn in the way that we can. Get that right and you take the power of the human, multiplied a million-fold, to produce a breathtakingly capable machine. Probably our new robot overlords, but we’ll cover that later.

Whilst I do not have any issue with these developments, and do believe they are both attainable and useful, we are not there yet. To date we have incredibly fast calculators that are essentially linear and binary. These are our modern computers. There are boffins in labs developing non-linear and non-binary counting machines but they are not here yet. This means that we are left with the brute force approach to problem solving. Run the right algorithm (at least to start with, it is provided by a human) and you can get the giant calculator to supply an answer, often the correct one. If not, it can learn from its mistakes, rewrite the algorithm and try again. (By the way: that is ML/DL in a nutshell.)

Here is a definition of ML: Machine learning is the study of algorithms and mathematical models that computer systems use to progressively improve their performance on a specific task.

That’s it. It is a computer learning to improve and tweak its algorithm, based on trial and error. Just like we learn things. No difference.

Here is a definition for AI: Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.

However, AI is where things can really come unstuck. The aim is to get machines to think as we do, in a non-linear way. Human beings deal exceptionally well with ambiguity and we have an ability to match up things like apparently unrelated words and images. Have you ever been transported back in time, in an instant, by a song clip or a smell? That is human; no one taught you to do that.

A computer could conceivably do that, but only if it had previously been instructed to do so. It can do it so very fast that you would be forgiven for thinking it was natural. It is not though; it is programmed to do it. Sure, it might have learnt to improve its own algorithm (Machine Learning again) to do that, based on observations of human behaviour. It is still just mimicking what it sees as the appropriate behaviour. There has never been that spontaneous connection that you experienced, the one that transported you to another time and place, even fleetingly.

A recent high-profile example of AI and ML going a little bit awry and showing bias is in this article here: “Amazon Reportedly Killed an AI Recruitment System Because It Couldn’t Stop the Tool from Discriminating Against Women”. It is well worth watching the video and understanding the unconscious bias exhibited by the builders of the algorithms. There are efforts to remove the human biases that the machines learn from and perpetuate.

But what is Deep Learning, I hear you cry? It can simply be differentiated from Machine Learning as the point at which the need for a human being to categorise all the different data inputs is eliminated. Now the machine (still only the really fast calculator) does the categorising itself. Think self-driving cars, drones and many more much duller things. Presently, we humans need to be involved in the categorisation. There is even a Data Labelling factory in China that uses humans to ‘teach’ machines what it is that they are seeing.

Equitable, Just, Neutral and Fair are components of moral behaviour that reside in the interpretation of the present societal norms, and not everyone agrees with them. Different cultures can have quite different views on a correct moral choice. Remember this when someone is trying to argue about the infallibility of computers. They can only be programmed with lagging data and they will always reflect us and our biases. For better or worse.
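
To make the point about lagging data concrete, here is a deliberately crude sketch in Python. The historical decisions are made up, the group labels and the `train` and `recommend` functions are hypothetical, and the “model” does nothing cleverer than count. Real systems are far more complicated, and I am not suggesting this is how Amazon’s tool worked.

```python
from collections import defaultdict

# Made-up historical decisions as (group, hired) pairs. Group "A" was favoured.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 30 + [("B", False)] * 70
)


def train(decisions):
    """'Learn' the past hire rate for each group by simple counting."""
    hired = defaultdict(int)
    seen = defaultdict(int)
    for group, was_hired in decisions:
        seen[group] += 1
        hired[group] += was_hired  # True counts as 1, False as 0
    return {group: hired[group] / seen[group] for group in seen}


def recommend(model, group, threshold=0.5):
    """Recommend a candidate if their group was usually hired in the past."""
    return model[group] >= threshold


model = train(history)
for group in ("A", "B"):
    print(f"Group {group}: learnt hire rate {model[group]:.0%}, "
          f"recommend={recommend(model, group)}")
```

Nothing in the code mentions prejudice, yet it hands yesterday’s preference straight back as today’s recommendation. The machine has learnt exactly what it was shown.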

Data Ethics For Business

We exist in an increasingly data driven world. More and more, we are encouraged or directed to ‘listen to the data’ above all else. After all, the data doesn’t lie. Does it?

Data Ethics in business is the name of the practice used to ensure that the data being used to make high-value commercial decisions is of the highest quality possible. However, there is a catch. Human beings are the catch. We have gut instinct, prejudices, experience, belief systems, conditioning, ego, expectation, deceit, vested interests, etc. These behavioural biases all stand to cloud the data story, and usually do.

A high-value commercial decision does not necessarily have immediate financial consequences, although in commercial terms a sub-optimal outcome is invariably linked with financial loss. In the first instance, the immediate effects of a high-value decision can be on organisational morale, or they can be reputational.

When a high-value decision is to be made there are invariably advocates and detractors. Both camps like to believe that they are acting in the service of a cause greater than themselves. Occasionally, some of the actors cloud the story because their self-interest is what really matters to them, and they try hard to mask that with the veneer of the greater good. Hence the term ‘Data Story’: behind the bare numbers and pretty graphics there is an entire story.

The concept of conducting a pre-mortem examination of the entire data story to model what can go wrong is becoming more important for senior decision makers. It is getting increasingly difficult to use the traditional internally appointed devil’s advocate as, due to the inherent complexity of understanding a data story, this function needs to be performed by subject matter experts. Although the responsibility for decision-making always falls on the Senior Management, they want to do it with a full breakdown of the many facets of the data story.

In order to achieve this, individuals with a unique blend of talents, experience and inquisitiveness must be used: people with absolute objectivity and discretion, who don’t rely on inductive reasoning. They must be robust enough to operate independently, diplomatically and discreetly, and have executive backing to interrogate all the data sources, ask the difficult questions and highlight any gaps, inconsistencies and irregularities. From this they can provide a report for the Executive Sponsor(s), with questions to ask and inquiries to make, so that a well-informed decision can be made.

After all, when there is a lot at stake, who wants to be remembered as the person who screwed up and tried to blame the data?