Why is Data So Unreliable?

This post is about the information used by a supercomputer we all carry around with us (no, dummy, not your mobile phone; not quite yet). It is about the idea that we all suffer from innumerable distortions and agendas. Even those who claim to have no agenda… well, that is their agenda. To pretend they are above it, or haven’t got one, is disingenuous and indicates a worrying lack of self-awareness.

This supercomputer is an organic machine, part of a larger organism with a wide range of sensors for input. It requires very careful maintenance, is easily damaged and easily corrupted, benefits from regularly being made to solve complex problems, has seemingly limitless storage capacity and is carried around on your shoulders.

In our rush to worship at the altar of AI, data, quantum computers and machine learning, we step neatly over the weirdness and utter unpredictability that this supercomputer (for ease of typing, henceforth known as the brain) brings.

Optical illusion: two straight lines, bent by the brain.

Sure, you can reduce all the external inputs to measurable things: the number of photons that hit the retina, the loss caused by an aged lens, the sensitivity of our fingers, our sense of balance, the speed at which we solve problems and so on. Despite possessing a brain in all its judgemental and unpredictable glory, we seem desperate to quantify and measure everything possible. After all, once every conceivable bit of input has been turned into a number, surely it must be within our ken to calculate the output? And if we don’t get it right, we can re-examine the processing algorithms and refine them until the output comes as close as possible to what the programmer(s) expect. You see, somewhere in the brief there will be parameters for an acceptable/realistic/likely outcome, and usually the aim is for the output to match that expectation.

It is when random social variables, and the filters they create, come into play that the brain starts to gallop ahead, and I suppose this is what intrigues the scientists trying to create AI. The brain’s ability to make the weirdest associations between two apparently random bits of data never ceases to astonish me.

For example: thirty years ago I shared a flat with Oliver Reed’s nephew, who apparently strove to exceed, or at least replicate, the lifestyle of his uncle. I was persuaded to take a tab of acid (LSD), which I tore in half because it scared me but, hey, peer pressure. It had a startling effect on me, and I had to sit alone, as it felt as if my thoughts were snooker balls cannoning into one another and going off at funny angles. To this day, if I see a game of snooker I am reminded of that moment. I imagine it will be a very, very long time before a computer can make those sorts of weird cognitive leaps.

Data can be as much social and experiential as it is pure numbers fed into STATA, R or SPSS and manipulated in various approved ways. The data coming from the brain’s sensors is reliably distorted through one or several social lenses. Are you rich, poor, foreign, insecure, angry, a victim of something, in a wheelchair, aspirational, impaired, living with a neurological condition (I have MS, and have had it for 26 years)? Perhaps you are clinging to the notion that you are utterly impartial, free from an agenda and therefore right? That is a powerful filter, often producing feelings of self-righteous indignation that can’t always be adequately expressed in 280 characters.

It doesn’t stop this fellow from trying, though.

When you are designing, analysing or presenting research, or having it presented to you, try to remember that your brain and the recipient’s brain have different filters. They may seem outwardly similar, but at some point the same information will hit a unique filter, and the gap between intent and understanding soon becomes apparent.

The human desire to avoid cognitive dissonance is strong and, when mismanaged, leads people to do terrible things to try to ‘fix’ it. Thankfully, we have a great ability to bend data to fit our preconceived notions of what feels right, or to fill voids with made-up data. We ALL do it. I believe the most we can do is to open up the analysis to others who manifestly do not share our agenda: people who are as independent as possible and trained to look for inconsistency, whether in how the data was accrued or in the motives of those who handled it before you saw or heard it.

So-called facts in newspapers are a good place to start asking why and how. In a commercial environment, people bandy poor data around and try to cover for it with force of personality or seniority. BBC Radio 4 has an excellent series (available for download) called Thought Cages that deals with the vagaries of the human brain in an amusing and engaging way.

We all lie and deceive all the time. It is in our nature. Sometimes you need a different variety of deceiver to look into your world and help you identify the deceits. I can help you, so contact me via LinkedIn.



AI, ML & DL – A Bluffer’s Guide

AI, ML and DL are our attempts to get machines to think and learn in the way that we can. Get that right and you’ll have a breathtakingly capable machine: the power of the human mind multiplied a million-fold. Probably our new robot overlords, but we’ll cover that later. Whilst I have no issue with these developments, and do believe the goal is both attainable and useful, we are not there yet. To date we have incredibly fast calculators that are essentially linear and binary. These are our modern computers. There are boffins in labs developing non-linear and non-binary counting machines, but they are not here yet. This means that we are left with the brute-force approach to problem solving. Run the right algorithm (at least to start with, it is provided by a human) and you can get the giant calculator to supply an answer, often the correct one; if not, it can learn from its mistakes, tweak the algorithm and try again. (By the way: that is ML/DL in a nutshell.)

Here is a definition of ML: “Machine learning is the study of algorithms and mathematical models that computer systems use to progressively improve their performance on a specific task.” That’s it. It is a computer learning to improve and tweak its algorithm, based on trial and error. Just like we learn things. No difference.

Here is a definition of AI: “Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.” However, AI is where things can really come unstuck. The aim is to get machines to think as we do: in a non-linear way. Human beings deal exceptionally well with ambiguity, and we have an ability to match up apparently unrelated things, such as words and images. Have you ever been transported back in time, in an instant, by a song clip or a smell? That is human; no one taught you to do that.

A computer could conceivably do that, but only if it had previously been instructed to do so. It can do it so very fast that you would be forgiven for thinking it was natural. It is not, though; it is programmed to do it. Sure, it might have learnt to improve its own algorithm (Machine Learning again) to do that, based on observations of human behaviour. It is still just mimicking what it sees as the appropriate behaviour. There has never been that spontaneous connection you experienced that transported you to another time and place, even fleetingly.

A recent high-profile example of AI and ML going a little bit awry and showing bias is the article “Amazon Reportedly Killed an AI Recruitment System Because It Couldn’t Stop the Tool from Discriminating Against Women”. The video is well worth watching for what it reveals about the unconscious bias of the people who built the algorithms. There are efforts under way to remove the human biases that machines learn from and perpetuate.

But what is Deep Learning, I hear you cry? It can be distinguished from Machine Learning quite simply: it eliminates the need for a human being to categorise all the different data inputs, and the machine (still only the really fast calculator) does that itself. Think self-driving cars, drones and many more, much duller, things. Presently, though, we humans still need to be involved in the categorisation. There is even a data-labelling factory in China that uses humans to ‘teach’ machines what it is that they are seeing.

Equitable, Just, Neutral and Fair are components of moral behaviour that reside in the interpretation of present societal norms, and not everyone agrees with them. Different cultures can have quite different views on the correct moral choice. Remember this when someone tries to argue for the infallibility of computers. They can only be programmed with lagging data, and they will always reflect us and our biases. For better or worse.
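To make the “learn from its mistakes” idea and the bias point a little more concrete, here is a toy sketch in Python. It is emphatically not how Amazon’s tool worked: the historical data, the two features (whether a candidate is skilled, and whether their CV mentions a hypothetical proxy word such as “women’s”) and the choice of a simple perceptron are all my own assumptions for illustration. Nobody tells the machine to discriminate. It just guesses, checks each guess against past hiring decisions, and nudges its weights whenever it is wrong, until it reproduces history perfectly.

```python
"""Toy sketch: a perceptron 'learns' a hiring rule from hypothetical, biased history."""

# Each record: ((skilled, cv_mentions_proxy_word), hired)
# HYPOTHETICAL data: in the past, skilled candidates were hired,
# unless their CV contained the proxy word.
HISTORY = [
    ((1, 0), 1),
    ((1, 1), 0),  # equally skilled, but rejected in the past
    ((0, 0), 0),
    ((0, 1), 0),
]


def predict(weights, bias, features):
    """Return 1 (hire) if the weighted score is positive, else 0 (reject)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(data, epochs=20, rate=1):
    """Classic perceptron rule: adjust the weights whenever a guess is wrong."""
    weights = [0] * len(data[0][0])
    bias = 0
    for _ in range(epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)
            if error:
                mistakes += 1
                # Trial and error: shift each weight towards the 'right' answer.
                weights = [w + rate * error * x for w, x in zip(weights, features)]
                bias += rate * error
        if mistakes == 0:  # a full pass with no errors: history is reproduced
            break
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(HISTORY)
    print("learned weights:", weights, "bias:", bias)
    print("skilled, no proxy word ->", predict(weights, bias, (1, 0)))
    print("skilled, proxy word    ->", predict(weights, bias, (1, 1)))
```

With this data the learned weight on the proxy word comes out negative, so an equally skilled candidate whose CV mentions it is rejected. The machine has faithfully learnt our past bias from lagging data; it never made a moral choice at all.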