Data Done Properly

In contrast to all the mangled, misinterpreted, unpublished and skewed data there are beacons of hope. This time it emanates from none other than Local Government, not known for leadership in these areas. So pleasing to see.

I saw this LinkedIn post by Steven Johnson and immediately commented. It is such a delight to see data being used properly, from gathering to analysis.

 


That is all. Have a wonderful Tuesday.

 

PS: No conflict of interest as I have never even met Steven!


Red Flags & Sacred Cows

Here follows a cautionary tale. I name the culprit, not because I have an axe to grind or it is particularly unique, but it suits the example being made.

To repeat other posts on here: when someone starts quoting facts and figures at you and citing studies, it is entirely reasonable – and very sensible – to ask some probing questions. The figures are usually being used to sell you something, be that an idea, credibility, services to fix the very problem the figures describe (at a price, naturally), or just support for an existing position on a topic.

This entire topic is made much more challenging when very emotive topics are being commented on. Race, Gender, Diversity and Inclusion are today’s Sacred Cows. These topics always seem to make many people uncomfortable, whilst trying to appear as if they are just fine with it. They often deal with this by ensuring that they say nothing, thereby keeping their head below the parapet. An unintended consequence is that lack of enquiry means that statements with regard to the Sacred Cow go unchallenged.

labels

Twenty years ago there were few, if any, consultancies that were offering to help companies address issues that can arise as a result of various forms of discrimination. Many seem to think that because they are positioning themselves as experts in the field it puts them beyond reasonable criticism and examination. Please can someone help me understand why that elevates them beyond reasonable scrutiny and criticism?

A big problem with Sacred Cow topics is that any criticism of anything to do with them – in this case, the use/misuse of data – is treated as tantamount to trying to undermine their very raison d’être. It isn’t at all; it is all about the data. Data doesn’t care about any of these issues. Conflating the two looks like a tactic to draw one’s eye away from the data and to shame you into ceasing with the questions.

Where you should have a problem is when data is used to misrepresent issues. Whether intentionally or unintentionally, the mishandling of data can make problems appear very different from what they actually are. A simple example is in the analysis of raw data. Certain variables may not be measured during collection and so cannot be controlled for during the analysis. Or data collected in one specific area produces results that are then remarked upon and treated as a general finding, with no qualifications added to them.
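To see how an unmeasured variable can turn a finding on its head, here is a minimal sketch in Python. The counts and the names (`results`, `rate`, `overall`, options A and B, ‘easy’ and ‘hard’ cases) are invented purely for illustration; the effect itself is commonly known as Simpson's paradox.

```python
# Illustrative, made-up counts: (successes, attempts) for two options,
# split by a variable that a careless analysis might never record.
results = {
    "A": {"easy": (81, 87), "hard": (192, 263)},
    "B": {"easy": (234, 270), "hard": (55, 80)},
}

def rate(successes, attempts):
    """Success rate as a fraction between 0 and 1."""
    return successes / attempts

def overall(option):
    """Success rate for an option when the case type is ignored."""
    successes = sum(s for s, _ in results[option].values())
    attempts = sum(n for _, n in results[option].values())
    return rate(successes, attempts)

for option in results:
    by_group = {g: round(rate(*c), 3) for g, c in results[option].items()}
    print(option, by_group, "overall:", round(overall(option), 3))
```

In this made-up data, option A does better in both groups, yet option B looks better once the groups are lumped together, simply because B was given far more of the easy cases. Leave the ‘easy/hard’ variable out of the collection and the headline points the wrong way.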

Back to the Red Flags though. The fact that it is a sensitive topic should not prevent you from asking about the provenance of the data. If someone clasps their hand to their mouth and asks how you could possibly question a respected pillar of the industry, sometimes an author etc, then remind them about speaking truth to power.

Recently, I saw a post on LinkedIn from one of the founders of Pearn Kandola LLP, which read:

“A third (32%) of people who have witnessed racism at work take no action, and a shocking two-fifths (39%) of those said that this was because they feared the consequences of doing so*. If our workplaces are to become genuine places of safety, it’s vital that the government acts quickly to curb the use of NDAs to hide instances of harassment, whether it be racist, sexist or otherwise. #RacismAtWork #UnconsciousBias

*According to our own research at Pearn Kandola LLP”

All well and good on the face of it. There is nothing wrong with citing your own research, providing you can back it up. I was interested to learn more, so I asked if the research was published, what the sample size was, where and when it was collected, etc. There has been no reply. Judging by the comments, this has been accepted without criticism or interrogation by many, a worrying indication of a lack of critical thinking. Another area of concern when data is being reported, and one that should also raise a little red flag in your mind, is the use of words like ‘shocking’. I can only imagine this is to try and increase click-through. It detracts from the data and sounds more like a Daily Express ‘weather armageddon’ type headline.
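Why does the sample size matter so much? Here is a rough sketch in Python, assuming a simple random sample and the usual normal approximation (neither of which is known to apply to the research quoted above). It shows how the uncertainty around a reported 32% changes with the number of respondents. The sample sizes and the function name `margin_of_error` are hypothetical, chosen only for illustration.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

reported = 0.32  # the headline figure from the post
for n in (50, 200, 1000, 5000):  # hypothetical sample sizes
    moe = margin_of_error(reported, n)
    print(f"n={n:>5}: 32% could plausibly be {reported - moe:.1%} to {reported + moe:.1%}")
```

Note also that the 39% is a share of the 32%, so it rests on only about a third of whatever the full sample was, and its margin would be wider still.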

Sacred Cow

If the data is robust they ought to be delighted to publish it and open it up to examination. After all, if it is robust enough to underpin public claims that are made then there is no reason why it ought not to be open to examination by a third party.

To question data means that you are thinking. Whatever the topic, there should be no Sacred Cows, especially not the data.

AI, ML & DL – A Bluffer’s Guide

AI, ML and DL are our attempts to get machines to think and learn in the way that we can. Get that right and you’ll take the power of the human, multiplied a million-fold, to have a breathtakingly capable machine. Probably our new robot overlords, but we’ll cover that later. Whilst I do not have any issue with these developments, and do believe they are both attainable and useful, we are not there yet.

To date we have these incredibly fast calculators that are essentially linear and binary. These are our modern computers. There are boffins in labs developing non-linear and non-binary counting machines but they are not here yet. This means that we are left with the brute force approach to problem solving. Run the right algorithm (at least to start with, it is provided by a human) and you can get the giant calculator to supply an answer, often the correct one. If not, it can learn from its mistakes, rewrite the algorithm and try again. (By the way: that is ML/DL in a nutshell.)

Here is a definition of ML: Machine learning is the study of algorithms and mathematical models that computer systems use to progressively improve their performance on a specific task.

That’s it. It is a computer learning to improve and tweak its algorithm, based on trial and error. Just like we learn things. No difference.

Here is a definition for AI: Artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals.

However, AI is where things can really come unstuck. The aim is to get machines to think as we do. In a non-linear way. Human beings deal exceptionally well with ambiguity and we have an ability to match things up, like apparently different words and images. Have you ever been transported back in time, in an instant, by a song clip or a smell? That is human; no one taught you to do that.

A computer could conceivably do that but only if it had previously been instructed to do so. It can do it so very fast you would be forgiven for thinking it was natural. It is not though; it is programmed to do it. Sure, it might have learnt to improve its own algorithm (Machine Learning again) to do that based on observations of human behaviour. It is still just mimicking what it sees as the appropriate behaviour; there has never been that spontaneous connection that you experienced that transported you to another time and place, even fleetingly.

A recent high-profile example of AI and ML going a little bit awry and showing bias is in this article here: “Amazon Reportedly Killed an AI Recruitment System Because It Couldn’t Stop the Tool from Discriminating Against Women”. It is well worth watching the video and understanding the unconscious bias exhibited by the builders of the algorithms. There are efforts to remove the human biases that the machines learn from and perpetuate.

But what is Deep Learning, I hear you cry? It can simply be differentiated from Machine Learning as the point at which the need for a human being to categorise all the different data inputs is eliminated. Now the machine (still only the really fast calculator) does the categorising itself. Think self-driving cars, drones and many more much duller things. Presently, we humans need to be involved in the categorisation. There is even a Data Labelling factory in China that uses humans to ‘teach’ machines what it is that they are seeing.

Equitable, Just, Neutral and Fair are components of moral behaviour that reside in the interpretation of the present societal norms, and not everyone agrees with them. Different cultures can have quite different views on a correct moral choice. Remember this when someone is trying to argue about the infallibility of computers. They can only be programmed with lagging data and they will always reflect us and our biases. For better or worse.
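The ‘trial and error’ idea in the ML definition above can be sketched in a few lines of Python. This is a toy illustration, not how any particular product works: the data, the `learning_rate` and the number of rounds are all assumptions chosen to keep the example small. The machine starts with a bad guess at a rule, measures how wrong it is, nudges the guess and repeats.

```python
# Toy data: the hidden rule is y = 3 * x. The program is never told the 3.
examples = [(1, 3), (2, 6), (3, 9), (4, 12)]

def average_error(weight):
    """Mean squared gap between the guesses (weight * x) and the true answers."""
    return sum((weight * x - y) ** 2 for x, y in examples) / len(examples)

weight = 0.0          # initial (bad) guess at the rule
learning_rate = 0.01  # assumed step size: how hard to nudge after each mistake

for round_number in range(200):
    # Gradient of the mean squared error: which way, and how far, the guess is off.
    gradient = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
    weight -= learning_rate * gradient

print(round(weight, 3), round(average_error(weight), 6))
```

Nothing here resembles thinking. It is arithmetic repeated two hundred times, which is rather the point of the post.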

Algorithms – A Bluffer’s Guide

A breakdown and simplification of some current tech speak

The word ‘algorithm’ is uttered with a degree of reverence these days. It is the black magic behind AI and Machine Learning and is a favourite thing to go rogue in a modern plot line. The actors merely blame the bad algorithm when a computer goes crazy in a dystopian sci-fi catastrophe.

The decision-making requirements that we are faced with in the modern commercial world far exceed our capacity in many instances because our brains evolved for a very different sort of world. A world of small groups where we rarely met anyone very different from ourselves. We had significantly shorter lives and our main priorities were sex and survival. These days there is hugely increased complexity and nuance, yet the evolved desire for rapid choice-making hasn’t left us.

Faced with these pressures we turn to computers for help. Computers helping humans is now pervasive, permeating almost all aspects of life. This rapid change has occurred within a single lifetime as computing capacity has increased exponentially. Your mobile telephone has vastly greater computing power than the computers on the first Space Shuttle. Think about it for a moment. Your phone possesses all the computing power required to fire you into space.

This incredible capability means that people have been fascinated with the idea that a computer can be turned from a dumb machine into a thinking machine (thinking as we do) since the dawn of the first machine. However, computational power is one thing. How to make it work as an independent thinking machine is another thing altogether. One of the key things you need in order to do this is an algorithm.

Algorithms: the rules needed for machine thinking. 

Just to clear this up: machines DO NOT think. Computers can process a huge volume of information really, really quickly because they are unbelievably fast calculators. The hardware is just a superfast counting machine with a screen.

Algorithms are not hard to conceive if you think of them like this: an algorithm is what you need to cook supper for your family. Few families eat the same thing for every meal of every day, so there are constraints and variables. Imagine there are four of you. One is a vegetarian, one is on a low-fat diet and the other two aren’t that fussy but do have preferences. You want to provide them with a nutritious and tasty meal that ensures everyone enjoys the experience, including you.

Let’s imagine that you are 45 and have cooked for the same people many times before (almost daily) and as a consequence you have learnt a lot about what works and what doesn’t. However, this week is different: you haven’t had time to shop and the other three did the shopping for you. You open the cupboard doors and have a peer in the fridge and freezer to get an idea of what is available for you to cook with. Within about 30 seconds of taking stock of the cupboard contents, the fridge contents, the available utensils to cook with, any time constraints, the dietary preferences and so on, you decide on a meal. You cook it, serve it and everyone eats. They get up from the table appropriately nourished, leaving the process to be repeated the next day. What allowed you to do this was an algorithm in your head. Call it the ‘cooking for family’ algorithm.

Pause for a moment, though, and think about how simple it can sound and how the thinking and actions required were actually so incredibly, amazingly, mind-blowingly complex and nuanced.
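A drastically simplified version of the ‘cooking for family’ algorithm might look like the Python sketch below. Everything in it is hypothetical: the recipes, the ingredients, the function name `choose_supper` and the decision to cook one dish for everybody are assumptions made to keep the example short.

```python
# Hypothetical household: every recipe and ingredient here is invented.
recipes = {
    "beef lasagne": {"needs": {"pasta", "beef", "cheese"},
                     "vegetarian": False, "low_fat": False, "minutes": 75},
    "vegetable stir-fry": {"needs": {"noodles", "peppers", "tofu"},
                           "vegetarian": True, "low_fat": True, "minutes": 25},
    "cheese omelette": {"needs": {"eggs", "cheese"},
                        "vegetarian": True, "low_fat": False, "minutes": 10},
}

def choose_supper(in_stock, minutes_available, need_vegetarian, need_low_fat):
    """Return the recipes that satisfy every constraint, quickest first."""
    options = []
    for name, recipe in recipes.items():
        if not recipe["needs"] <= in_stock:        # an ingredient is missing
            continue
        if recipe["minutes"] > minutes_available:  # not enough time
            continue
        if need_vegetarian and not recipe["vegetarian"]:
            continue
        if need_low_fat and not recipe["low_fat"]:
            continue
        options.append(name)
    return sorted(options, key=lambda name: recipes[name]["minutes"])

cupboard = {"noodles", "peppers", "tofu", "eggs", "cheese", "pasta"}
print(choose_supper(cupboard, minutes_available=40,
                    need_vegetarian=True, need_low_fat=True))
```

Even this crude version has to be told every rule explicitly, and it knows nothing about preferences, leftovers or who had a bad day. That gap is what the cook in the story fills without noticing.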

 A quick note as to where this can go wrong

Simply put, computers are not people. Computers are superb for making decisions that do not require any emotion, ethics, bias and the like. Eventually a computer beat a Chess Grandmaster, and it did it by sheer computational brute force. However, to take the supper example: the cook knows the audience at a level a computer can’t match. For all the calculations from an algorithm, it can’t know from someone’s face if they are the special kind of tired that a Wednesday can make them, so that putting any kind of pie down for dessert would mean the world to them. And the others would see that a pie was not only what was needed but was a very thoughtful gesture, thereby elevating the cook in the eyes of the other three and making an intangible but felt contribution to them too.

The aim is to have algorithms teach themselves by learning from mistakes in order to achieve the desired outcome of the programmer(s). They try, but they are far from perfect, and because we expect perfection from computers, in a way that is different from our expectations of one another, their mistakes are not easily forgiven.

Data Ethics For Business

We exist in an increasingly data driven world. More and more, we are encouraged or directed to ‘listen to the data’ above all else. After all, the data doesn’t lie. Does it?


Data Ethics in business is the name of the practice used to ensure that the data being used to make high-value commercial decisions is of the highest quality possible. However, there is a catch. Human beings are the catch. We have  gut-instinct, prejudices, experience, belief systems, conditioning, ego, expectation, deceit, vested interests etc. These behavioural biases all stand to cloud the data story, and usually do.

A high-value commercial decision does not necessarily have immediate financial consequences. Although, in commercial terms, a sub-optimal outcome is invariably linked with financial loss. In the first instance, the immediate effects of a high-value decision can be on organisational morale or have reputational consequences.

responsibility

When a high-value decision is to be made there are invariably advocates and detractors. Both camps like to believe that they are acting in the service of a cause greater than themselves. Occasionally, some of the actors cloud the story because their self-interest is what really matters to them, and they try hard to mask that with the veneer of the greater good. Hence the term ‘Data Story’, because behind the bare numbers and pretty graphics  there is an entire story.

The concept of conducting a pre-mortem examination of the entire data story to model what can go wrong is becoming more important for senior decision makers. It is getting increasingly difficult to use the traditional internally appointed devil’s advocate as, due to the inherent complexity of understanding a data story, this function needs to be performed by subject matter experts. Although the responsibility for decision-making always falls on the Senior Management, they want to do it with a full breakdown of the many facets of the data story.


 

In order to achieve this, individuals with a unique blend of talents, experience and inquisitiveness must be used. People with absolute objectivity and discretion, who don’t rely on inductive reasoning. Ones who are robust enough to operate independently, diplomatically and discreetly and have executive backing to interrogate all the data sources, ask the difficult questions and highlight any gaps, inconsistencies and irregularities. From this they can provide a report for the Executive Sponsor(s) with questions to ask and inquiries to make so a well-informed decision can be made.

After all, when there is lots at stake, no one wants to be remembered as the person who screwed up and tried to blame the data.

Why is data dangerous?

In the words of @RorySutherland: “The data made me do it” is the 21st Century equivalent of “I was only obeying orders”. The growing power and influence of Data Science touches everyone’s lives. Sutherland also remarks: “Markets are complex and there can be more than one right answer. People in business prefer the pretence of ‘definitive’ because if you can show you’ve done the ‘only right thing’ you have covered yourself in event of failure”. These are all attempts at Plausible Deniability, and they are weak.

For the record, plain old data is not dangerous; you are unlikely to be hit by an errant Spearman’s rho, or a rogue control variable that detached itself from an analysis. Data is just a record of the measurable values of something that has happened in the past. Digital exhaust, if you will. Like speed in a car, it is the inappropriate use of it that causes issues.


Doing the right thing often sees people becoming enslaved to Type 1 and Type 2 data, because they are the easy parts. You can hire experts who can count well, use the software and understand how to tease out knowledge from the data points. What the majority fail to guard against, or may even do intentionally, is the manipulation of the presentation, context and language used when presenting their findings. This is the Type 3 data I talk about, which isn’t traditional data as we know it.

Type 3 data is the really dangerous stuff. The reason for this is our complete fallibility as human beings. This is nothing to be ashamed of, it is how we are made and conditioned. It is in fact, entirely, boringly, and ordinarily normal. I was recently told by a lawyer – I say this because she is pretty well-educated – that all statistics are a lie. She then cited the famous Mark Twain (nicked from Disraeli) saying of, “There are lies, damn lies and statistics”, as if this were all the proof she required. Interestingly, when I challenged her on this and made a case for accurate uses of statistics she refused to even acknowledge this. She was wedded to her belief and I must be wrong. Case closed.


I think immersion in courtroom rhetoric may have been getting the better of her. However, this goes to show just how dangerous we humans can be. Imagine being a client with a lawyer whose dogmatism may cause them to overlook, or be unable to question, relevant statistical evidence. All stemming from a strongly held view that all statistics are lies. Professor Bobby Duffy recently wrote an excellent book called The Perils of Perception and on p.100 he shows just how problematic this view can be.

My point is: if a person who is well-educated, and practising in a profession like law, can hold such a position, then it is not beyond any of us to do so, quite unwittingly. That remains true until we are more familiar with the behavioural biases that we are all susceptible to, the way Type 1 and Type 2 data can be misrepresented (Type 3 data) and how that uses our in-built foibles to generate a reaction.

This is where someone who understands both of these areas, and can blend that knowledge into an expertise which is useful, can help you. When important decisions on strategy, direction and spending  are conditional on interpreting data from others, you want to get it right first time. If not, you’ll be forced into, “The data made me do it”, and that rarely ends well.

burning money

 

 

 

 

Another Meaningless Graphic: Another Meaningless ‘Fact’

Have you ever seen one of these? A classic example of an attempt to bamboozle you with utterly meaningless data.

This is from a website that, amongst many other things, promises to “outpace disruption“. Does anyone know what that means? Anyhow, here is the result of outpacing disruption.

meaningless histogram
A meaningless bar chart

This was all there was. There was no information giving context. Still, positive numbers must mean it is a wonderful investment. You can hardly fail to make a bundle.

Are you ready to part with your money yet? No? How about if you knew this dazzling fact: what if I were to tell you that this product increases checkout speed (e-commerce) by 24%? Impressed yet?

Or perhaps, after you read the first posts on The Problem With Data you were asking things like, a 24% increase over what? How many? What period? Which currency? What language? How measured? Credit/Debit card? PayPal? Amazon Pay? Stored customer details? First-time transactions? Repeat transactions? Fibre broadband or 5meg FTTC, TCP to the residence? And on and on.
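The first of those questions, ‘over what?’, is easy to demonstrate. Below is a sketch in Python with invented baselines. The original claim does not even say whether ‘speed’ means time taken or throughput, so the hypothetical function `time_saved` assumes it means a 24% shorter checkout time.

```python
def time_saved(baseline_seconds, improvement=0.24):
    """Seconds saved if checkout time falls by `improvement` (an assumed reading of the claim)."""
    return baseline_seconds * improvement

# Hypothetical baselines: the original claim gives none.
for baseline in (5, 60, 300):
    print(f"baseline {baseline:>3}s -> saves {time_saved(baseline):.1f}s per checkout")
```

The same ‘24%’ is about a second on a fast checkout and over a minute on a slow one. Without the baseline, the percentage tells you nothing you can act on.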

 

 

Type 3 data in action. The Guardian is at it again.

The purpose of this blog is to get behind the data stories we encounter. Understandably, most commercial data is sensitive and remains unpublished. This means I have to rely on publicly available mangling of the data to illustrate the points.

The article of 11th October 2018 carries the snappy title, “Profits slide at big six energy firms as 1.4m customers switch” (The 3 types of data are explained here)

I will stick to the problems with data and not make this a critique of the article for its weaknesses alone. That would just be churlish. Read the following and think of yourself being presented with a document like this and having to critique its worth as something to base your decision-making on.

This article encompasses the Type 3 data example so very well! It appears that the journalist has started with an idea and then worked backwards to mangle what Type 1 data they have to fit the idea they want to transmit to the reader. To be clear: this post is not written as an opinion piece about the Guardian, but as a critique of an article purporting to use Type 1 data to support the ‘Sliding Profits’ hypothesis.

Before we go any further the Golden Rule of data has been broken. You simply mustn’t decide the answer, and then try to manipulate, mangle and torture the data to fit your conclusion. You must be led by the data, not the other way round. It is fine to start with a hypothesis and then test the data to see if that is true. It is a major credibility red flag when the conclusion is actually the initially assumed answer.

Red Flag

If the article is apparently a business article it is rather worrying when the journalist obviously doesn’t know the difference between profit margins and profit¹. These are two distinctly different ideas yet they are used interchangeably in the piece. Red flag number two (if the first wasn’t enough). Paragraph five manages to combine the margins of two companies with the profits of another and then plugs in (excuse the pun) an apparently random reference to a merger and the Competition Commission.
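For anyone unsure why the distinction matters, here is a minimal sketch in Python. The two suppliers and their figures are invented for illustration and have nothing to do with the companies in the article.

```python
def profit(revenue, costs):
    """Profit: what is left of revenue after costs."""
    return revenue - costs

def profit_margin(revenue, costs):
    """Profit margin: profit as a fraction of revenue."""
    return profit(revenue, costs) / revenue

# Two hypothetical suppliers (figures in £ millions, invented for illustration).
suppliers = {
    "Large Co": {"revenue": 1000, "costs": 960},
    "Small Co": {"revenue": 100, "costs": 85},
}

for name, figures in suppliers.items():
    print(f"{name}: profit £{profit(**figures)}m, margin {profit_margin(**figures):.0%}")
```

In this example Large Co makes the bigger profit while Small Co has the far healthier margin, so a sentence that mixes one company’s margin with another’s profit is comparing two different things.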

Terms like the ‘Big Six’ are used but nowhere does the author bother to say who the Big Six are. Whilst it is a moderately common term it cannot be assumed that everyone knows who they are. This is sloppy reportage and another Red Flag for the reader. Sloppy here, sloppy elsewhere. Who knows? This is back to the Type 3 issue of how it is presented to you. In this case, so far, very poorly.

The energy market regulator, Ofgem, is cited as the source for the first graphic. The Y (vertical) axis is numbered with no qualification, and the date and document that this is taken from aren’t mentioned. Type 1 data being mangled by the Type 3 data. Overall – poor sourcing and not worth the bother. You can dismiss graphics like this as you can reasonably assume it is a form of visual semiotic designed to elicit a feeling and not communicate any reliable Type 1 data to you. (Note the profits and profit margins even being conflated in the graphic title!)

Poor graphic designed to mislead – taken from the Guardian article.

 

The final critique is the one that speaks to the concept of Type 3 data. The language used in the article is such a blatant attempt to skew the article away from reportage about how the entry of challengers into the marketplace is affecting the profits, and profit margins, of the established players. I think the subsidiary point is about the fact that consumers aren’t switching suppliers as much as is expected. I had to read the article several times to distil those as the most likely objectives of the piece.

Finally, if you re-read the article and just look at the tone and, more specifically, the adjectives used, you’ll be surprised. What I can’t work out is the author’s agenda. To just report such a muddle of data is one thing, but most of the popular press has an agenda of some kind.

NB: I really hope the Guardian doesn’t just keep gifting such poorly written articles. I think I may look at the coconut oil debate next!


What is Type 3 data and why is it so important?

A simple enough sounding question, though something that is quite contested. I propose that we need to look at three distinct subsets of the concept of data. You’ll see in a moment why this article isn’t a technical explanation of data in stats. For that (and it is necessary) this is a super post that explains them.

This article is intended as a guide to help you categorise the data that is being presented to you in the course of a day.

Type 1 – This is ‘just’ the hard numbers.

By this I mean just what you imagine. The figures that get plugged into SPSS, Stata, R, SAS and the like. How these are analysed determines the output. It is necessary – and can be mind-numbingly boring, I know this as I’ve had to do it many times! – to check how any of the variables may have been re-coded, re-weighted and then analysed in the data-management components (.do files, syntax files etc) of the popular stats packages. [Why isn’t Excel listed? I asked my ex-supervisor and a Professor who specialises in this stuff. He politely guffawed and told me that it isn’t a ‘proper’ statistical analysis program. Once the heavy lifting has been done it may be exported to Excel as that is what the majority of people are used to seeing.]


Type 2 – This type of data is the so-called softer numbers.

The first type of data is useful for analysing the patterns of turnout for an election, the way different materials on an aircraft fatigue, how people move through a supermarket, etc. Type 1 relies on quantifiable and easily measurable (converted into a numerical value for analysis) variables: one step right, turns right and two steps at a 40 degree angle, over a nine second period, and so on.

Type 2 data is an attempt to record and analyse human emotions, behaviour, and sometimes capture the strength of intent to do or not do something. We have all been asked things like, “How did that make you feel? Please rate your reply from Very Unhappy, Unhappy, Neutral, Happy to Very Happy?” This is the classic Likert scale.

Stop though. Have you considered if Semantic Differential Scales were used instead? Perhaps a mixture of the two, or two different data sets derived using different assessment methodologies? These too can be plugged into the stats programs and analysed. The trickier thing here is the subjectivity element. Is my Very Unhappy equivalent to your Very Unhappy? The way this effect is mitigated is by large-scale testing, as this generates a happy medium by diluting the influence of the outliers. Hence, be very wary when a small sample size is used to generate an indication of feeling or intent.
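As a rough illustration of why small samples deserve suspicion, the Python sketch below codes the Likert labels quoted above as 1 to 5 and compares the same mix of answers at two sample sizes. The responses are hypothetical, and treating Likert scores as numbers that can be averaged is itself a simplifying assumption that statisticians argue about.

```python
import math
import statistics

# Conventional 1-5 coding of the Likert labels quoted above.
LIKERT = {"Very Unhappy": 1, "Unhappy": 2, "Neutral": 3, "Happy": 4, "Very Happy": 5}

def summarise(responses):
    """Mean score and its standard error for a list of Likert labels."""
    scores = [LIKERT[r] for r in responses]
    mean = statistics.mean(scores)
    std_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, std_error

# Hypothetical responses: the same mix of answers at two sample sizes.
pattern = ["Very Unhappy", "Unhappy", "Neutral", "Happy", "Happy", "Very Happy"]
for copies in (1, 50):
    mean, std_error = summarise(pattern * copies)
    print(f"n={len(pattern) * copies:>3}: mean {mean:.2f}, standard error {std_error:.2f}")
```

The average is identical in both cases, but the standard error (a measure of how far the average might wobble if you asked a different set of people) is many times larger for the six-person sample.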

Likert answers

Type 3 – And this is where it gets hazy and interesting!

Type 3 data is the way in which data is framed and presented to you. This may be in a newspaper, an internal report or perhaps a sales presentation. They are all trying to sell you something. The wrapping of the data and analysis may be in a manner to enhance the credibility and believability of the package, or you may be being steered away from robust data because it doesn’t fit with someone’s agenda. Either way, you are being encouraged to buy in to a point of view and the ‘data’ is being used in an effort to burnish the idea.

Cleverly employed Visual Semiotics that speak to far deeper parts of our brain are often used. You already know what these are: they’re the graphs, symbols and pie charts as well as the tangentially relevant accompanying images. See the recent post on the mangling of data by the Guardian newspaper – the image of the white police officer discharging a taser directly towards you – for an example of this. Creative affect labeling, which is the process of putting feelings into words, applied to some of the characteristics of the data, certainly the ones that focus is being directed towards, is influential. The latest research techniques have allowed scientists to show this happens, however much you may think you can override such feelings.

Visual Semiotics

Although Type 3 data is all about the way in which the data is framed, it isn’t the numbers in the traditional sense. It is the third part of the package. Type 1 data is, even if correctly produced and analysed, completely susceptible to the influence of Type 3 data, as is Type 2 data.

Type 3 data is the processing, packing and presentation of the digital exhaust that makes up Types 1 and 2 data. It is important as it mediates between that data and us unpredictable humans, slaves to our emotions, with all our psychological foibles and weaknesses hidden just below the surface. As such, Type 3 data should be afforded as much significance as Types 1 and 2 when analysing any data that is presented to us.

 

 

Data does not equal wisdom

It is natural to both fear the unknown and to feel the strong desire to allay that fear. After all, lack of insight and wisdom in both business and life can bring the best plans crashing down.

Big Data, Small Data, Thick Data, regression analyses, log analyses, control groups, p values, and significance – in this modern world of news, fake news, and endless statistics, we are constantly presented with numbers that are designed to give information an instant gloss of credibility. People often try to burnish their claims by saying things like, “scientifically proven”, “you can’t argue with the numbers”, “if you can’t measure it then it isn’t true”, and so on.

But the simple fact is that it is not that simple. There is something quasi-mystical in numbers, which makes them both instantly trustworthy and the perfect tool to bamboozle people. The trick is to look behind the numbers and understand what is being measured and how. Furthermore, some things, especially anything to do with human beings, are not easy to measure with ‘conventional’ statistics. For instance: how do you measure the strength and intensity of a feeling or an intention? It is not like calculating the re-entry criteria for a spacecraft, for physics doesn’t have feelings.

Data to Insight pyramid
Being at the top of the intellectual food chain can make us believe that we are best placed to see into this unknown, exploiting data to see what is really happening in the world around us. This belief is powerfully seductive. The solutions being sold to us prey not only on the fear of the unknown but also on the seduction of knowing. The mixture of loss-aversion and the availability heuristic, marketed to a worried audience, often causes them to grab at the passing offerings in the belief that the silver bullet is in there somewhere.

As ‘mere’ beings we easily fall prey to the idea that we are masters of our universe; we use technology in the hope that it will allow us to control what we want to control. But the problems we face in exerting control don’t come from the technology. They come from us. We are blinded to our own fallibilities and mistake output for insight. We can get captured by the belief that the latest tech provides the truth, and is a legitimate insight into the future. The desire to believe this can often lead us to distort the data to fit our assumptions, and inevitably this also produces a distortion of reality. Famously, Nokia was warned by an ethnographer, using meticulously collected Thick Data, that the smartphone was coming. They insisted that this person was wrong because, they said, the information did not ‘fit the data’. We all know what happened to Nokia. Nokia who? (Credit to Tricia Wang for the Nokia story)

The analysis of data is a lagging indicator; it involves measuring the past, interpreting that past, and trying to predict the future, and that is a tough challenge. We conflate what we see with what we understand and how we think it should be. There is a reason that investment products carry the dire warnings about past success being no guarantee of future performance.

There is no doubt that we are much better at incorporating such ‘soft’ characteristics into measurement metrics. However, it is not as easy as a pure data-science approach. To build the effective tools (algorithms) requires a much more nuanced and wider understanding than that given by a blinkered approach which fails to incorporate Thick Data. This can only really be done by a multi-disciplinary team of people, whose skills might include behavioural science, ethnography, sociology, political science and psychology – to name just a few. Mathematicians, statisticians, data-analysts and programmers are certainly necessary, but it shouldn’t stop there.

It is often said that people are at the core of a business. Whether they are the customers or the staff, they are people, not machines. Knowing what people do is one thing; understanding why they do it is more important still. This requires much more information than merely what is being done and by whom. This is the context that only Thick Data brings.


Human beings do not act rationally and, famously, they will lie like mad to researchers! This has been shown in many studies about the issues in conducting studies on people. Additionally, they can rationalise their actions in a way that they are happy with. Knowing the value of how the social, societal and environmental factors influence the numbers is a step towards the sort of understanding that may have saved Nokia. For modern business leaders, who rely on data to inform their decisions, it is critical to understand the context of actions and the intentions that underpin the actions.

If you want to take the blinkered approach offered by an IT package and believe that it is a magic software tool that will allow you to predict the future, then I would suggest that you are falling prey to unconscious bias. When that happens, you find things like the following flipchart starting to seem credible. In fact, I’ll wager that something similar was seen in Nokia shortly before they were wiped off the commercial map.

Think Rhino

Wisdom is understanding the limitations of the numbers alone: however they are crunched. Wisdom in business is understanding that it is not weakness to embrace wider ideas. Wisdom is strength, and this does not just come from data alone. Ultimately, wisdom comes from within, but the insights and context makers should be part of the mix.

If you are struggling with a business problem and you suspect that having a deeper understanding of how data works would be valuable then call me for a chat on Skype (domshadbolt) or  click here to email me.