How Do I Know…

…if I am getting the entire Data Story?

…if it was analysed properly?

…if I can trust the conclusions and recommendations?

Every executive who relies on decision-making data presented to them by other people shares these doubts. If you don't know how to ask the correct questions, parse the information in the replies correctly, and follow up with the right requests for more information, you will forever be at the mercy of others. My experience is that people with responsibility do not enjoy that situation.

Without an impartial assessment of the Data Story, they cannot satisfy themselves that the Data Story they are being told is the right one. Every big decision ends up being made with a greater element of faith than was intended.


There are two basic elements to achieving an accurate Data Story. The first is the human, and the second is the technical.

  1. Human

Everything may be tickety-boo: the best, most loyal people are giving you a perfect Data Story. If you know this to be true then stop reading now. Life is great. If, on the other hand, you ever wonder, then keep reading.

(Type 1, Type 2, and Type 3 data are recapped here. For clarity, I am writing about Type 2 and Type 3 data. Remember, Type 1 is the Mars Lander sort of stuff!)
  • “These results are from AI. It can do things we can’t.”

Whether the results are attributed to AI, which has spotted a very subtle pattern in a vast mass of data, or come from a straight survey designed, run and analysed by people, means nothing in and of itself.

Even if an AI tool uses the best and the brightest to program the algorithms it ‘thinks and learns’ with, the fact remains that people – with all their attendant beliefs, prejudices, biases, agendas etc – set the rules, at least to start. If the machine has indeed learned by trial and error, it was still programmed by people. Therein lies the weakness.


This weakness comes from the initial decision makers, precisely because they aren’t you or your Board. The Board is likely to have a much wider range of experience and carry more responsibility than the Data Science/IT/Marketing departments.

How often have you spent time with these people? Are they even in the same office as you? How old are they? What are their social and political biases? And so on. Unless you know this, how can you begin to understand anything about the initial algorithms that started the AI going? When were they written, what was the market like then, by whom, in which country?

With all data collection and manipulation it is crucial to have the fuller story: the background and understanding of those setting the questions, writing the algorithms, tweaking the machine learning, and analysing the data; their managers; the instructions they have been given; and the emphasis this Data Story has received in the rest of the organisation before you see it. It also includes insight into the marketplace provided by the sort of Thick Data that Tricia Wang and other ethnographers have popularised.

My message to you is that data is so much more than numbers. Numbers alone can misrepresent the story greatly. We are social animals, and as long as there are people involved in the production, analysis and presentation of data, it doesn't matter a jot how incredibly intelligent and fast the tools are. We are the weakness.


If you still struggle to believe this concept, think about electronic espionage. It is rarely a failure in something mechanical that causes catastrophic breaches of security; it is the relative ease with which people can be compromised and made to share information. The people are the weak link. In the very early days of hacking, Kevin Mitnick in the US spoke of Social Engineering as the means to an end. We are all inherently flawed, and these flaws are shaped and amplified by our social and work environments, so why couldn't that affect the Data Story you get?

  2. Technical

  • “The data we have used is robust.”

I’ve heard that line trotted out many times. Gosh, where to start? It may be robust. Nonetheless, a lot can and does happen to the data before you see the pretty graph. Here are just a few things to consider before agreeing with that assertion:

What was/were the hypothesis/hypotheses being tested?

Why?

When was it collected?

By whom (in-house or bought in from a third-party)?

Qualitative, quantitative, or a blend?

What was the method of collection (face-to-face interviews, Internet, watching and ticking boxes, survey, correlational, experimental, ethnographic, narrative, phenomenological, case study – you get the idea, there are more…)?

How was the study designed?

Who designed it?

How large was the sample (or samples)?

How was the data edited before analysis (by whom, when, with what tools, any change logs, what questions were excluded and why)?

How was the data analysed (univariate, multivariate, logarithmic, what were the dummy variables and why, etc.)?

How is it being presented to me, and why this way (scales, chart types, colouring, size, accompanying text etc.)?


And so on. This is just a taste of the complexity behind the pretty pictures shown to you as part of the Data Story. From these manicured reports you are expected to make serious decisions that can have serious consequences.
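To make one of those questions concrete, take the "dummy variables" item from the analysis question above. Dummy encoding is how analysts turn categories into numbers, and the analyst's choice of baseline category is exactly the kind of invisible decision worth asking about. Here is a minimal sketch; the category names are invented for illustration:

```python
def dummy_encode(values, drop_first=True):
    """Dummy-encode a list of category labels.

    drop_first drops the alphabetically first category as the
    baseline - an analyst's choice that silently shapes how every
    coefficient in the subsequent model is read.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return [{f"is_{c}": int(v == c) for c in kept} for v in values]

# Hypothetical supplier labels, purely for illustration
suppliers = ["BigSixCo", "Challenger", "BigSixCo"]
rows = dummy_encode(suppliers)
# "BigSixCo" has become the unstated baseline; every row is now
# measured relative to it, whether or not the reader realises.
```

Nothing here is wrong in itself, but if nobody tells you which category was dropped, you cannot interpret the results properly. That is why the question belongs on the list.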

You must ask yourself if you are happy knowing that the Data Story you get may be intentionally curated or unintentionally mangled. I started this site and the consultancy because I am an independent sceptic. In this age of data-driven decision-making, you mustn't forget: incorrect data can't take responsibility for mistakes, but you will be held to account. This is not scaremongering; it is simply fact.

If you need a discreet, reliable and sceptical third party to ask these questions, then drop me an email. I compile the answers, or identify and highlight the gaps. You make the decisions, albeit far better informed and with the ability to show that you didn't take the proffered Data Story at face value, but asked an expert to help you understand it.


Type 3 data in action. The Guardian is at it again.

The purpose of this blog is to get behind the data stories we encounter. Understandably, most commercial data is sensitive and remains unpublished. This means I have to rely on publicly available mangling of the data to illustrate the points.

The article of 11th October 2018 carries the snappy title, “Profits slide at big six energy firms as 1.4m customers switch”. (The three types of data are explained here.)

I will stick to the problems with the data and not make this a critique of the article for its weaknesses alone; that would just be churlish. Read the following and imagine being presented with a document like this, having to assess its worth as something to base your decision-making on.

This article exemplifies Type 3 data so very well! It appears that the journalist started with an idea and then worked backwards, mangling what Type 1 data they had to fit the idea they wanted to transmit to the reader. To be clear: this post is not written as an opinion piece about the Guardian, but as a critique of an article purporting to use Type 1 data to support the 'Sliding Profits' hypothesis.

Before we go any further: the Golden Rule of data has been broken. You simply mustn't decide the answer and then try to manipulate, mangle and torture the data to fit your conclusion. You must be led by the data, not the other way round. It is fine to start with a hypothesis and then test it against the data. It is a major credibility red flag when the conclusion is actually the initially assumed answer.


For an apparently business-focused article, it is rather worrying that the journalist obviously doesn't know the difference between profit margins and profit¹. These are two distinctly different ideas, yet they are used interchangeably in the piece. Red flag number two (if the first wasn't enough). Paragraph five manages to combine the margins of two companies with the profits of another and then – completely randomly – plugs in (excuse the pun) a reference to a merger and the Competition Commission.
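To see why conflating the two matters, here is a minimal sketch with made-up figures (not taken from the article): profit can rise in the same year that the profit margin falls, so the two can tell opposite stories.

```python
def profit(revenue, costs):
    """Absolute profit: money left over after costs."""
    return revenue - costs

def profit_margin(revenue, costs):
    """Profit as a fraction of revenue."""
    return (revenue - costs) / revenue

# Year 1 (hypothetical): 100m revenue, 90m costs
p1 = profit(100, 90)          # 10m profit
m1 = profit_margin(100, 90)   # 0.10, i.e. a 10% margin

# Year 2 (hypothetical): revenue grows, but costs grow faster
p2 = profit(150, 138)         # 12m profit - profit is UP
m2 = profit_margin(150, 138)  # 0.08, i.e. 8% - margin is DOWN
```

A headline writer can therefore pick whichever measure supports the story they want, which is exactly why using the two interchangeably is a red flag.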

Terms like the 'Big Six' are used, but nowhere does the author bother to say who the Big Six are. Whilst it is a moderately common term, it cannot be assumed that everyone knows who they are. This is sloppy reportage and another red flag for the reader. Sloppy here, sloppy elsewhere. Who knows? This is back to the Type 3 issue of how the data is presented to you. In this case, so far, very poorly.

The energy market regulator, Ofgem, is cited as the source for the first graphic. The Y (vertical) axis is numbered with no qualification, and the date and document it is taken from aren't mentioned. Type 1 data being mangled by Type 3 presentation. Overall: poor sourcing and not worth the bother. You can dismiss graphics like this, as you can reasonably assume it is a form of visual semiotic designed to elicit a feeling, not to communicate any reliable Type 1 data to you. (Note that profits and profit margins are even conflated in the graphic's title!)

Poor graphic designed to mislead – taken from the Guardian article.


The final critique is the one that speaks to the concept of Type 3 data. The language used is a blatant attempt to skew the article away from straight reportage about how the entry of challengers into the marketplace is affecting the profits, and profit margins, of the established players. A subsidiary point seems to be that consumers aren't switching suppliers as much as expected. I had to read the article several times to distil those as the most likely objectives of the piece.

Finally, if you re-read the article and look just at the tone and, more specifically, the adjectives used, you'll be surprised. What I can't work out is the author's agenda: to simply report such a muddle of data is one thing, but most of the popular press has an agenda of some kind.

NB: I really hope the Guardian doesn't keep gifting us such poorly written articles. I think I may look at the coconut oil debate next!
