Red Flags & Sacred Cows

Here follows a cautionary tale. I name the culprit not because I have an axe to grind, nor because it is particularly unusual, but because it suits the example being made.

To repeat other posts on here: when someone starts quoting facts and figures at you and citing studies, it is entirely reasonable – and very sensible – to ask some probing questions. The figures are usually being used to sell you something: an idea, credibility, services that the provider of the figures can also come and fix (at a price, naturally), or simply support for their existing position on a topic.

This entire topic becomes much more challenging when very emotive subjects are being commented on. Race, Gender, Diversity and Inclusion are today's Sacred Cows. These topics make many people uncomfortable, even as they try to appear entirely at ease with them. They often deal with this by saying nothing at all, keeping their heads below the parapet. An unintended consequence is that this lack of enquiry means statements about the Sacred Cow go unchallenged.


Twenty years ago there were few, if any, consultancies offering to help companies address issues that can arise as a result of various forms of discrimination; now there are plenty. Many of them seem to think that positioning themselves as experts in the field puts them beyond reasonable criticism and examination. Can someone help me understand why that expertise should elevate anyone beyond reasonable scrutiny?

A big problem with Sacred Cow topics is that any criticism of anything to do with them – in this case, the use or misuse of data – is treated as tantamount to trying to undermine their very raison d'être. It isn't; it is all about the data. Data doesn't care about any of these issues. Conflating the two looks like a tactic to draw the eye away from the data and shame you into dropping your questions.

Where you should have a problem is when data is used to misrepresent issues. Whether intentionally or not, the mishandling of data can make problems appear very different from what they actually are. A simple example lies in the analysis of raw data: if certain variables are not measured during collection and then controlled for during the analysis, the results can mislead. Another is when data collected in one specific area produces results that are then remarked upon and treated as a general finding, with no qualifications added.
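As a hedged illustration (all the numbers are invented, not drawn from any study), here is how leaving a variable uncontrolled can reverse a conclusion: within each department one group does better, yet the raw totals suggest the exact opposite.

```python
# Hypothetical illustration of an uncontrolled variable (Simpson's paradox).
# All figures are invented for the example; nothing here is real survey data.
import pandas as pd

df = pd.DataFrame({
    "department": ["X"] * 100 + ["Y"] * 100,
    "group":      ["A"] * 10 + ["B"] * 90 + ["A"] * 90 + ["B"] * 10,
    "promoted":   [1] * 9 + [0] * 1       # dept X, group A: 9/10 promoted
                + [1] * 80 + [0] * 10     # dept X, group B: 80/90 promoted
                + [1] * 30 + [0] * 60     # dept Y, group A: 30/90 promoted
                + [1] * 3 + [0] * 7,      # dept Y, group B: 3/10 promoted
})

# Raw figures: group B looks far more likely to be promoted (0.83 vs 0.39)...
print(df.groupby("group")["promoted"].mean())

# ...but controlling for department, group A does better in both departments.
print(df.groupby(["department", "group"])["promoted"].mean())
```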

Back to the Red Flags though. The fact that something is a sensitive topic should not prevent you from asking about the provenance of the data. If someone clasps a hand to their mouth and asks how you could possibly question a respected pillar of the industry, sometimes an author and so on, then remind them about speaking truth to power.

Recently, I saw a post on LinkedIn from one of the founders of Pearn Kandola LLP which read:

“A third (32%) of people who have witnessed racism at work take no action, and a shocking two-fifths (39%) of those said that this was because they feared the consequences of doing so*. If our workplaces are to become genuine places of safety, it’s vital that the government acts quickly to curb the use of NDAs to hide instances of harassment, whether it be racist, sexist or otherwise. #RacismAtWork #UnconsciousBias

*According to our own research at Pearn Kandola LLP”

All well and good on the face of it. There is nothing wrong with citing your own research, provided you can back it up. I was interested to learn more, so I asked whether the research was published, what the sample size was, where and when it was collected, and so on. There has been no reply. Judging by many of the comments, the post has been accepted without criticism or interrogation, a worrying indication of a lack of critical thinking. Another thing that should raise a little red flag in your mind when data is being reported is the use of words like 'shocking'. I can only imagine this is to try to increase click-through. It detracts from the data and sounds more like a Daily Express 'weather armageddon' headline.


If the data is robust, they ought to be delighted to publish it and open it up to examination. After all, if it is robust enough to underpin public claims then there is no reason why it should not stand up to scrutiny by a third party.

To question data means that you are thinking. Whatever the topic, there should be no Sacred Cows, especially not the data.


Why Doesn’t Big Data Always = Good Data?

The Data Scientists out there will sigh, feeling they have heard this a thousand times before. However, it is human beings that are the problem. Numbers are just numbers; it is what we humans do with them that causes the trouble.

Very quickly then; this is the correlation and causation argument writ large.

[Image: correlation vs causation – "But it must be true???"]

Can you see the issue? On the face of it, it makes sense. I prefer the elegance of the original description: post hoc ergo propter hoc. Merely acquiring more and more data points – a bigger data set, better hardware, software and human expertise to manipulate it – does not equal better results from the data.
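A throwaway illustration of the point (entirely made-up numbers): two series that merely trend in the same direction will correlate strongly, and no amount of extra data points makes one cause the other.

```python
# Spurious correlation from a shared trend - invented numbers, no real data.
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)

ice_cream_sales = 100 + 5 * (years - 2000) + rng.normal(0, 3, years.size)
shark_sightings = 20 + 2 * (years - 2000) + rng.normal(0, 2, years.size)

r = np.corrcoef(ice_cream_sales, shark_sightings)[0, 1]
print(f"Pearson correlation: {r:.2f}")  # very high, yet neither causes the other
```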

Big data is great and powerful when it is clean and accurate. But pause and think: before plunging into the analysis and insight phase, the cleaning and tidying phase – the boring stuff that so often gets skipped past – needs to be complete. The crazy outliers need to be identified, partial data from a single source needs to be investigated, in the case of human surveys the 'don't know' answers may need to be coded out, and so on.
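As a rough sketch only (pandas, with a hypothetical file name and column names), that boring-but-vital tidying pass might look something like this:

```python
# A minimal, hypothetical tidying pass: file name and column names are invented.
import pandas as pd

df = pd.read_csv("survey_responses.csv")

# Code out the "don't know" answers so they are treated as missing, not as data
df["satisfaction"] = df["satisfaction"].replace({"don't know": pd.NA})

# Flag the crazy outliers (here, anything over 3 standard deviations from the mean)
age = pd.to_numeric(df["age"], errors="coerce")
outliers = (age - age.mean()).abs() > 3 * age.std()
print(f"{int(outliers.sum())} outlier rows to investigate")

# Spot a source that is supplying suspiciously incomplete records
missing_by_source = df.isna().groupby(df["source"]).mean().mean(axis=1)
print(missing_by_source.sort_values(ascending=False))
```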

There are a variety of ways to allow the Data Scientists to do this, but the heart of the matter is that if they are not given the time, tools and budget then you are back to the junk-in, junk-out scenario that affects everything to do with computers.

As humans we are programmed in ways that really hamper us, especially when we are operating outside our field of expertise or are very out of date in a subject area. Our brains crave clarity and simplicity, and we avoid the unknown because that is where danger may lie. We want to make as smooth and risk-free a transit through life as possible. Because of this, the best and the brightest can suddenly become very credulous and succumb to deep-seated fear and prejudice. This propensity feeds the behaviour of some: they are told something, seize upon it and then happily transmit it to others as fact. The recipients believe it, often more so when it is passed to them by a person or source in whom additional credibility is invested.

I was struck yesterday when listening to an episode of The Infinite Monkey Cage – a science programme on BBC Radio 4 – in which anthropologists and evolutionary biologists were tearing their hair out at the traction gained by an image we are all familiar with: the evolution of man from ape to upright walker. It is, apparently, a terribly inaccurate and misleading image that first appeared in a French school textbook back in the Fifties and resonated so much (which shows the power of a credible source and a good image) that it stuck and has been reproduced millions of times over. I had no idea how inaccurate it was, and I like to think that I am not very credulous. It goes to show the power of something that has been ever-present, though. Few people except the experts challenge it, even now.

[Image: the ape-to-upright-man evolution graphic – the iconic, contested and wholly inaccurate image]

Bringing this to business: I feel for the person or team at Apple that had to brief Tim Cook and co that the earnings forecast had to be dramatically trimmed because the previous cash cow, the iPhone, was no longer selling as quickly. I appreciate I have the benefit of hindsight here, but people were hanging onto their devices for longer and railing against the so-called planned obsolescence that many believed was being built in. Couple that with the belief that the latest OS was designed to overwhelm older devices – and that without the latest OS functionality would be limited henceforth – and consumers were genuinely upset. Add the ever-longer service contracts we are all but forced to agree to by the network providers (here in the UK at any rate) in order to have the latest, subsidised tech, and I suspect the slowdown wouldn't have seemed such new news.

We can see the clever PR operation swing into action. Great PR relies heavily on gut feeling, relationships and context, and people overlook how good humans still are at computing that kind of very complex data – still far ahead of any computer. To wit: the entire slowdown has been pinned almost completely on the Chinese market, something I find hard to swallow. I have no doubt it is a large component, and it is very politically expedient given the way China is portrayed in the US these days; the messaging plays heavily on the deterioration of relations between the US and China. The PR teams are operating on very thick, contextual data, nothing more – the human brains are the computers here. Either way, it is, apparently, not the fault of Apple… *coughs politely*


On the other hand, perhaps they knew of this trend and the feelings that underpinned it all along: perhaps they had excellent Big Data, had combined it with the Thick Data approach and the insights of anthropologists, sociologists and political scientists who specialise in these fields, and had synthesised the findings into usable intelligence – in which case the real issue wasn't knowing this, but deciding when to let the markets know. Sadly, few large companies manage to meld their data very effectively; usually the larger they are, the greater the disconnect between the boardroom and the customer, and the inadequacies of the information providers aren't spotted soon enough.

What about the person responsible – or is there one? Challenging assumptions is often uncomfortable and is often seen within an organisation as disruptive, even unwanted, behaviour. A Chief Data Officer (CDO) ought to have both the support and the power to ask the 'who, what, when, where and why' questions relentlessly. In fact, if they aren't querying the data they use for gaining insight and helping the other leaders make the best-informed decisions, they are probably falling short in their role.

Data – The Fog of Promises, and What To Do About It

Calculating the value of data is something I have been thinking about a lot. Data, any and all, seems to be relentlessly hoovered up whenever we use any form of connected device. Who had ever heard of data as a mainstream topic twenty years ago? Nowadays, we have seen Mark Zuckerberg answering to Congress in the States and countless articles based around what Google and Apple know about us. Some people are laissez-faire about it whilst others veer towards the downright paranoid.


Organisations collect data, hoard it and (hopefully) guard the vast amounts they amass. Why? Because it is valuable. It is useful. Apparently. However, who in a company actually gets down to the nitty-gritty and can measure and express the Return on Data (RoD) that this feverish collection and hoarding actually brings to the organisation?

In 2015 Doug Laney of Gartner wrote about data in financial terms: how it can affect the value of a takeover target that holds a vast, unexploited data store, for example. Were that data to be monetised, what would it be worth? Does it mean the buyer is getting a fantastic deal, or, when the target seems overvalued on traditional metrics, is the difference made up by the value of its data? Herein lies a real problem, because data is difficult to value for several reasons.

Firstly, there is no firm formula for doing so, because to some that data is just wasted storage and to others it is gold. A physical asset, such as a piece of land, is mainstream enough to be far easier to value. With data, the great big lump of bits and bytes only has value if the owner knows how to extract information and insight from it, and can use that to become more competitive or sell it on in a finished, usable form. People have had a stab at a formula by trying to dress old maths up as new maths. I found the following on the Internet:

[Image: a Return on Data (RoD) definition formula]

Though this looks like an elegant formula, the Gain from Data term is subject to so many other variables – primarily time – that it is almost impossible to calculate so simply, making the formula impossible to scale. It does, however, highlight just how important the temporal aspect of data value is. Depending on what it is, data may be very time-limited, useful only in a very brief window. Think of data like a paper currency that can burst into flames at any moment.

One second it has the face value and the next it is ashes.

In contrast, a piece of land is just there. No more land is being created, whereas data creation is never-ending: limited only by our ability to get it and store it.
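For illustration only – this is not the formula from the image above, and both terms are my own labels taken from the discussion – an ROI-style sketch that makes the time dependence explicit might be:

```latex
% Hypothetical, ROI-style sketch of Return on Data; not the formula pictured above.
% Gain(t) decays with time t, reflecting the "paper currency that can burst into
% flames" point: the same data set is worth less the longer it sits unused.
\[
  \mathrm{RoD}(t) \;=\; \frac{\text{Gain from Data}(t) - \text{Cost of Data}}{\text{Cost of Data}}
\]
```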

Secondly, the technical aspects are crucial. What form is it held in, on what type of database, where is it held (there are massive regulatory differences around the world), have the data owners consented to its use and by whom, how old is it, how consistent is it, and so on? If I can't use it in my company for my purposes then it is just ones and zeros on a hard drive somewhere, merely cluttering up the ether – utterly without value.
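One way to make those questions concrete – purely illustrative, with field names of my own invention rather than any established schema – is to keep a simple record per data asset and refuse to count it as valuable until the answers line up:

```python
# Hypothetical record of the attributes discussed above; names are illustrative,
# not a standard or an established schema.
from dataclasses import dataclass
from datetime import date

@dataclass
class DataAsset:
    name: str
    storage_format: str         # what form it is held in, e.g. "relational"
    jurisdiction: str           # where it is held; regulation differs by region
    consented_uses: list[str]   # what the data subjects actually agreed to
    collected_on: date          # age matters; stale data loses value fast
    consistency_score: float    # 0-1, how uniform the records are

    def usable_for(self, purpose: str) -> bool:
        """Without a lawful, practical use it is just ones and zeros."""
        return purpose in self.consented_uses

asset = DataAsset(
    name="loyalty-card transactions",
    storage_format="relational",
    jurisdiction="UK",
    consented_uses=["marketing analytics"],
    collected_on=date(2017, 6, 1),
    consistency_score=0.7,
)
print(asset.usable_for("resale to a third party"))  # False: no consent, no value
```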

The fact remains that extraordinary amounts of data are being recorded about us, all of the time. I recently holidayed in Norway and in ten days I didn’t use one bit of hard currency. All card, all the time. I navigated around using Google Maps. I checked TripAdvisor and used Uber, as well as uploading countless photos to Facebook for family abroad to see. In doing so I must have left an enormous digital smear across the Norwegian landscape. Me and the thousands of other tourists on holiday at the same time. Can you imagine the quantity of data generated by me and the billions of other people using connected services every single day?

To achieve an RoD that makes all the effort and cost of collection and storage worthwhile, several things need to happen, and I can only really see them happening under the guidance and direction of a very senior – if not board-level – individual who directs a team with specific responsibilities. Call them a Chief Data Officer (CDO).

Ideally the value of data is considered so important that the CDO sits on the Board. The CDO needs close ties with the Marketing and Strategy functions, to understand their goals, how they intend to use resources to achieve them, and whether existing data is useful or new data needs to be acquired; they also need to know how to shape and deliver that data in a worthwhile manner. Then there needs to be a real-time feedback loop – Sales? – to assess the efficacy of the deployed data, as well as a direct line between the CDO and the technical functions of the company: the sort of things the CIO deals with, especially storage and access. The CFO will face demands on their funds from the CDO and needs to be able to understand the RoD and how it is affecting the bottom line, the share price, their partners and so on.


Most important of all is someone who can see through the Fog of Promise that all this data is purported to hold. The RoD that could be achieved if only they used the data 'properly' is the sort of golden thread that is so often sold to them. Correlation does not equal causation. I'll repeat that: correlation DOES NOT equal causation. Falling into the Feynman Trap affects the best and the brightest (famously, Jim Collins did so in Good to Great), usually when they become mesmerised by their own belief in the infallibility of data.

The CDO not only ensures the data is valued correctly; they are also responsible for preventing their company from being led down a rabbit hole of jam-tomorrow promises. The sunk cost fallacy remains as relevant today as it ever was, and sometimes the emperor is indeed naked.