Why Doesn’t Big Data Always = Good Data?

The Data Scientists out there will sigh, feeling that they have heard this a thousand times before. However, it is human beings that are the issue. Numbers are just numbers; it is what we humans do with them that causes the trouble.

Very quickly then: this is the correlation and causation argument writ large.

[Image: correlation and causation. Caption: But it must be true???]

Can you see the issue? On the face of it, it makes sense. I prefer the elegance of the original description: post hoc ergo propter hoc. Merely acquiring more and more data points, a bigger data set, and better hardware, software and human expertise to manipulate this data does not guarantee better results from the data.
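To make the trap concrete, here is a minimal sketch in Python using entirely invented data. It assumes a hidden common cause (temperature) driving two otherwise unrelated measures; the variable names, coefficients and noise levels are all hypothetical. The two measures correlate strongly, yet once temperature is accounted for the relationship all but vanishes.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def residuals(xs, ys):
    """What is left of ys after removing a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

random.seed(1)  # fixed seed so the invented data is repeatable
temperature = [random.uniform(10, 35) for _ in range(1000)]      # hidden common cause
ice_cream = [2.0 * t + random.gauss(0, 3) for t in temperature]  # driven by temperature only
sunburn = [0.5 * t + random.gauss(0, 1) for t in temperature]    # driven by temperature only

# Strong correlation, although neither measure causes the other.
raw_r = pearson(ice_cream, sunburn)
# Correlation of what remains once the temperature effect is removed from both.
adjusted_r = pearson(residuals(temperature, ice_cream), residuals(temperature, sunburn))
print(f"raw correlation: {raw_r:.2f}, after accounting for temperature: {adjusted_r:.2f}")
```

More rows would not help here: the raw correlation stays high however many data points are added, because the missing piece is the confounder, not the volume.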

Big data is great and powerful when it is clean and accurate data. But pause and think: before plunging into the analysis and insight phase, the cleaning and tidying phase (the boring stuff that is often skipped) needs to be complete. The crazy outliers need to be identified, partial data from any one source needs to be investigated, in the case of human surveys the ‘don’t know’ answers may be coded out, and so on.
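As a rough illustration of that tidying phase, the sketch below codes out ‘don’t know’ answers and flags crazy outliers before any analysis happens. It is only one possible approach: the non-answer codes, the sample survey values and the outlier rule (a modified z-score based on the median absolute deviation, with the commonly used cut-off of 3.5) are my assumptions, not a prescribed method.

```python
from statistics import median

DONT_KNOW = {"don't know", "dk", ""}  # hypothetical codes for non-answers

def drop_dont_know(answers):
    """Remove non-answers and convert the remaining answers to floats."""
    return [float(a) for a in answers if str(a).strip().lower() not in DONT_KNOW]

def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median absolute deviation) exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Invented survey answers to "how old are you?", including one keying error.
raw = ["34", "29", "don't know", "41", "38", "", "3400", "36"]
ages = drop_dont_know(raw)
outliers = flag_outliers(ages)
clean = [v for v in ages if v not in outliers]
print("flagged for investigation:", outliers)
print("kept for analysis:", clean)
```

Flagged values are returned rather than silently deleted, because an outlier is a prompt to investigate the source, not proof that the number is wrong.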

There are a variety of ways to allow the Data Scientists to do this, but the heart of the matter is that if they are not given the time, tools and budget to do this then you are back to the junk in, junk out scenario that affects everything to do with computers.

As humans we are programmed in ways that really hamper us. This is especially true when we are operating outside of our field of expertise or are very out of date regarding a subject matter area. Our brains crave clarity and simplicity, and we avoid the unknown as that is where danger may lie. We want as smooth and risk-free a transit through life as possible. Because of this the best and the brightest can suddenly become very credulous and succumb to deep-seated fear and prejudice. Some people are told something, seize upon it and then happily transmit it to others as fact. The recipients believe it, often more so when it is passed to them by a person or source in whom additional credibility is invested.

I was struck yesterday when listening to an episode of The Infinite Monkey Cage – a science program on BBC Radio 4 – where anthropologists and evolutionary biologists were tearing their hair out at the traction an image we are all familiar with has gained. The evolution of man from ape to upright walking man is apparently a terribly inaccurate and misleading image. Apparently, it first appeared in a French school textbook back in the Fifties, resonated (which shows the power of a credible source and a good image) so much that it stuck and has been reproduced millions of times over. I had no idea how inaccurate it was and like to think that I am not very credulous. It goes to show the power of something that has been ever-present though. Few people except the experts challenge it, even now.

[Image: human evolution from ape to upright man. Caption: The iconic, contested and wholly inaccurate image]

Bringing this to business: I feel for the person or team at Apple that had to brief Tim Cook and co that the earnings forecast had to be dramatically trimmed because the iPhone, the previous cash-cow, was no longer selling as quickly. I appreciate I have the benefit of hindsight regarding the following remark: people were hanging onto their devices for longer and were railing against the so-called planned obsolescence that many believed was being built in. Couple that with the belief that the latest OS was designed to overwhelm older devices, while without the latest OS functionality was going to be limited henceforth, and you have something that really upsets consumers. Combine that with the increasing length of the service contracts we are all but forced to agree to by the network providers (here in the UK at any rate) in order to have the latest tech, subsidised by these growing contracts, and I suspect this shouldn’t have been such new news.

We can see the clever PR operation swing into action. Great PR apparently relies so heavily on gut feelings and relationships that people overlook how incredible humans are at computing very complex Big Data; we are still far ahead of any computer. To wit: the entire slowdown has been pinned almost completely on the Chinese market, something I find hard to swallow. I have no doubt it is a large component, and a very politically expedient one given the way China is portrayed in the US these days. The messaging seems to play heavily on the deterioration of relations between the US and China. The PR teams are operating on very thick and contextual data, nothing more. The human brains are the computers here. Either way it is, apparently, not the fault of Apple… *coughs politely*

[Image: blaming everyone else]

On the other hand, perhaps they knew of this trend and the feelings that underpinned it. Perhaps they had excellent Big Data and had combined it with the Thick Data approach and the insights of Anthropologists, Sociologists and Political Scientists who specialise in these fields, so that they could synthesise the findings into usable data. In that case the real issue wasn’t knowing this but deciding when to let the markets know. Sadly, few large companies manage to meld their data very effectively. Usually, the larger they are the greater the disconnect between the boardroom and the customer, and the inadequacies of the information providers aren’t spotted soon enough.

What about the person responsible, or is there one? Challenging assumptions is often uncomfortable and often seen in an organisation as disruptive and potentially unwanted behaviour. A Chief Data Officer (CDO) ought to have both the support and power to ask the ‘who, what, when, where and why’ questions relentlessly. In fact, if they aren’t querying the data they are to use for gaining insight and helping the other leaders to make the best informed decisions, they are probably falling short in their role.


Data – The Fog of Promises, and What To Do About It

Calculating the value of data is something I have been thinking about a lot. Data, any and all, seems to be relentlessly hoovered up whenever we use any form of connected device. Who had ever heard of data as a mainstream topic twenty years ago? Nowadays, we have seen Mark Zuckerberg answering to Congress in the States and countless articles based around what Google and Apple know about us. Some people are laissez-faire about it whilst others veer towards the downright paranoid.

[Image: Mark Zuckerberg in Congress]

Organisations collect data, they hoard data and (hopefully) guard these vast amounts of data that they collect. Why? Because it is valuable. It is useful. Apparently. However, who in a company actually gets down to the nitty-gritty of this and can measure and express the Return on Data that this feverish collection and hoarding actually brings to the organisation?

In 2015 Doug Laney from Gartner wrote about data in financial terms: how it can affect the value of a takeover target that has a vast unexploited data store, for example. Were that to be monetised, what is it worth? Does this mean the buyer is getting a fantastic deal or, when the target seems overvalued on traditional metrics, is the difference made up by the value of its data? Herein lies a real problem, as the difficulty in valuing data stems from several causes.

Firstly, there is no firm formula to do so, because to some that data is just wasted storage and to others it is gold. A physical asset, such as a piece of land, is such a mainstream asset that it is far easier to value. With data, the great big lump of bits and bytes only has value if the owner knows how to extract information and insight from it, and then either use that effectively to become more competitive or sell it to someone else in a finished and usable form. People have had a stab at it by trying to vary old maths and make it new maths. I found the following on the Internet:

[Image: Return on Data (ROD) definition]

Though this looks like an elegant formula, the Gain from Data metric is subject to so many other variables, primarily time, that it is almost impossible to calculate so simply, making the formula impossible to scale. It only serves to highlight just how important the temporal aspect of data value is. Depending on what it is, data may be very time limited, making it useful only in a very brief window. Think of data like a paper currency that can burst into flames at any moment.

[Image: burning money]

One second it has the face value and the next it is ashes.

In contrast, a piece of land is just there. No more land is being created, whereas data creation is never-ending: limited only by our ability to get it and store it.
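To put rough numbers on the temporal point, here is a minimal sketch. It assumes Return on Data takes the generic return-on-investment shape, (gain - cost) / cost, and that the gain from a data set halves at a fixed interval. Both are my assumptions for illustration and not necessarily the formula referred to above; the function names, the half-life and the figures are invented.

```python
def return_on_data(gain, cost):
    """Generic return-on-investment ratio applied to data: (gain - cost) / cost."""
    return (gain - cost) / cost

def decayed_gain(initial_gain, age_days, half_life_days):
    """Assumed exponential decay: the achievable gain halves every half_life_days."""
    return initial_gain * 0.5 ** (age_days / half_life_days)

cost = 1_000.0          # hypothetical cost of collecting and storing the data
initial_gain = 5_000.0  # hypothetical gain if the data is used immediately
half_life = 30          # hypothetical number of days for the gain to halve

fresh = return_on_data(decayed_gain(initial_gain, 0, half_life), cost)
stale = return_on_data(decayed_gain(initial_gain, 90, half_life), cost)
print(f"used immediately: {fresh:+.3f}, used after 90 days: {stale:+.3f}")
```

Under these assumptions the same data set swings from a healthy positive return to a loss purely through the passage of time, which is why a single static figure for the gain is so hard to defend.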

Secondly, the technical aspects are crucial. What form is it held in? On what type of database? Where is it held (there are massive regulatory differences around the world)? Have the data owners consented to its use, and by whom? How old is it, how consistent is it, and so on? If I can’t use it in my company for my purposes then it is just Ones and Zeros on a hard drive somewhere, merely cluttering up the ether. Utterly without value.

The fact remains that extraordinary amounts of data are being recorded about us, all of the time. I recently holidayed in Norway and in ten days I didn’t use one bit of hard currency. All card, all the time. I navigated around using Google Maps. I checked TripAdvisor and used Uber, as well as uploading countless photos to Facebook for family abroad to see. In doing so I must have left an enormous digital smear across the Norwegian landscape. Me and the thousands of other tourists on holiday at the same time. Can you imagine the quantity of data generated by me and the billions of other people using connected services every single day?

To achieve a RoD that makes all the effort and cost of collection and storage worthwhile, several things need to happen. I can only really see these happening under the guidance and direction of a very senior individual, possibly even on the Board, who leads a team with specific responsibilities. Call them a Chief Data Officer (CDO).

Ideally the value of data is considered so important that the CDO is on the Board. The CDO would need close ties with the Marketing and Strategy functions to understand their goals, how they intend to use resources to achieve them, and whether existing data is useful or new data needs to be acquired. Additionally, they need to know how to shape and deliver it to them in a worthwhile manner. Then there needs to be a real-time feedback loop (Sales?) in order to assess the efficacy of the deployed data, as well as a direct line between them and the technical functions of the company: the sort of things the CIO deals with, especially storage and access. The CFO will have demands on their funds from the CDO, so they need to be able to understand the RoD and how it is affecting the bottom line, the share price, their partners and so on.

[Image: businessman staring into fog]

Most important of all is someone who can see through the Fog of Promise that all this data is purported to hold. The RoD that could be achieved if only they used it ‘properly’ is the sort of golden thread that is so often sold to them. Correlation does not equal causation. I’ll repeat that: correlation DOES NOT equal causation. Falling into the Feynman Trap is something that affects the best and the brightest (famously, Jim Collins did this in Good to Great), usually when they become mesmerised by their own belief in the infallibility of data.

The CDO not only ensures the data is valued correctly; they are also responsible for preventing their company being led down a rabbit-hole of promise of the jam-tomorrow variety. The sunk cost fallacy remains as relevant today as it ever was, and sometimes the emperor is indeed naked.