A simple enough sounding question, though one that is quite contested. I propose that we need to look at three distinct subsets of the concept of data. You’ll see in a moment why this article isn’t a technical explanation of data in stats. For that (and it is necessary) this is a super post that explains them.
This article is intended as a guide to help you categorise the data that is being presented to you in the course of a day.
Type 1 – This is ‘just’ the hard numbers.
By this I mean just what you imagine. The figures that get plugged into SPSS, Stata, R, SAS and the like. How these are analysed determines the output. It is necessary – and can be mind-numbingly boring, I know this as I’ve had to do it many times! – to check how any of the variables may have been re-coded, re-weighted and then analysed in the data-management components (.do files, syntax files etc) of the popular stats packages. [Why isn’t Excel listed? I asked my ex-supervisor and a Professor who specialises in this stuff. He politely guffawed and told me that it isn’t a ‘proper’ statistical analysis program. Once the heavy lifting has been done it may be exported to Excel as that is what the majority of people are used to seeing.]
Type 2 – This type of data is the so-called softer numbers.
The first type of data is useful for analysing the patterns of turnout for an election, the way different materials on an aircraft fatigue, how people move through a supermarket, etc. Type 1 relies on quantifiable and easily measurable variables (ones that can be converted into a numerical value for analysis). For the supermarket example: one step right, a turn to the right, then two steps at a 40 degree angle, over a nine second period, and so on.
Type 2 data is an attempt to record and analyse human emotions, behaviour, and sometimes capture the strength of intent to do or not do something. We have all been asked things like, “How did that make you feel? Please rate your reply from Very Unhappy, Unhappy, Neutral, Happy to Very Happy?” This is the classic Likert scale.
Stop though. Have you considered if Semantic Differential Scales were used instead? Perhaps a mixture of the two, or two different data sets derived using different assessment methodologies? These too can be plugged into the stats programs and analysed. The trickier thing here is the subjectivity element. Is my Very Unhappy the equivalent of your Very Unhappy? This effect is mitigated by large-scale testing, as averaging over many respondents generates a happy medium in which the outliers carry far less weight. Hence, be very wary when a small sample size is used to generate an indication of feeling or intent.
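To make the small-sample warning concrete, here is a rough sketch in Python. It is an illustration under stated assumptions rather than a proper survey analysis: the five labels are coded 1 to 5 (a common convention, but my assumption here), the respondents are simulated and pick a label purely at random, and the function names are made up for this example. It repeats the same imaginary survey many times and measures how much the average score wobbles from one run to the next.

```python
import random
import statistics

# Assumed coding of the five labels quoted above as the numbers 1-5.
LIKERT = {"Very Unhappy": 1, "Unhappy": 2, "Neutral": 3, "Happy": 4, "Very Happy": 5}


def survey_mean(n, rng):
    """Mean score from n simulated respondents, each picking a label at random."""
    labels = list(LIKERT)
    return statistics.mean(LIKERT[rng.choice(labels)] for _ in range(n))


def spread_of_means(n, repeats=500, seed=0):
    """Standard deviation of the survey mean across repeated surveys of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return statistics.stdev(survey_mean(n, rng) for _ in range(repeats))


small = spread_of_means(10)    # ten respondents per survey
large = spread_of_means(1000)  # a thousand respondents per survey
print(f"spread of the average with n=10:   {small:.3f}")
print(f"spread of the average with n=1000: {large:.3f}")
```

The average from ten respondents moves around roughly ten times as much as the average from a thousand, which is why a headline feeling or intent figure built on a handful of replies deserves suspicion. Real respondents do not answer at random, of course, so treat the numbers as a picture of the principle only.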
Type 3 – And this is where it gets hazy and interesting!
Type 3 data is the way in which data is framed and presented to you. This may be in a newspaper, an internal report or perhaps a sales presentation. They are all trying to sell you something. The wrapping of the data and analysis may be in a manner to enhance the credibility and believability of the package, or you may be being steered away from robust data because it doesn’t fit with someone’s agenda. Either way, you are being encouraged to buy in to a point of view and the ‘data’ is being used in an effort to burnish the idea.
Cleverly chosen Visual Semiotics that speak to far deeper parts of our brain are often employed. You already know what these are: they’re the graphs, symbols and pie charts, as well as the tangentially relevant accompanying images. See the recent post on the mangling of data by the Guardian newspaper (the image of the white police officer discharging a taser directly towards you) for an example of this. Creative affect labeling, which is the process of putting feelings into words, is also influential when it is applied to some of the characteristics of the data, certainly the ones that focus is being directed towards. The latest research techniques have allowed scientists to show this happens, however much you may think you can override such feelings.
Type 3 data is all about the way in which the data is framed; it isn’t the numbers in the traditional sense. It is the third part of the package. Type 1 data, even if correctly produced and analysed, is completely susceptible to the influence of Type 3 data, as is Type 2 data.
Type 3 data is the processing, packaging and presentation of the digital exhaust that makes up Types 1 and 2 data. It is important as it mediates between that data and us unpredictable humans, slaves to our emotions, with all our psychological foibles and weaknesses hidden just below the surface. As such, Type 3 data should be afforded as much significance as the other two types when analysing any data that is presented to us.