Data and variables

Data refers to a set of values, which are usually organized by variables (what is being measured) and observational units (members of the sample or population). An example of data is a data matrix in a spreadsheet program, such as Excel or SPSS. The variables (e.g. survey questions) run along the top row, and the observations (e.g. people) run down the first column. Each cell holds the value of one variable for one observational unit.
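As a rough illustration of the same layout outside a spreadsheet, the sketch below stores a small data matrix as a list of rows in Python. The respondents, variable names and values are all invented for the example, and the helper functions `cell` and `column` are hypothetical names, not part of any library.

```python
# Hypothetical survey data: each row is one observational unit (a respondent),
# each key is a variable. All names and values are invented for illustration.
data = [
    {"respondent": "A", "age": 34, "favourite_pet": "cat", "score": 7},
    {"respondent": "B", "age": 27, "favourite_pet": "dog", "score": 9},
    {"respondent": "C", "age": 45, "favourite_pet": "cat", "score": 4},
]

# The variables correspond to the column headings of the matrix.
variables = list(data[0].keys())


def cell(rows, unit, variable):
    """Return the value of `variable` for the observational unit `unit`."""
    for row in rows:
        if row["respondent"] == unit:
            return row[variable]
    raise KeyError(unit)


def column(rows, variable):
    """Return all observed values of one variable, one per observational unit."""
    return [row[variable] for row in rows]


print(variables)
print(cell(data, "B", "score"))
print(column(data, "age"))
```

Reading one row gives everything known about one observational unit; reading one column gives all observed values of one variable.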

An example of a data matrix in SPSS. The figure shows a data matrix of World Economic Freedom data. The countries on each row are the observational units or "cases", and the columns of different freedom indexes, such as business freedom, are the variables. Each cell holds one country's score on one variable.

Variables are of different types and can be classified in many ways, for example as numerical and categorical variables. Numerical variables are measured as quantities, usually with an existing measure or unit, whereas categorical variables are qualitative: their values are categories that are not necessarily more or less, or bigger or smaller, than one another. Another way of classifying variables is according to their measurement scale.

A continuous variable is numerical and can, in theory, take infinitely many values within a range. An example of such a variable is length in centimeters or inches. A discrete variable is also numerical, but differs from a continuous variable in that it takes only separate, countable values, typically whole numbers. An example of a discrete variable is a performance score from 1 to 10. Each value is, in theory, equally far from the next, so that the step from 3 to 4 is exactly the same as the step from 8 to 9.

If a numerical variable has an absolute zero, it can be measured on a ratio scale. A typical example is weight: it can be zero, but definitely not less than that. Therefore we can say that one observation is twice as heavy as another. If the variable has no absolute zero point, it has an interval scale, meaning that the distances between different values are the same (e.g. from 10 to 20 and from 40 to 50), but the zero point is arbitrary. An example of an interval scale is temperature in degrees Celsius. A Celsius thermometer can show “0”, but this does not mean that the quality itself, temperature, is absent. Therefore we cannot really say that today is “twice as cold” as yesterday.
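The difference between the two scales can be checked with a little arithmetic. The sketch below uses invented readings (10 and 20 degrees Celsius, 40 and 80 kilograms) and an approximate conversion factor for pounds. It shows that a ratio of two temperatures changes when the same temperatures are expressed on another scale, whereas a ratio of two weights does not.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin."""
    return c + 273.15


# Hypothetical readings, chosen only for illustration.
today_c, yesterday_c = 10.0, 20.0

# The "ratio" depends on where each scale happens to put its zero.
ratio_celsius = yesterday_c / today_c
ratio_fahrenheit = celsius_to_fahrenheit(yesterday_c) / celsius_to_fahrenheit(today_c)
ratio_kelvin = celsius_to_kelvin(yesterday_c) / celsius_to_kelvin(today_c)

print(ratio_celsius)               # 2.0
print(round(ratio_fahrenheit, 2))  # 1.36
print(round(ratio_kelvin, 3))      # 1.035

# Weight has an absolute zero, so the ratio survives a change of unit.
LB_PER_KG = 2.20462  # approximate conversion factor
heavy_kg, light_kg = 80.0, 40.0
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * LB_PER_KG) / (light_kg * LB_PER_KG)

print(ratio_kg)             # 2.0
print(round(ratio_lb, 6))   # 2.0
```

Because the three temperature ratios disagree, "twice as warm" has no fixed meaning for Celsius readings, while "twice as heavy" holds in any unit of weight.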

Categorical variables, in turn, can be nominal, in which case there is no order at all: each category has its unique meaning (“What domestic pets do you like most: 1 = cats, 2 = dogs, 3 = hamsters, 4 = bunnies?”). If the categories have a natural order, the variable is called ordinal. A Likert-type scale represents an ordinal measurement: 1 = “not like me at all”, 2 = “not like me”, 3 = “not sure”, 4 = “somewhat like me”, 5 = “very much like me”. A special type of variable is the dichotomous one. It can have only two values (e.g. gender) and it can be interpreted both as numerical and categorical.
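The measurement scale also determines which summaries make sense. The following sketch uses Python's standard `statistics` module on invented answers. The pet codes follow the question quoted above, while the Likert responses and the yes/no item coded 0 and 1 are hypothetical.

```python
from statistics import mean, median, mode

# Nominal: the codes are only labels, so the most common category (the mode)
# is a sensible summary, but an average of the codes is not.
PET_LABELS = {1: "cats", 2: "dogs", 3: "hamsters", 4: "bunnies"}
pet_answers = [1, 2, 2, 4, 1, 2]          # hypothetical responses
most_liked = PET_LABELS[mode(pet_answers)]
print(most_liked)                          # dogs

# Ordinal: the categories are ordered, so the median is meaningful.
likert_answers = [1, 4, 5, 4, 2, 3, 4]     # hypothetical responses on the 1-5 scale
print(median(likert_answers))              # 4

# Dichotomous: with the two values coded 0 and 1, the mean of the codes
# equals the proportion of observations coded 1.
yes_no_answers = [1, 0, 1, 1, 0]           # hypothetical: 1 = yes, 0 = no
proportion_yes = mean(yes_no_answers)
print(proportion_yes)                      # 0.6
```

The last case illustrates why a dichotomous variable can be read both ways: its two values are categories, yet the average of the 0/1 codes still has a clear numerical interpretation.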