Sampling

It is relatively easy to observe people’s behavior or ask them what they think about something. It is another thing entirely to claim that these observations (e.g. ticks on a survey form) also apply to people other than those we asked. Sampling methods address this problem of how far results can be generalized. The size and representativeness of a sample define the limits of the inferences that can be made from the sample data. In general, researchers collect from a few hundred to thousands or even tens of thousands of observations. But even with a sample as small as 30 cases some conclusions can be drawn, although the estimates are then less precise.
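
The link between sample size and precision can be illustrated with a short calculation. The sketch below is not part of the original text. It assumes a simple random sample and uses the common normal-approximation formula for the 95 % margin of error of a proportion; the function name margin_of_error and the chosen sample sizes are illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95 % margin of error for a proportion p estimated from n cases.

    Assumes a simple random sample and the normal approximation;
    p = 0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative sample sizes, from a very small study to a very large one
for n in (30, 300, 3000, 30000):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions, a sample of 30 cases gives a margin of roughly 18 percentage points, while 30,000 cases bring it below one percentage point. Small samples still say something, but only within wide limits.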

One of the most critical issues in data analysis is understanding the difference between a sample and a population. Basically, a sample is a subset of a population. It is not just any subset, however: a researcher usually knows what kind of subset it is, i.e. how the sample has been selected. In the ideal case, 1) each member of the population has the same probability of being selected for the sample and 2) the selections are independent of one another with regard to the measured characteristic. A sample that meets these conditions is called a random sample. In this ideal case we can state, with a known probability, the limits within which results drawn from the sample represent the whole population.
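
As a minimal sketch of the first condition, the following example draws a simple random sample without replacement, so that every member of the population has the same chance of being selected. The population of 10,000 ID numbers, the sample size of 200 and the helper name simple_random_sample are assumptions made for illustration, not part of the original text.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1..10000 (e.g. rows in a register)
population = list(range(1, 10001))
sample = simple_random_sample(population, 200, seed=42)

print(len(sample), "members selected, for example:", sample[:5])
```

In practice the difficult part is usually not the draw itself but obtaining a complete list of the population to draw from.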

The fact that a random subset of a larger population can be representative is based on the central limit theorem (CLT), a result from probability theory. The central idea of the CLT is that if we draw many random samples of the same size from a population and calculate the mean of each, these sample means will follow an approximately bell-shaped normal distribution centered on the population mean, provided that the samples are sufficiently large. The spread of this distribution depends on the sample size and on the variation in the population. Because we know the properties of the normal distribution, we can assume, under certain conditions, that the mean of our own sample lies close to the means of most other samples we could have drawn. This means that we do not need to take hundreds of samples but only one, randomly selected. This finding forms the basis for statistical estimation and testing, which are covered in the analysis section.
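
The behavior described by the CLT can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original text: the population is artificial and deliberately skewed (exponentially distributed values), and the sample size of 100 and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples of the given size and return the mean of each."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Artificial, clearly non-normal population (exponential: mean about 1, sd about 1)
rng = random.Random(1)
population = [rng.expovariate(1.0) for _ in range(100_000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)

means = sample_means(population, sample_size=100, n_samples=2000)

# The CLT predicts the spread of the sample means: sd / sqrt(sample size)
standard_error = pop_sd / 100 ** 0.5
share_within_2se = sum(abs(m - pop_mean) <= 2 * standard_error
                       for m in means) / len(means)

print("population mean:", round(pop_mean, 3))
print("mean of sample means:", round(statistics.mean(means), 3))
print("predicted spread:", round(standard_error, 3))
print("observed spread:", round(statistics.stdev(means), 3))
print("share of sample means within 2 standard errors:", round(share_within_2se, 3))
```

Although the population itself is far from bell-shaped, about 95 % of the sample means should fall within two standard errors of the population mean. This is why a single random sample is likely to give a mean close to the true value.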

In reality, a random sample is not easily achievable, and there are other sampling strategies too. For example, in a convenience sample the members of the population are selected on the basis of their availability. Results from studies based on this type of sample cannot be reliably generalized to any population. If the study concerns a sufficiently small population, such as a firm with 500 workers, sampling need not be used: all the workers can be sent an email invitation to participate in the study. Researchers then need to evaluate how well those who participated represent the whole firm (for example, whether all the departments are included) and whether there are any possible sources of systematic bias (such as people without computers or people who only rarely read their email).
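
One simple way to evaluate how well the participants represent the firm is to compare each department's share among the respondents with its share among all workers. The sketch below is an illustration only: the department names, the head counts and the function name representation_gaps are hypothetical and do not come from the original text.

```python
from collections import Counter

def representation_gaps(population_groups, respondent_groups):
    """Return, per group, the respondent share minus the population share.

    Negative values mean the group is under-represented among respondents.
    """
    if not respondent_groups:
        raise ValueError("no respondents to compare")
    pop_counts = Counter(population_groups)
    resp_counts = Counter(respondent_groups)
    n_pop = len(population_groups)
    n_resp = len(respondent_groups)
    return {group: resp_counts.get(group, 0) / n_resp - count / n_pop
            for group, count in pop_counts.items()}

# Hypothetical firm of 500 workers and 200 respondents to the email inquiry
firm = ["production"] * 300 + ["sales"] * 120 + ["office"] * 80
respondents = ["production"] * 60 + ["sales"] * 70 + ["office"] * 70

gaps = representation_gaps(firm, respondents)
for department, gap in gaps.items():
    print(f"{department}: {gap:+.2f}")
```

In this made-up case production workers make up 60 % of the firm but only 30 % of the respondents, which would point to a possible systematic bias, for instance if they rarely read email at work.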