Sampling

22 October, 2015 - 12:51

It is important to recognize that there is another cost to using statistics, even after you have learned statistics. As we said before, you are never sure that your inferences are correct. The more precise you want your inference to be, the larger the sample you will have to collect (and the more time and money you will have to spend collecting it), or else the greater the chance you must accept of making a mistake. Basically, if your sample is a good representation of the whole population, that is, if it contains members from across the range of the population in proportions similar to those in the population, the inferences made will be good. If you happen to pick a sample that is not a good representation of the population, your inferences are likely to be wrong. By choosing samples carefully, you can increase the chance that your sample is representative of the population, and so increase the chance of an accurate inference.
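The trade-off between sample size and precision can be illustrated with a small simulation. The following is a minimal Python sketch, not part of the original text: the population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), the two sample sizes, and the helper name spread_of_sample_means are all assumptions chosen for illustration.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, trials=2000, seed=0):
    """Draw many random samples and return how much their means vary.

    The result is the standard deviation of the sample means: a small
    value means the sample mean is usually close to the population mean.
    """
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


# Hypothetical population, invented for this illustration.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, sample_size=10)
large = spread_of_sample_means(population, sample_size=400)

print(f"spread of sample means, n=10:  {small:.2f}")
print(f"spread of sample means, n=400: {large:.2f}")
```

Under these assumptions, the means of the larger samples cluster much more tightly around the population mean than the means of the smaller samples. That is the sense in which more precision costs a larger sample.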

The intuition behind this is easy. Imagine that you want to infer the mean of a population. The way to do this is to choose a sample, find the mean of that sample, and use that sample mean as your inference of the population mean. If your sample happens to include all, or almost all, observations with values at the high end of those in the population, your sample mean will overestimate the population mean. If your sample includes roughly equal numbers of observations with "high", "low", and "middle" values, the mean of the sample will be close to the population mean, and the sample mean will provide a good inference of the population mean. If your sample includes mostly observations from the middle of the population, you will also get a good inference. Note that the sample mean will seldom be exactly equal to the population mean. However, because most samples will have a rough balance between high, low, and middle values, the sample mean will usually be close to the true population mean. The key to good sampling is to avoid choosing the members of your sample in a manner that tends to choose too many "high" or too many "low" observations.
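The three cases described above (a sample dominated by high values, a balanced sample, and a sample drawn mostly from the middle) can be compared in a short Python sketch. Everything specific here is an assumption made for illustration: the population is 10,000 values spread uniformly between 0 and 100, and each sample has 200 members. Because this made-up population is symmetric around its mean, the middle-only sample also lands near the population mean. A strongly skewed population would not necessarily behave that way.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: values spread evenly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
pop_mean = statistics.mean(population)
ordered = sorted(population)

# Balanced sample: every member has the same chance of being chosen.
random_sample = rng.sample(population, 200)

# Unbalanced sample: drawn only from the highest quarter of the population.
high_sample = rng.sample(ordered[-2500:], 200)

# Sample drawn only from the middle half of the population.
middle_sample = rng.sample(ordered[2500:7500], 200)

# Signed error of each sample mean as an estimate of the population mean.
random_error = statistics.mean(random_sample) - pop_mean
high_error = statistics.mean(high_sample) - pop_mean
middle_error = statistics.mean(middle_sample) - pop_mean

print(f"population mean:          {pop_mean:.1f}")
print(f"error, balanced sample:   {random_error:+.1f}")
print(f"error, high-only sample:  {high_error:+.1f}")
print(f"error, middle-only sample:{middle_error:+.1f}")
```

The high-only sample overestimates the population mean by a wide margin, while the balanced and middle-only samples come close to it without matching it exactly.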

There are three basic ways to accomplish this goal. You can choose your sample randomly, you can choose a stratified sample, or you can choose a cluster sample. While there is no way to ensure that a single sample will be representative, following the discipline of random, stratified, or cluster sampling greatly reduces the probability of choosing an unrepresentative sample.
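As a rough preview of how the three approaches differ mechanically, here is a Python sketch based on the usual textbook definitions, which this passage does not spell out. In a simple random sample, every member has the same chance of selection. In a stratified sample, the population is divided into groups (strata) and each group is sampled in proportion to its size. In a cluster sample, whole groups (clusters) are chosen at random and all of their members are included. The population of 1,000 records, the "region" and "block" fields, and the function names are hypothetical.

```python
import random


def simple_random_sample(population, n, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Choose whole clusters at random and keep all of their members."""
    clusters = {}
    for item in population:
        clusters.setdefault(cluster_of(item), []).append(item)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for key in chosen for item in clusters[key]]


# Hypothetical population: 600 northern and 400 southern records,
# grouped into 20 blocks of 50 records each.
population = [
    {"id": i, "region": "north" if i < 600 else "south", "block": i // 50}
    for i in range(1000)
]

rng = random.Random(7)
srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda p: p["region"], 100, rng)
clus = cluster_sample(population, lambda p: p["block"], 2, rng)

north_in_strat = sum(1 for p in strat if p["region"] == "north")
blocks_in_clus = {p["block"] for p in clus}
```

In this sketch the stratified sample reproduces the 60/40 regional split of the population by construction, whereas the simple random sample only matches it approximately and the cluster sample depends entirely on which two blocks happen to be drawn.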