Population and Sample

We have used these terms many times in the past. This time, we look at their formal definitions.

A population is the set of all possible observations. For example, the population relevant to a US presidential election consists of all eligible voters, about 240 million people. To determine the true average height of adult women in the US, one would need data on roughly 108 million women aged 18 and older. Similarly, to obtain the true fault rate of a product, the factory manager would need to inspect every unit the factory manufactures!

Collecting data from every single individual (or product) is rarely practical. What is possible is to inspect a fraction of the population; this subset is called a sample.
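As an illustration of taking a sample, here is a minimal sketch of choosing which products to inspect by simple random sampling. It assumes a hypothetical production run of 10,000 products identified by IDs 1 to 10,000; the sample size of 200 is likewise an arbitrary choice for illustration.

```python
import random

random.seed(7)  # fixed seed so the same sample is drawn on every run

# Hypothetical production run: products are identified by IDs 1..10,000.
N = 10_000
population_ids = range(1, N + 1)

# Simple random sample of n = 200 products, drawn without replacement,
# so every product has the same chance of being inspected.
n = 200
sample_ids = random.sample(population_ids, k=n)

print(f"inspecting {len(sample_ids)} of {N} products ({n / N:.0%})")
```

Drawing without replacement (`random.sample`) ensures no product is inspected twice.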

A characteristic of a population is called a parameter. For example, the mean height of adult women in the US is a parameter with one exact value, but we could only obtain it by measuring every member of that population. The two most commonly used parameters are the mean (μ) and the standard deviation (σ):

\text{population mean } = \mu; \text{ population standard deviation } = \sigma
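When every member of the population has been measured, μ and σ can be computed directly: μ is the average of all N values, and σ is the square root of the average squared deviation from μ, i.e. the sum of squared deviations divided by N. The sketch below uses a hypothetical five-person population of heights purely to keep the arithmetic visible, and checks the result against the standard library's population functions `statistics.fmean` and `statistics.pstdev`.

```python
import math
import statistics

# Hypothetical population: heights (cm) of a tiny population of five people.
heights = [160, 165, 170, 175, 180]

N = len(heights)
mu = sum(heights) / N                                        # population mean
sigma = math.sqrt(sum((x - mu) ** 2 for x in heights) / N)   # divide by N

# The standard library's population versions agree.
assert math.isclose(mu, statistics.fmean(heights))
assert math.isclose(sigma, statistics.pstdev(heights))
print(mu, round(sigma, 4))  # 170.0 7.0711
```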

A statistic is a characteristic of a sample. The sample mean and the sample standard deviation are the sample counterparts of the population mean and standard deviation:

\text{sample mean } = \bar{X}; \text{ sample standard deviation } = s
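The sample statistics are computed the same way as the parameters, with one difference in the usual definition of s: the sum of squared deviations is divided by n − 1 rather than n (Bessel's correction), which compensates for measuring deviations from the sample mean instead of the unknown μ. The five heights below are a hypothetical sample; `statistics.mean` and `statistics.stdev` are the standard library's sample versions.

```python
import math
import statistics

# Hypothetical sample of five heights (cm) drawn from a larger population.
sample = [162, 168, 171, 174, 179]

n = len(sample)
x_bar = sum(sample) / n                                           # sample mean
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))    # divide by n - 1

# The standard library's sample versions agree.
assert math.isclose(x_bar, statistics.mean(sample))
assert math.isclose(s, statistics.stdev(sample))
print(round(x_bar, 2), round(s, 3))
```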

Inferential statistics is a way to estimate (population) parameters from (sample) statistics. Even a representative sample is only a proxy for the population, so its statistics will almost never exactly equal the population parameters. The difference between a sample statistic and the corresponding population parameter is called the sampling error.
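Sampling error can only be measured directly when the parameter is known, which in practice it is not. A simulation sidesteps this: the sketch below generates a hypothetical population of 100,000 heights (the mean of 162 cm and SD of 7 cm are arbitrary assumptions, not real survey figures), computes μ, draws one random sample of n = 100, and reports x̄ − μ.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 simulated heights (cm).
# The mean of 162 and SD of 7 are arbitrary assumptions for illustration.
population = [random.gauss(162, 7) for _ in range(100_000)]
mu = statistics.fmean(population)  # parameter, known here only because we simulated it

sample = random.sample(population, k=100)  # simple random sample, n = 100
x_bar = statistics.fmean(sample)           # statistic

sampling_error = x_bar - mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:+.2f}")
```

Rerunning with a different seed gives a different sample and therefore a different sampling error, which is exactly the variability inferential statistics has to account for.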