As we have seen earlier, a statistician estimates population parameters from sample statistics. Sampling is the all-important process of selecting the subjects or groups that provide the representative data required for the work.
Sampling can be of two types – probabilistic and non-probabilistic.
In probabilistic sampling, individual samples are selected based on a known probability distribution. In other words, each element in the group has a known and non-zero probability of being selected. This minimises the risk of systematic bias, i.e., the over- or under-representation of sub-groups when picking participants. There are four major types of probabilistic sampling.
Random Sampling
In simple random sampling, each element in the sampling frame has an equal and independent probability of being included. It works well when the population is homogeneous. Random sampling is usually done without replacement, although sampling with replacement is also valid. A simple method is to list and number every case in the population and then draw uniform random numbers to select cases.
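The list-and-draw method can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed procedure: the function name simple_random_sample, the frame of 100 numbered cases and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n elements from the sampling frame, each with equal probability."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    if replace:
        # With replacement: the same element may be drawn more than once.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: each element can appear at most once.
    return rng.sample(frame, n)

# Hypothetical sampling frame: 100 cases numbered 1 to 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
sample_wr = simple_random_sample(population, 10, replace=True, seed=42)
```

Without replacement, the ten selected cases are always distinct; with replacement, duplicates are possible.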
Stratified Sampling
In stratified random sampling, the population is divided into multiple mutually exclusive strata. Random sampling is then carried out within each stratum separately. The separately sampled elements are combined to form the final sample. This technique is critical in less homogeneous populations, because it ensures that every stratum is represented in the sample.
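A rough sketch of stratified sampling in Python is shown below. It assumes proportional allocation (the same sampling fraction in every stratum), which is only one of several possible allocation rules. The function name stratified_sample and the urban/rural data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    # Step 1: split the population into mutually exclusive strata.
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Step 2: simple random sampling (without replacement) inside each stratum.
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    # Step 3: the per-stratum samples together form the final sample.
    return sample

# Hypothetical population: 80 urban and 20 rural respondents.
people = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(people, stratum_of=lambda p: p[0], fraction=0.1, seed=0)
```

With a 10% fraction, the sample contains 8 urban and 2 rural respondents, mirroring the 80/20 split in the population.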
Cluster Sampling
In multistage cluster sampling, samples are randomly selected in stages. The steps are:
1) The population is divided into mutually exclusive clusters.
2) Random sampling is used to select some of the clusters.
3) Second-level random sampling is done inside the selected clusters to select the final sample elements.
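The three steps above can be sketched as a two-stage procedure in Python. The function name two_stage_cluster_sample and the school data are assumptions made for illustration; real designs may use more stages or unequal selection probabilities.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters maps a cluster name to the list of elements it contains."""
    rng = random.Random(seed)
    # Stage 1: randomly select whole clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: randomly select elements inside each chosen cluster.
    sample = []
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))  # guard against small clusters
        sample.extend(rng.sample(members, k))
    return chosen, sample

# Hypothetical population: five schools (the clusters) with 30 pupils each.
schools = {
    f"school_{c}": [f"school_{c}_pupil_{i}" for i in range(30)]
    for c in "ABCDE"
}
chosen, sample = two_stage_cluster_sample(
    schools, n_clusters=2, n_per_cluster=5, seed=1
)
```

Only pupils from the two selected schools can appear in the final sample, which is what distinguishes cluster sampling from stratified sampling, where every stratum contributes.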
Bootstrap Aggregating
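In bootstrap aggregating, or bagging, several samples are drawn (or bagged) at random, with replacement, from the observed data set, which stands in for the population. A separate model or estimate is fitted to each sample, and the results are then aggregated, typically by averaging or voting.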
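As a minimal sketch of the idea, the Python snippet below draws resamples with replacement from the available data, applies an estimator to each bag, and averages the results. The function name bagged_estimate, the eight data values and the choice of the median as estimator are illustrative assumptions. In practice, bagging is most often applied to predictive models such as decision trees.

```python
import random
import statistics

def bagged_estimate(data, estimator, n_bags=100, seed=None):
    """Apply the estimator to bootstrap resamples and average the results."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bags):
        # rng.choices draws with replacement; each bag has the original size.
        bag = rng.choices(data, k=len(data))
        estimates.append(estimator(bag))
    # Aggregate the per-bag estimates by averaging.
    return statistics.mean(estimates), estimates

# Hypothetical observations.
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
bagged_median, per_bag = bagged_estimate(
    data, statistics.median, n_bags=200, seed=7
)
```

The spread of the per-bag estimates also gives a rough indication of how variable the estimator is.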