Variance and Bias

Mean

The mean of the distribution of a random variable is called its expected value. You calculate it by multiplying each outcome by its probability and summing over all outcomes.

E[X] = \sum\limits_{i=1}^n (p(X_i)*X_i) = \mu
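For instance, here is a minimal Python sketch of this weighted sum. The outcomes and probabilities below (a loaded six-sided die) are purely illustrative, not taken from the text.

# Expected value of a discrete random variable: sum of outcome * probability.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # illustrative, loaded toward 6
mu = sum(p * x for p, x in zip(probs, outcomes))
print(mu)  # 4.5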

If the outcomes are discrete and all equally likely (equal weights), the mean is simply the sum of the values divided by their count.

\mu = \frac{1}{n}\sum\limits_{i=1}^n (X_i)
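With equal weights this reduces to the familiar arithmetic mean. A short sketch, using an arbitrary list of values chosen only for illustration:

# Equal probabilities: the mean is the sum divided by the count.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mu = sum(values) / len(values)
print(mu)  # 5.0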

Variance

Variance measures the spread of values around the mean. It is the expected value of the squared deviation of each value from the mean.

E[(X - \mu)^2] = \sum\limits_{i=1}^n [p(X_i)*(X_i - \mu)^2]
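Continuing the loaded-die sketch from above (again with illustrative probabilities), the probability-weighted variance looks like this in Python:

# Variance as the expected squared deviation from the mean.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]   # illustrative, loaded toward 6
mu = sum(p * x for p, x in zip(probs, outcomes))
var = sum(p * (x - mu) ** 2 for p, x in zip(probs, outcomes))
print(var)  # 3.25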

As in the earlier case, when all outcomes are equally likely (all p(X_i) are equal), the variance is the sum of the squared deviations divided by the total number of values.

Var(X) = E[(X - \mu)^2] = \frac{1}{n}\sum\limits_{i=1}^n [(X_i - \mu)^2]
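For the equally likely case, reusing the arbitrary values from the mean sketch (note the division by n, matching the formula above):

# Equal probabilities: average of the squared deviations.
values = [2, 4, 4, 4, 5, 5, 7, 9]
mu = sum(values) / len(values)                       # 5.0
var = sum((x - mu) ** 2 for x in values) / len(values)
print(var)  # 4.0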

Bias

So far, we have assumed unbiased estimates (e.g. a fair coin, a fair die) in our discussions. In such cases, the expected value equalled the true value. That is not always so in real life, where biases do occur. Bias is the difference between the expected value of an estimator and the true value. In other words, you get the true value after subtracting the bias from the expected value of the estimator.

Bias(\hat\theta) = E[\hat\theta] - \theta
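As a rough illustration (the distribution, sample size, and seed below are assumptions, not from the text), a small simulation of a biased estimator: the divide-by-n variance formula above systematically underestimates the true variance on small samples, and that gap is its bias.

import random

random.seed(0)
true_var = 4.0          # variance of the sampled distribution (sigma = 2)
n, trials = 5, 100_000  # small samples, many repetitions

estimates = []
for _ in range(trials):
    sample = [random.gauss(10, 2) for _ in range(n)]
    m = sum(sample) / n
    estimates.append(sum((x - m) ** 2 for x in sample) / n)

avg_estimate = sum(estimates) / trials
print(avg_estimate - true_var)  # roughly -true_var / n = -0.8, the bias

The average of the estimates sits below the true value by about true_var/n, so E[theta_hat] - theta is negative here rather than zero.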

Variance and bias are two sources of error in estimates. How they are related to each other is the topic of another discussion.