In an ordinary linear model, such as simple linear regression, we express the dependent variable (y) as a function of the independent variable (x) as:

y = beta0 + beta1 * x + epsilon
The equation is divided into two parts. The first part, beta0 + beta1 * x, is the equation of a line, where beta0 is the intercept and beta1 is the slope. The line on its own describes only the approximation; the error term must be added to account for the points scattered around that idealised line. The second part, epsilon, is this error term. In linear regression, the points around the line are assumed to be normally distributed, so epsilon follows a normal distribution.
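A minimal sketch of this in R (the simulated data and coefficient values here are illustrative, not from the text):

```r
# Simulate points around an idealised line: y = beta0 + beta1 * x + epsilon
set.seed(1)
x <- seq(0, 10, length.out = 50)
epsilon <- rnorm(50, mean = 0, sd = 1)  # normally distributed error term
y <- 2 + 0.5 * x + epsilon              # beta0 = 2, beta1 = 0.5

fit <- lm(y ~ x)  # ordinary linear model
coef(fit)         # estimated intercept and slope, close to 2 and 0.5
```

The fitted coefficients recover the line; the residuals (`resid(fit)`) correspond to the epsilon term.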
LM to GLM
Now imagine the dependent variable is binary: it takes only the values one or zero. In such cases, the random component (the error term) is no longer normally distributed. That is where generalised linear models (GLMs) come in. The systematic part remains the same, while the random component can follow other distributions. For a binary outcome, as in logistic regression, a GLM is used with a binomial distribution through a link function.
Here, the link function is the logit and the random component is the binomial error distribution family. Likewise, in Poisson regression the error term takes a Poisson distribution, with the log as its link function.
In R, you fit a GLM with glm(), specifying the distribution through the family argument: glm(formula, family = "binomial")
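A short sketch of a logistic regression fit with glm() (the simulated data and coefficients are illustrative):

```r
# Simulate a binary outcome whose probability depends on x
set.seed(1)
x <- rnorm(100)
p <- 1 / (1 + exp(-(0.5 + 1.5 * x)))  # inverse-logit of a linear predictor
y <- rbinom(100, size = 1, prob = p)  # binomial (0/1) response

fit <- glm(y ~ x, family = binomial)  # logit link is the binomial default
summary(fit)
```

The family argument accepts either a string ("binomial") or the family object itself (binomial()); both select the binomial random component with its default logit link.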
Calling a family constructor at the R console shows the distribution and its default link function:

    binomial()
    Family: binomial
    Link function: logit

    poisson()
    Family: poisson
    Link function: log
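For count data, the same pattern applies with the Poisson family; a minimal sketch (simulated data, illustrative coefficients):

```r
# Simulate counts whose log-mean is linear in x
set.seed(1)
x <- rnorm(100)
mu <- exp(0.3 + 0.7 * x)        # log link: log(mu) = 0.3 + 0.7 * x
y <- rpois(100, lambda = mu)    # Poisson-distributed response

fit <- glm(y ~ x, family = poisson)  # log link is the Poisson default
coef(fit)                            # estimates near 0.3 and 0.7
```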