Generalised Linear Models

In an ordinary linear model, such as simple linear regression, we express the dependent variable (y) as a function of the independent variable (x):

y_i = \beta_0 + \beta_1 x_i + \epsilon_i

The equation has two parts. The first part is the equation of a line, which gives the mean \mu_i of y at each value x_i:

\mu_i = \beta_0 + \beta_1 x_i

where \beta_0 is the intercept and \beta_1 is the slope of the line. This describes only the line (the approximation); to account for the points scattered around it, you need to add an error term. The second part is this error term, the deviation of each point from the idealised line:

\epsilon_i \sim N(0, \sigma^2), \quad \text{so that} \quad y_i \sim N(\mu_i, \sigma^2)

In linear regression, the points are assumed to scatter normally around the line, so the error term \epsilon_i is normally distributed with mean zero.
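A quick way to see the two parts is to simulate data from them and fit the model back. The sketch below uses base R only; the values \beta_0 = 2, \beta_1 = 0.5 and \sigma = 1 are illustrative assumptions, not taken from the text.

```r
set.seed(42)

n      <- 200
beta0  <- 2     # assumed intercept (illustrative value)
beta1  <- 0.5   # assumed slope (illustrative value)
sd_eps <- 1     # assumed error standard deviation (illustrative value)

x   <- runif(n, min = 0, max = 10)
mu  <- beta0 + beta1 * x                # first part: the line, mu_i
eps <- rnorm(n, mean = 0, sd = sd_eps)  # second part: normal errors around the line
y   <- mu + eps                         # observed points, y_i = mu_i + eps_i

fit_lm <- lm(y ~ x)
coef(fit_lm)   # estimates of beta0 and beta1, close to 2 and 0.5
sigma(fit_lm)  # estimate of the error standard deviation, close to 1
```

The fitted coefficients recover the line, and the residual standard deviation recovers the spread of the normal error term.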

LM to GLM

Now imagine that the dependent variable is binary: it takes the value one or zero. In that case the random component (the error term) is no longer normally distributed, and this is where generalised linear models (GLMs) come in. The first part, the linear predictor, stays the same, while the second part can follow other distributions as well. A link function connects the two by transforming the mean \mu_i of the response onto the scale of the linear predictor. For a binary response, as in logistic regression, the GLM uses the binomial distribution with the logit link.
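The claim that only the second part changes can be checked directly: a GLM with a Gaussian family and identity link is the ordinary linear model, so it should give the same estimates as lm(). A small sketch on simulated data (the generating values 1 and 2 are arbitrary):

```r
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

fit_lm  <- lm(y ~ x)
# Gaussian family with identity link: the ordinary linear model written as a GLM
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

all.equal(coef(fit_lm), coef(fit_glm))  # TRUE: same estimates
```

Swapping the family (and with it the link) is what turns this into logistic or Poisson regression.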

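The systematic component (the linear predictor):

\eta_i = \beta_0 + \beta_1 x_i

The link function (logit):

\eta_i = \mathrm{logit}(\mu_i) = \log\left(\frac{\mu_i}{1 - \mu_i}\right)

The random component comes from the binomial family: instead of adding a normal error, each response is drawn from a binomial distribution with one trial and success probability \mu_i.

y_i \sim \mathrm{Binomial}(1, \mu_i)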
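The logit link and its inverse are available in base R as qlogis() and plogis(), and the binomial family object stores the same pair as linkfun and linkinv. A short check, with arbitrary probability values:

```r
mu <- c(0.1, 0.5, 0.9)

# logit: eta = log(mu / (1 - mu)) maps probabilities in (0, 1) to the real line
eta <- log(mu / (1 - mu))

fam <- binomial()
all.equal(eta, qlogis(mu))        # TRUE
all.equal(eta, fam$linkfun(mu))   # TRUE

# inverse logit: mu = 1 / (1 + exp(-eta)) maps the linear predictor back to (0, 1)
all.equal(plogis(eta), mu)        # TRUE
all.equal(fam$linkinv(eta), mu)   # TRUE
```

This is why a linear predictor that can take any real value still yields valid probabilities.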

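In Poisson regression, the response is a count that follows a Poisson distribution, and its mean is connected to the linear predictor through the log link.

\eta_i = \beta_0 + \beta_1 x_i

\eta_i = \log(\mu_i)

y_i \sim \mathrm{Poisson}(\mu_i)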
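The log link and the Poisson random component can be inspected the same way. A sketch using the poisson() family object and rpois(); the means 0.5, 2 and 10 are arbitrary:

```r
set.seed(7)

fam <- poisson()
fam$link                                 # "log"

mu <- c(0.5, 2, 10)
all.equal(fam$linkfun(mu), log(mu))      # TRUE: eta = log(mu)
all.equal(fam$linkinv(log(mu)), mu)      # TRUE: mu = exp(eta), always positive

# Random component: draw many counts from Poisson(mu) for each mean;
# the sample means should sit close to mu
sample_means <- sapply(mu, function(m) mean(rpois(1e5, lambda = m)))
sample_means
```

The inverse link exp() keeps the mean count positive for any value of the linear predictor.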

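In R, you fit a GLM with glm() and choose the distribution through its family argument, for example glm(formula, family = "binomial") or, equivalently, glm(formula, family = binomial). Printing a family object shows its default link function: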
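A minimal logistic regression sketch on simulated data. The generating model logit(\mu_i) = -0.5 + 1.5 x_i is an assumption chosen for illustration:

```r
set.seed(123)
n <- 500
x <- rnorm(n)

# Simulate a binary response from an assumed model: logit(mu) = -0.5 + 1.5 * x
eta <- -0.5 + 1.5 * x
y   <- rbinom(n, size = 1, prob = plogis(eta))
dat <- data.frame(x = x, y = y)

# family = binomial uses the logit link by default
fit <- glm(y ~ x, family = binomial, data = dat)
coef(fit)  # estimates on the logit (eta) scale

# predict() returns eta by default; type = "response" applies the inverse link
head(predict(fit, type = "link"))
head(predict(fit, type = "response"))
```

The coefficients live on the linear-predictor scale; predictions on the response scale are probabilities between 0 and 1.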

```
> binomial()
Family: binomial
Link function: logit

> poisson()
Family: poisson
Link function: log
```
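Beyond printing the families, the family of a fitted model can be retrieved with family(), and a non-default link can be requested through the family function, e.g. binomial(link = "probit"). A sketch on simulated counts, where the coefficients 0.3 and 0.8 are chosen purely for illustration:

```r
set.seed(99)
n <- 300
x <- runif(n, min = 0, max = 2)

# Simulated counts from an assumed model: log(mu) = 0.3 + 0.8 * x
y <- rpois(n, lambda = exp(0.3 + 0.8 * x))

fit_pois <- glm(y ~ x, family = poisson)  # log link by default
family(fit_pois)                          # prints the family and link used
exp(coef(fit_pois))                       # multiplicative effects on the mean count

# A non-default link can be requested through the family function
probit_fam <- binomial(link = "probit")
probit_fam$link                           # "probit"
```

Because the link is log, exponentiating a Poisson coefficient gives the factor by which the expected count changes per unit increase in x.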