In an ordinary linear model, such as simple linear regression, we express the dependent variable (y) as a function of the independent variable (x) as:

y = beta0 + beta1 * x + epsilon
The equation is divided into two parts. The first part, beta0 + beta1 * x, is the equation of a line, where beta0 is the intercept and beta1 is the slope. The line on its own describes only the approximation; the error term must be added to account for the points scattered around that idealised line. The second part, epsilon, is this error term. In linear regression, the points around the line are assumed to be normally distributed, so epsilon follows a normal distribution.
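A minimal sketch of this in R (the simulated data and coefficient values here are illustrative, not from the text):

```r
# Simulate points around an idealised line: y = beta0 + beta1 * x + epsilon
set.seed(1)
x <- seq(0, 10, length.out = 50)
epsilon <- rnorm(50, mean = 0, sd = 1)  # normally distributed error term
y <- 2 + 0.5 * x + epsilon              # beta0 = 2, beta1 = 0.5

fit <- lm(y ~ x)  # ordinary linear model
coef(fit)         # estimated intercept and slope, close to 2 and 0.5
```

The fitted coefficients recover the line; the residuals (`resid(fit)`) correspond to the epsilon term.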
LM to GLM
Now imagine the dependent variable is binary: it takes only the values one or zero. In such cases, the random component (the error term) is no longer normally distributed. That is where generalised linear models (GLMs) come in. The systematic part remains the same, while the random component can follow other distributions. For a binary outcome, as in logistic regression, a GLM is used with a binomial distribution through a link function.
Here, the link function is the logit and the random component is the binomial error distribution family. Likewise, in Poisson regression the error term takes a Poisson distribution, with the log as its link function.
In R, you fit a GLM with glm(), specifying the distribution through the family argument: glm(formula, family = "binomial")
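A short sketch of a logistic regression fit with glm() (the simulated data and coefficients are illustrative):

```r
# Simulate a binary outcome whose probability depends on x
set.seed(1)
x <- rnorm(100)
p <- 1 / (1 + exp(-(0.5 + 1.5 * x)))  # inverse-logit of a linear predictor
y <- rbinom(100, size = 1, prob = p)  # binomial (0/1) response

fit <- glm(y ~ x, family = binomial)  # logit link is the binomial default
summary(fit)
```

The family argument accepts either a string ("binomial") or the family object itself (binomial()); both select the binomial random component with its default logit link.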
Calling a family constructor at the R console shows the distribution and its default link function:

    binomial()
    Family: binomial
    Link function: logit

    poisson()
    Family: poisson
    Link function: log
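For count data, the same pattern applies with the Poisson family; a minimal sketch (simulated data, illustrative coefficients):

```r
# Simulate counts whose log-mean is linear in x
set.seed(1)
x <- rnorm(100)
mu <- exp(0.3 + 0.7 * x)        # log link: log(mu) = 0.3 + 0.7 * x
y <- rpois(100, lambda = mu)    # Poisson-distributed response

fit <- glm(y ~ x, family = poisson)  # log link is the Poisson default
coef(fit)                            # estimates near 0.3 and 0.7
```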