The absence of endogeneity is one assumption that we make while performing regression using the ordinary least squares (OLS) method. But what is endogeneity? Let’s go back to regression. In mathematical language:
observation = deterministic model + residual error
Y = (a + b X) + e
The residual error, e, represents everything that is unknown to the observer but may have contributed to the observation. Linear regression comes with a condition, usually listed among the Gauss–Markov conditions, that requires the error term to be uncorrelated with the independent variable X. If this does not hold, it is a case of endogeneity.
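To see why this condition matters, it may help to write out what the OLS slope converges to in a large sample. The following is a sketch for the single-regressor model Y = (a + b X) + e above; the hat notation and the use of the probability limit are assumptions of this note, not part of the original text.

```latex
\hat{b}
  = \frac{\widehat{\operatorname{Cov}}(X, Y)}{\widehat{\operatorname{Var}}(X)}
  \;\xrightarrow{\;p\;}\;
  \frac{\operatorname{Cov}(X,\, a + bX + e)}{\operatorname{Var}(X)}
  = b + \frac{\operatorname{Cov}(X, e)}{\operatorname{Var}(X)}
```

The second term vanishes only when X and e are uncorrelated. Under endogeneity it does not, so the estimated slope stays away from b however much data is collected.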
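The first cause of endogeneity is an omitted variable: a variable left out of the model that affects Y and is also correlated with X. Because it is missing from the model, its effect is absorbed into the error term, which then becomes correlated with X.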
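A small simulation can make the omitted-variable case concrete. This is a minimal sketch on synthetic data: the variable `z`, the coefficients (0.8, 2.0, 1.5) and the function names are all made up for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope b from an OLS fit of y = a + b*x (single regressor)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def simulate_omitted_variable(n=100_000, seed=0):
    """Return (slope with z omitted, slope on x with z included)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                  # hypothetical omitted variable
    x = 0.8 * z + rng.normal(size=n)        # z drives X ...
    y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)  # ... and also Y

    # Regression that leaves z out: z hides in the error term,
    # which is therefore correlated with x.
    slope_without_z = ols_slope(x, y)

    # Regression that includes z as a second regressor.
    design = np.column_stack([np.ones(n), x, z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slope_with_z = coef[1]
    return slope_without_z, slope_with_z


if __name__ == "__main__":
    without_z, with_z = simulate_omitted_variable()
    print(f"slope on x, z omitted:  {without_z:.3f}")
    print(f"slope on x, z included: {with_z:.3f}")
```

With these made-up coefficients, the regression that omits `z` should overstate the effect of X (a slope of roughly 2.7 instead of the true 2.0), while including `z` should recover a value close to 2.0.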
The second cause is simultaneity, or bidirectional causation: X causes Y, and Y causes X. It is also called reciprocal causation.
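Simultaneity can be illustrated in the same way. The sketch below assumes a hypothetical pair of equations, y = b x + e and x = d y + u, with independent errors e and u; the values b = d = 0.5 are arbitrary. Because x depends on y, and y contains e, the regressor x ends up correlated with e.

```python
import numpy as np


def simulate_simultaneity(n=200_000, b=0.5, d=0.5, seed=0):
    """Return (OLS slope of y on x, correlation between x and e)."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)   # error in the y equation
    u = rng.normal(size=n)   # error in the x equation

    # Solve y = b*x + e and x = d*y + u jointly (requires b*d != 1):
    #   x * (1 - b*d) = d*e + u
    x = (d * e + u) / (1.0 - b * d)
    y = b * x + e

    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    corr_x_e = np.corrcoef(x, e)[0, 1]
    return slope, corr_x_e


if __name__ == "__main__":
    slope, corr_x_e = simulate_simultaneity()
    print(f"true b = 0.5, OLS slope = {slope:.3f}")
    print(f"corr(x, e) = {corr_x_e:.3f}")
```

Under these assumptions the OLS slope should settle near 0.8 rather than the true 0.5, and the correlation between x and e is clearly positive.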
The third cause is selection bias, which means the sample is not drawn at random: whether an observation ends up in the data depends on factors related to the outcome.
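As a rough illustration of selection bias, the sketch below generates data from a known line and then keeps only observations whose outcome exceeds a threshold. The selection rule (`y > 1.0`) and all coefficients are hypothetical, chosen only to make the effect visible.

```python
import numpy as np


def ols_slope(x, y):
    """Slope b from an OLS fit of y = a + b*x (single regressor)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


def simulate_selection(n=200_000, seed=0):
    """Return (slope on the full sample, slope on the selected sample)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)   # true slope is 2.0

    # Non-random sampling: an observation is kept only if its
    # outcome is above the threshold.
    keep = y > 1.0

    return ols_slope(x, y), ols_slope(x[keep], y[keep])


if __name__ == "__main__":
    full, selected = simulate_selection()
    print(f"slope, full sample:     {full:.3f}")
    print(f"slope, selected sample: {selected:.3f}")
```

The full sample should give a slope close to 2.0, while the selected sample gives a noticeably smaller one. Among the retained points, those with low x got in only because their error happened to be large, so the error is no longer uncorrelated with X.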