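Post-treatment variables are variables that are themselves affected by the treatment. They undermine the “controlling for endogenous factors by including them in the regression” technique. They can cause problems in two ways:

  1. They can absorb the part of the treatment effect that runs through them
  2. They can introduce collider bias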


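Motivation. Suppose we regress Earnings on Tutoring:

$$\text{Earnings}_i = \beta_0 + \beta_1 \text{Tutoring}_i + \epsilon_i$$

But secretly, tutoring also raises reading scores, which in turn raise earnings:

$$\text{Reading}_i = \gamma_0 + \gamma_1 \text{Tutoring}_i + \nu_i$$

Therefore, part of the effect of tutoring on earnings runs through reading. Now, simply controlling for reading scores like this:

$$\text{Earnings}_i = \delta_0 + \delta_1 \text{Tutoring}_i + \delta_2 \text{Reading}_i + \eta_i$$

is not effective, because there’s multicollinearity between $\text{Tutoring}$ and $\text{Reading}$. In this case, $\delta_1$ only measures the effect of tutoring with reading held fixed, and this doesn’t capture the $\gamma_1 \delta_2$ portion that runs through reading. The total effect is $\beta_1 = \delta_1 + \gamma_1 \delta_2$.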
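The tutoring example can be checked with a small simulation. This is only a sketch under invented assumptions: the effect sizes (tutoring raises reading by 2, reading raises earnings by 3, tutoring raises earnings directly by 1), the sample size, and the helper `ols` are all hypothetical and not from the notes.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 50_000

# Assumed data-generating process (all numbers are made up for illustration)
tutoring = rng.integers(0, 2, size=n).astype(float)   # randomly assigned treatment
reading = 2.0 * tutoring + rng.normal(size=n)          # post-treatment variable
earnings = 1.0 * tutoring + 3.0 * reading + rng.normal(size=n)

# Coefficient on tutoring without and with the post-treatment control
total_effect = ols(earnings, tutoring)[1]
controlled = ols(earnings, tutoring, reading)[1]

print(f"without Reading: {total_effect:.2f}")
print(f"with Reading:    {controlled:.2f}")
```

Under these assumptions the true total effect of tutoring is 1 + 2 × 3 = 7. The regression without Reading should land near that, while the regression that controls for Reading should land near 1, because the portion that runs through reading scores is absorbed by the Reading coefficient.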


Collider Bias

A stupid model:

will have biases:

Example. Suppose we regress Flu on Car accident. We also try to control for being hospitalized:

$$\text{Flu}_i = \beta_0 + \beta_1 \text{Accident}_i + \beta_2 \text{Hospitalized}_i + \epsilon_i$$

Suppose, however, reality went like this:

  1. Car accidents lead to hospitalization, but don’t cause the flu
  2. Fevers lead to both hospitalization and flu testing

Then:
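  • Once we condition on hospitalization, the estimated coefficient on Car accident is negative, i.e. accident and flu are negatively correlated among the hospitalized, even though neither causes the other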
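The flu and car accident story can also be simulated. Again a sketch under assumptions that are not in the notes: accidents and flu are independent with a 10% rate each, either one leads to hospitalization 80% of the time (flu is assumed to act through fever), and `ols` is a hypothetical helper.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 50_000

# Assumed process: accident and flu are independent by construction
accident = (rng.random(n) < 0.10).astype(float)
flu = (rng.random(n) < 0.10).astype(float)

# Hospitalization is the collider: caused by accidents and by flu
admit_accident = (accident == 1) & (rng.random(n) < 0.8)
admit_flu = (flu == 1) & (rng.random(n) < 0.8)
hospitalized = (admit_accident | admit_flu).astype(float)

# Coefficient on accident without and with the collider as a control
naive = ols(flu, accident)[1]
with_collider = ols(flu, accident, hospitalized)[1]

# Correlation between accident and flu among hospitalized people only
in_hospital = hospitalized == 1
corr_in_hospital = np.corrcoef(accident[in_hospital], flu[in_hospital])[0, 1]

print(f"Flu ~ Accident:                {naive:.3f}")
print(f"Flu ~ Accident + Hospitalized: {with_collider:.3f}")
print(f"corr(accident, flu | hospitalized): {corr_in_hospital:.3f}")
```

Without the control, the coefficient on accident should be close to zero, which matches the true (null) effect. Adding Hospitalized should push it clearly negative: among hospitalized people, those without an accident are more likely to be there because of the flu.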