Two-Stage Least Squares Regression

Motivation. Instrumental variables are a way of removing endogeneity. Suppose you want to estimate the effect of police on crime rates. But does crime cause more police to be hired, or do more police cause less crime? Because of this circularity, it’s hard to isolate the causal effect of police on crime rates. So to isolate the exogenous variation in police hiring (i.e. the part that is unrelated to crime), we can use an instrumental variable: firefighters. Exogenous factors (policy changes, citizen support) cause both firefighters and police to be hired, but they don’t directly cause crime rates to change.

2-Stage Least Squares

Instrumental variables are implemented using 2SLS.

We have the first stage (=reduced form), where we regress the independent variable (police) on the instrumental variable (firefighters) and any controls:

$$\underbrace{X_{1 i}}_{\text{police}} = γ_{0} + γ_{1} \underbrace{Z_{i}}_{\text{firefighters}} + γ_{2} \underbrace{X_{2 i}}_{\text{some control}} + ν_{i}$$

Then we take the ==fitted police values $\hat{X}_{1 i}$ from the first stage (not the actual data!)== and regress the dependent variable (crime) on the estimated independent variable (estimated police):

$$Y_{i} = β_{0} + β_{1} \underbrace{\hat{X}_{1 i}}_{\text{estimated police}} + β_{2} \underbrace{X_{2 i}}_{\text{control}} + ϵ_{i}$$
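The two stages above can be sketched by hand with NumPy. This is only an illustration on simulated data: the coefficients, the confounder `u`, and the helper `ols` are assumptions made up for the example, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data (all numbers are invented for illustration).
u = rng.normal(size=n)    # unobserved confounder, ends up inside the error term
z = rng.normal(size=n)    # instrument Z (firefighters)
x2 = rng.normal(size=n)   # exogenous control X_2
x1 = 0.8 * z + 0.5 * x2 + u + rng.normal(size=n)   # endogenous X_1 (police)
beta1_true = -1.0
y = 2.0 + beta1_true * x1 + 0.3 * x2 + 2.0 * u + rng.normal(size=n)  # Y (crime)


def ols(y, *regressors):
    """Return OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Naive OLS: biased because x1 is correlated with the error (through u).
beta1_ols = ols(y, x1, x2)[1]

# First stage: regress X_1 on Z and the control, keep the fitted values.
g0, g1, g2 = ols(x1, z, x2)
x1_hat = g0 + g1 * z + g2 * x2

# Second stage: regress Y on the fitted X_1 (not the raw data) and the control.
beta1_2sls = ols(y, x1_hat, x2)[1]

print(f"true = {beta1_true}, OLS = {beta1_ols:.3f}, 2SLS = {beta1_2sls:.3f}")
```

In this setup the OLS estimate should land far from the true coefficient while the 2SLS estimate lands close to it. Note that the standard errors from a hand-run second stage are not the correct 2SLS standard errors, so this sketch is only suitable for point estimates.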

Visualization.

What Are Good Instrumental Variables?

Inclusion condition: $Z$ needs to have a meaningful influence on $X$.
- Satisfied when, in the first stage, $γ_{1}$ is significant at the 1% level.
Exclusion condition: $Z$ needs to have no influence on $Y$ other than through $X$.
- In other words, $Z$ and $ϵ$ should not be correlated.
- But this cannot be tested directly because $ϵ$ is not observed.
  - → thus the exclusion condition is more of an “it’s probably okay” argument.
- ! Regressing $Y$ on $Z$ might seem prudent, but if $Z$ is correlated with $\hat{X}$ and $\hat{X}$ with $Y$, then $Z$ will always come out significant.
- The best we can do is include more controls that we assume sit inside $ϵ$ and correlate with $Z$, so that the new $ϵ$ (after adding the controls) is not correlated with $Z$.
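The inclusion condition is the one part that can be checked directly. A minimal sketch, assuming simulated data and classical (homoskedastic) standard errors; the helper `ols_with_se` and the first-stage coefficients are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)                          # instrument
x2 = rng.normal(size=n)                         # control
x1 = 0.4 * z + 0.5 * x2 + rng.normal(size=n)    # first-stage relationship


def ols_with_se(y, X):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef, np.sqrt(np.diag(cov))


# First stage: X_1 on intercept, Z, and the control.
X = np.column_stack([np.ones(n), z, x2])
coef, se = ols_with_se(x1, X)
t_gamma1 = coef[1] / se[1]

# 2.576 is the two-sided 1% critical value of the standard normal,
# used here as a large-sample approximation to the t distribution.
inclusion_ok = abs(t_gamma1) > 2.576
print(f"gamma_1 = {coef[1]:.3f}, t = {t_gamma1:.1f}, significant at 1%: {inclusion_ok}")
```

Nothing comparable exists for the exclusion condition, since $ϵ$ is unobserved.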

Multiple Instruments

Perform the first stage with all instruments $Z_{1}, Z_{2}, \dots$ together:

$$X_{1 i} = γ_{0} + γ_{1} Z_{1 i} + γ_{2} Z_{2 i} + \dots + ν_{i}$$

and then run the second stage as before, using the fitted values $\hat{X}_{1 i}$.

The over-identification test checks the exclusion restriction, i.e. whether the multiple instrumental variables you have chosen are valid. (Overidentification is a good thing.) If we want to see whether $Z_{1}$ and $Z_{2}$ together satisfy the exclusion condition, we test:

1. Run the first stage with just $Z_{1}$ to get $\hat{X_{1}}^{Z_{1}}$, then the second stage to get $\hat{β_{1}}^{Z_{1}}$.
2. Run the first stage with just $Z_{2}$ to get $\hat{X_{1}}^{Z_{2}}$, then the second stage to get $\hat{β_{1}}^{Z_{2}}$.
3. If $\hat{β_{1}}^{Z_{1}} \approx \hat{β_{1}}^{Z_{2}}$ then good!
- ! But if they’re similar, this might just mean that both are bad in similar ways…
- And if they’re different, there’s no way of knowing which one is the better one.

Alternatively, run

$$\hat{ϵ}_{i} = \hat{α}_{0} + \hat{α}_{1} Z_{1 i} + \hat{α}_{2} Z_{2 i} + \hat{α}_{3} X_{2 i}$$

where $\hat{ϵ}_{i}$ are the 2SLS residuals, and then run an F-test of $H_{0} : α_{1} = α_{2} = 0$. Instruments that are compatible with the exclusion condition should not be jointly significant. But this test is also unreliable in the same way described above.
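The residual regression and its F-test can be sketched as follows. This is an illustration on simulated data, not a definitive implementation: the function names, the coefficients, and the `direct_effect_of_z2` knob (which deliberately breaks the exclusion condition for $Z_{2}$) are all assumptions.

```python
import numpy as np


def fit(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def ssr(y, X):
    """Sum of squared residuals from regressing y on X."""
    resid = y - X @ fit(y, X)
    return resid @ resid


def overid_f_stat(y, x1, x2, z1, z2):
    n = len(y)
    ones = np.ones(n)
    # First stage with both instruments and the control.
    W1 = np.column_stack([ones, z1, z2, x2])
    x1_hat = W1 @ fit(x1, W1)
    # Second stage on fitted values; the residuals use the *actual* x1.
    beta = fit(y, np.column_stack([ones, x1_hat, x2]))
    eps_hat = y - np.column_stack([ones, x1, x2]) @ beta
    # Residual regression: unrestricted includes Z1, Z2; restricted drops them.
    ssr_u = ssr(eps_hat, W1)
    ssr_r = ssr(eps_hat, np.column_stack([ones, x2]))
    q = 2  # restrictions under H0: alpha_1 = alpha_2 = 0
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - W1.shape[1]))


def simulate(direct_effect_of_z2, seed):
    rng = np.random.default_rng(seed)
    n = 5_000
    u = rng.normal(size=n)                       # unobserved confounder
    z1, z2, x2 = rng.normal(size=(3, n))
    x1 = 0.7 * z1 + 0.7 * z2 + 0.5 * x2 + u + rng.normal(size=n)
    y = (1.0 - 1.0 * x1 + 0.3 * x2 + 1.5 * u
         + direct_effect_of_z2 * z2 + rng.normal(size=n))
    return y, x1, x2, z1, z2


f_valid = overid_f_stat(*simulate(0.0, seed=2))    # both instruments valid
f_invalid = overid_f_stat(*simulate(0.5, seed=2))  # Z2 affects Y directly
print(f"F (valid) = {f_valid:.2f}, F (one invalid) = {f_invalid:.2f}")
```

The statistic is large only when the instruments disagree with each other. If both instruments were invalid in the same way, it could stay small, which is the caveat noted above.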

Comparison with Ordinary Least Squares

We should use 2SLS instead of OLS when we know that $X$ is substantially endogenous and we have found a good IV, $Z$, that satisfies the inclusion and exclusion conditions. To test whether $X$ is endogenous enough for 2SLS to be useful, we use the following:

def. Durbin-Wu-Hausman Test of $X$’s endogeneity. Observe first that, assuming $Z$ is exogenous:

if $X$ is exogenous: $\hat{β}^{OLS} \approx \hat{β}^{2SLS}$
- i.e. $ρ_{X, ϵ} \approx 0$ already, so why use IV or 2SLS?
if $X$ is endogenous: $\hat{β}^{OLS} \neq \hat{β}^{2SLS}$
- i.e. $ρ_{X, ϵ} \neq 0$ so we must use IV or 2SLS!

The test has null hypothesis $H_{0} : \hat{β}^{OLS} = \hat{β}^{2SLS}$. If we reject the null, then we should use 2SLS.
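One common way to run this comparison in practice is the regression-based variant: add the first-stage residuals $\hat{ν}$ to the original regression and t-test their coefficient. A significant coefficient is evidence that $X$ is endogenous. The sketch below uses that variant rather than comparing the two estimates directly; the data, the `confounding` parameter, and the function names are assumptions for illustration.

```python
import numpy as np


def ols_with_se(y, X):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


def dwh_t_stat(y, x1, x2, z):
    ones = np.ones(len(y))
    # First stage: keep the residuals v_hat (the part of x1 not explained by Z, X2).
    W1 = np.column_stack([ones, z, x2])
    gamma, _ = ols_with_se(x1, W1)
    v_hat = x1 - W1 @ gamma
    # Augmented regression: y on x1, x2 and v_hat; t-test the v_hat coefficient.
    coef, se = ols_with_se(y, np.column_stack([ones, x1, x2, v_hat]))
    return coef[3] / se[3]


def simulate(confounding, seed):
    rng = np.random.default_rng(seed)
    n = 5_000
    u, z, x2 = rng.normal(size=(3, n))
    x1 = 0.8 * z + 0.5 * x2 + u + rng.normal(size=n)
    # confounding > 0 puts u into the error term, making x1 endogenous.
    y = 1.0 - 1.0 * x1 + 0.3 * x2 + confounding * u + rng.normal(size=n)
    return y, x1, x2, z


t_endog = dwh_t_stat(*simulate(2.0, seed=3))  # x1 endogenous
t_exog = dwh_t_stat(*simulate(0.0, seed=3))   # x1 exogenous
print(f"t (endogenous X) = {t_endog:.1f}, t (exogenous X) = {t_exog:.1f}")
```

Like the comparison it stands in for, this check is only meaningful if $Z$ itself is exogenous.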

Bias in 2SLS

def. Quasi-instruments are instruments where there exists some (small) $ρ_{Z, ϵ} \neq 0$. (Usually okay, see below)
def. Weak instruments are instruments where $ρ_{Z, X_{1}} \approx 0$. (Usually bad, see below)

It’s sometimes okay to have some correlation between $Z$ and $ϵ$, but having only a weak correlation between $Z$ and $X_{1}$ is pretty bad. To see why, observe the large-sample bias of the 2SLS estimate $\hat{β_{1}}$:

$$\lim_{n \to \infty} \hat{β_{1}}^{2SLS} = β_{1} + \underbrace{\frac{ρ_{Z, ϵ}}{ρ_{Z, X_{1}}} \frac{σ_{ϵ}}{σ_{X_{1}}}}_{\text{bias}}$$

Compare this to vanilla OLS (consistency requires $ρ_{X, ϵ} = 0$):

$$\lim_{n \to \infty} \hat{β_{1}}^{OLS} = β_{1} + \underbrace{ρ_{X, ϵ} \frac{σ_{ϵ}}{σ_{X}}}_{\text{bias}}$$

This implies

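When the first-stage relationship is strong (= $|ρ_{Z, X_{1}}|$ is large), the 2SLS estimate $\hat{β_{1}}^{2SLS}$ is better.
& If $\left|\frac{ρ_{Z, ϵ}}{ρ_{Z, X_{1}}}\right| < |ρ_{X, ϵ}|$ then 2SLS has less bias than vanilla OLS even if $ρ_{Z, ϵ} \neq 0$ (the factor $\frac{σ_{ϵ}}{σ_{X_{1}}}$ is the same in both expressions, since $X$ and $X_{1}$ are the same regressor).
! If $\left|\frac{ρ_{Z, ϵ}}{ρ_{Z, X_{1}}}\right| > 1$ then 2SLS amplifies any small correlation $ρ_{Z, ϵ}$ and becomes worse than vanilla OLS, whose corresponding factor $|ρ_{X, ϵ}|$ is at most 1.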
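Plugging numbers into the two bias expressions makes the comparison concrete. The correlations and standard deviations below are hypothetical values chosen for illustration, and the function names are not from the notes.

```python
def asymptotic_bias_2sls(rho_z_eps, rho_z_x1, sigma_eps, sigma_x1):
    """Large-sample 2SLS bias: (rho_{Z,eps} / rho_{Z,X1}) * (sigma_eps / sigma_X1)."""
    return (rho_z_eps / rho_z_x1) * (sigma_eps / sigma_x1)


def asymptotic_bias_ols(rho_x_eps, sigma_eps, sigma_x):
    """Large-sample OLS bias: rho_{X,eps} * (sigma_eps / sigma_X)."""
    return rho_x_eps * (sigma_eps / sigma_x)


# Hypothetical inputs: X is moderately endogenous, Z is a quasi-instrument.
sigma_eps, sigma_x = 1.0, 1.0
rho_x_eps = 0.3    # correlation of X with the error
rho_z_eps = 0.02   # small correlation of Z with the error

bias_ols = asymptotic_bias_ols(rho_x_eps, sigma_eps, sigma_x)

# Strong first stage: the small rho_{Z,eps} is divided by a large rho_{Z,X1}.
bias_strong = asymptotic_bias_2sls(rho_z_eps, 0.5, sigma_eps, sigma_x)

# Weak first stage: the same rho_{Z,eps} is divided by a tiny rho_{Z,X1}.
bias_weak = asymptotic_bias_2sls(rho_z_eps, 0.01, sigma_eps, sigma_x)

print(f"OLS bias = {bias_ols:.2f}")
print(f"2SLS bias, strong instrument = {bias_strong:.2f}")
print(f"2SLS bias, weak instrument   = {bias_weak:.2f}")
```

With the same small $ρ_{Z, ϵ}$, the strong instrument gives a much smaller bias than OLS, while the weak instrument gives a much larger one.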

Rule of Thumb for determining weak instruments

Use 2SLS with instrument $Z$ when, in the first-stage regression, the test

$$\begin{cases} H_{0} : γ_{1} = γ_{2} = \dots = 0 \\ H_{1} : \text{otherwise} \end{cases}$$

on the coefficients of the instruments, i.e. an F-test, gives $F > 10$.

Even when $ρ_{Z, ϵ} = 0$, the equations above only hold in the limit $n \to \infty$; when $n$ is small, bias still exists (in the same direction as OLS).
- This bias will eventually go away with bigger $n$ (or, if $ρ_{Z, ϵ} \neq 0$, may go the opposite way!)
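The rule of thumb can be sketched as a joint F-test that compares the first stage with and without the instruments. The data, coefficients, and the helper names `ssr` and `first_stage_f` are assumptions for illustration only.

```python
import numpy as np


def ssr(y, X):
    """Sum of squared residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return resid @ resid


def first_stage_f(x1, instruments, controls):
    """F-statistic for H0: all instrument coefficients in the first stage are zero."""
    n = len(x1)
    ones = np.ones(n)
    X_restricted = np.column_stack([ones, *controls])
    X_unrestricted = np.column_stack([ones, *instruments, *controls])
    q = len(instruments)  # number of restrictions
    ssr_r = ssr(x1, X_restricted)
    ssr_u = ssr(x1, X_unrestricted)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - X_unrestricted.shape[1]))


rng = np.random.default_rng(5)
n = 2_000
z1, z2, x2, v = rng.normal(size=(4, n))

x1_strong = 0.5 * z1 + 0.5 * z2 + 0.5 * x2 + v    # instruments matter
x1_weak = 0.01 * z1 + 0.01 * z2 + 0.5 * x2 + v    # instruments barely matter

f_strong = first_stage_f(x1_strong, [z1, z2], [x2])
f_weak = first_stage_f(x1_weak, [z1, z2], [x2])
print(f"F (strong) = {f_strong:.1f}, F (weak) = {f_weak:.1f}, threshold = 10")
```

Only the instruments are dropped in the restricted model; the controls stay in both, so the statistic measures what the instruments add beyond the controls.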

Precision of 2SLS

Recall that the variance of a coefficient in multivariate OLS is:

$$Var (\hat{β_{1}}) = \frac{\hat{σ}^{2}}{N \cdot Var (X_{1}) \cdot (1 - R_{1}^{2})}$$

The variance of the 2SLS coefficient is:

$$Var (\hat{β_{1}}) = \frac{\hat{σ}^{2}}{N \cdot Var (\hat{X_{1}}) \cdot (1 - R_{\hat{X_{1}}^{NoZ}}^{2})}$$

The differences are:

$\hat{σ}^{2}$ (=second-stage regression variance) may be larger because the part of $X_{1}$ correlated with $ϵ$ has been purged
- This implies that the explanatory power in the observed $X_{1 i}$ that was correlated with $ϵ$ is purged (=thus total explanatory power is reduced)
$Var (\hat{X_{1}})$, not $Var (X_{1})$, because we use the fitted values (see above), not the actual data, in the regression
- $Var (\hat{X_{1}})$ is smaller because we purged the $ϵ$-related variance.
! $R_{\hat{X_{1}}^{NoZ}}^{2}$ is the R-squared from the new regression

$$\hat{X}_{1 i} = π_{0} + π_{1} X_{2 i} + η_{i}$$

- This regression measures how much $X_{2}$, rather than $Z$, determines $\hat{X_{1}}$.
- $R^{2}$ in this regression thus measures the collinearity of $\hat{X_{1}}$ and $X_{2}$ (=controls)
    - btw, $R^{2}$ in the second-stage regression has no meaningful interpretation.
- The lower the explanatory power of $Z$ on $\hat{X_{1}}$, the higher this value is, and the higher the variance of the 2SLS coefficient is.
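The pieces of the 2SLS variance formula can be computed directly and checked against the usual matrix expression $\hat{σ}^{2} [(\hat{X}'\hat{X})^{-1}]_{11}$. This is a sketch on simulated data, assuming homoskedastic errors; variable names and coefficients are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3_000
u, z, x2 = rng.normal(size=(3, n))
x1 = 0.8 * z + 0.5 * x2 + u + rng.normal(size=n)              # endogenous regressor
y = 1.0 - 1.0 * x1 + 0.3 * x2 + 2.0 * u + rng.normal(size=n)  # outcome


def fit(y, X):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def r_squared(y, X):
    resid = y - X @ fit(y, X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


ones = np.ones(n)

# First stage: fitted values of X_1.
W1 = np.column_stack([ones, z, x2])
x1_hat = W1 @ fit(x1, W1)

# Second stage: coefficients from the fitted values.
W2 = np.column_stack([ones, x1_hat, x2])
beta = fit(y, W2)

# sigma^2 is based on residuals computed with the *actual* X_1.
resid = y - np.column_stack([ones, x1, x2]) @ beta
sigma2 = resid @ resid / (n - 3)

# R^2 from regressing the fitted X_1 on the controls only (no Z).
r2_no_z = r_squared(x1_hat, np.column_stack([ones, x2]))

# Formula from the notes (np.var uses the 1/N convention by default).
var_formula = sigma2 / (n * np.var(x1_hat) * (1.0 - r2_no_z))

# Matrix version for comparison.
var_matrix = sigma2 * np.linalg.inv(W2.T @ W2)[1, 1]

print(f"Var(x1_hat) = {np.var(x1_hat):.3f} vs Var(x1) = {np.var(x1):.3f}")
print(f"R^2 (no Z) = {r2_no_z:.3f}")
print(f"variance: formula = {var_formula:.6f}, matrix = {var_matrix:.6f}")
```

The two variance numbers agree because the formula is an algebraic rewriting of the matrix expression. Weakening the instrument (shrinking the `0.8` coefficient) lowers `np.var(x1_hat)` and raises `r2_no_z`, both of which inflate the variance.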

PK's Notes