Regression Discontinuity (RD) analysis is useful when:

  1. The assignment variable (the independent variable) is continuous rather than binary.
    1. “Treatment” thus simply means being past the cutoff.
    2. We can’t just use a difference of means, because the assignment variable also varies continuously within each group and therefore affects the dependent variable as well.
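  2. No other factors change when crossing the cutoff.

Then we can use the basic (constant-slopes) RD model:

$$Y_{i} = \beta_{0} + \beta_{1} T_{i} + \beta_{2}(X_{1i} - C) + \epsilon_{i}$$

where:

  • $C$ is the cutoff point

  • $T_{i}$ is $1$ for post-cutoff observations ($X_{1i} \geq C$) and $0$ for pre-cutoff observations; it simulates “treatment”

  • $X_{1i}$ is the assignment variable (non-adjusted)

and where the coefficients mean:

  • $\beta_{1}$ is the causal effect of treatment

  • $\beta_{2}$ is the causal effect of the assignment variable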

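  • The error term shouldn’t jump at the cutoff, i.e. the expected value of $\epsilon_{i}$ should be the same just below and just above the cutoff.

    • If $\epsilon_{i} = \delta X_{1i} + \nu_{i}$, with $\nu_{i}$ uncorrelated with $X_{1i}$, then the error is correlated with the assignment variable but still continuous at the cutoff, and thus the basic RD model from above (treatment dummy $T_{i}$, cutoff $C$) becomes:

$$Y_{i} = (\beta_{0} + \delta C) + \beta_{1} T_{i} + (\beta_{2} + \delta)(X_{1i} - C) + \nu_{i}$$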
- → Thus even if $\rho_{X_{1},\epsilon}\neq 0$, $\hat{\beta}_{1}$ is unbiased (i.e. it still correctly indicates the causal effect of treatment on $Y$)!
- But $\hat{\beta}_{2}$ no longer indicates causality between $X_{1}$ and $Y_{i}$; instead it just indicates their overall correlation.
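The point above can be checked with a small simulation. This is only a sketch on made-up data: the function name `fit_basic_rd`, the sample size, and all coefficient values are assumptions chosen for illustration. The error is built to be correlated with the assignment variable but continuous at the cutoff, so the estimated jump should land near the true treatment effect while the estimated slope absorbs the correlation.

```python
import numpy as np

def fit_basic_rd(x, y, cutoff):
    """OLS fit of Y = b0 + b1*T + b2*(X - cutoff) + error.

    Returns the array [b0, b1, b2], where T = 1 when x >= cutoff.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = (x >= cutoff).astype(float)  # treatment dummy
    design = np.column_stack([np.ones_like(x), t, x - cutoff])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated data (all numbers here are invented for illustration).
rng = np.random.default_rng(0)
n, cutoff = 5000, 50.0
x = rng.uniform(0.0, 100.0, n)
t = (x >= cutoff).astype(float)

# The error is correlated with x (slope 0.05) but does not jump at the cutoff.
eps = 0.05 * x + rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * t + 0.1 * (x - cutoff) + eps  # true treatment effect = 2.0

b0, b1, b2 = fit_basic_rd(x, y, cutoff)
print(f"estimated jump  b1 = {b1:.3f} (true effect 2.0)")
print(f"estimated slope b2 = {b2:.3f} (causal slope 0.1 plus 0.05 from the error)")
```

The slope estimate should sit near 0.15 rather than 0.1, which is the sense in which it reflects overall correlation rather than causality.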

Advanced RD

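We don’t need to force the slopes to be the same before and after the discontinuity; instead we can use the varying-slopes model, in which the slope is $\beta_{2}$ before the cutoff $C$ and $\beta_{2} + \beta_{3}$ after it ($T_{i}$ is the post-cutoff dummy):

$$Y_{i} = \beta_{0} + \beta_{1} T_{i} + \beta_{2}(X_{1i} - C) + \beta_{3}(X_{1i} - C)T_{i} + \epsilon_{i}$$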

Within a smaller window around the cutoff, the relationship is more likely to be approximately linear.
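A rough sketch of why the window matters, again on invented data: the true relationship below is deliberately curved (a cubic term), and a varying-slopes model is fit once on the full sample and once within a narrow window. The helper `fit_varying_slopes_rd`, its `bandwidth` argument, and every number are assumptions for illustration only.

```python
import numpy as np

def fit_varying_slopes_rd(x, y, cutoff, bandwidth=None):
    """OLS fit of Y = b0 + b1*T + b2*(X - c) + b3*(X - c)*T + error.

    If bandwidth is given, only observations with |X - c| <= bandwidth are used.
    Returns the array [b0, b1, b2, b3].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if bandwidth is not None:
        keep = np.abs(x - cutoff) <= bandwidth
        x, y = x[keep], y[keep]
    xc = x - cutoff                   # centered assignment variable
    t = (x >= cutoff).astype(float)   # treatment dummy
    design = np.column_stack([np.ones_like(xc), t, xc, xc * t])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated data with a curved (cubic) relationship; true jump = 2.0.
rng = np.random.default_rng(1)
n, cutoff = 20000, 50.0
x = rng.uniform(0.0, 100.0, n)
xc = x - cutoff
t = (x >= cutoff).astype(float)
y = 1.0 + 2.0 * t + 0.1 * xc + 0.00002 * xc**3 + rng.normal(0.0, 1.0, n)

jump_full = fit_varying_slopes_rd(x, y, cutoff)[1]
jump_narrow = fit_varying_slopes_rd(x, y, cutoff, bandwidth=10.0)[1]
print(f"jump estimate, full sample:      {jump_full:.3f}")
print(f"jump estimate, window of +/- 10: {jump_narrow:.3f}")
```

On the full sample the straight lines fit the curve poorly and the jump estimate drifts away from 2.0; inside the narrow window the curve is close to linear and the estimate recovers, at the cost of using fewer observations.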

Issues with RD Analysis

  • Smaller window (= bandwidth): we must look at observations close to the cutoff (because the farther away you go, the more endogeneity there might be). But this isn’t always possible because of limited sample size.
  • It probably only estimates the Local Average Treatment Effect (LATE), meaning that you can’t generalize the results. (“Do the effects of drinking (via the legal drinking age) on grades also apply to babies? To old econometrics professors?”)
  • More than one variable usually determines whether a unit is treated. (Medicare and age is clear-cut; SAT scores and college admission isn’t.) → use the Fuzzy RD model
    • Or, we can use a balance test to see whether the side of the cutoff an observation falls on is truly as good as random
  • Error term jumps at discontinuity (the issue from above)
    • To check, make sure the frequency of the assignment variable is smooth at the cutoff (e.g. the number of people (= samples) with SAT scores just below 1500 and just above 1500 isn’t too different from what the rest of the distribution would suggest).
      • If there is a jump, it might mean that people just under 1500 wanted to get into colleges with a cutoff score of 1500, so they studied a little bit harder.
    • Visualization.
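    • Another way to check this is to run the RD model with a covariate $Z_{i}$ (which we suspect to be in the error term) as the dependent variable, where $T_{i}$ is the post-cutoff dummy and $C$ the cutoff:

$$Z_{i} = \gamma_{0} + \gamma_{1} T_{i} + \gamma_{2}(X_{1i} - C) + \nu_{i}$$

      • A statistically significant $\hat{\gamma}_{1}$ indicates that $Z_{i}$ jumps at the cutoff, i.e. that $Z_{i}$ is in the error term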
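Both checks in this list (the frequency of the assignment variable near the cutoff, and a covariate regressed on the RD model) can be sketched as follows. The data are simulated and the helper names `rd_jump_with_tstat` and `counts_near_cutoff` are hypothetical, as are the two covariates; the t-statistic uses plain homoskedastic OLS standard errors, which is a simplifying assumption.

```python
import numpy as np

def rd_jump_with_tstat(x, z, cutoff):
    """Regress z on [1, T, X - cutoff]; return (jump estimate, its t-statistic)."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    t = (x >= cutoff).astype(float)
    design = np.column_stack([np.ones_like(x), t, x - cutoff])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coef
    dof = len(z) - design.shape[1]
    sigma2 = resid @ resid / dof                      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # homoskedastic OLS covariance
    return coef[1], coef[1] / np.sqrt(cov[1, 1])

def counts_near_cutoff(x, cutoff, width):
    """Count observations in [cutoff - width, cutoff) and [cutoff, cutoff + width)."""
    x = np.asarray(x, dtype=float)
    below = int(np.sum((x >= cutoff - width) & (x < cutoff)))
    above = int(np.sum((x >= cutoff) & (x < cutoff + width)))
    return below, above

# Simulated data (invented): one covariate is smooth, the other jumps at the cutoff.
rng = np.random.default_rng(2)
n, cutoff = 5000, 50.0
x = rng.uniform(0.0, 100.0, n)
t = (x >= cutoff).astype(float)
z_smooth = 0.02 * x + rng.normal(0.0, 1.0, n)
z_jumpy = 0.02 * x + 1.0 * t + rng.normal(0.0, 1.0, n)

jump_smooth, t_smooth = rd_jump_with_tstat(x, z_smooth, cutoff)
jump_jumpy, t_jumpy = rd_jump_with_tstat(x, z_jumpy, cutoff)
below, above = counts_near_cutoff(x, cutoff, width=5.0)

print(f"smooth covariate: jump = {jump_smooth:.3f}, t = {t_smooth:.2f}")
print(f"jumpy covariate:  jump = {jump_jumpy:.3f}, t = {t_jumpy:.2f}")
print(f"observations just below / just above the cutoff: {below} / {above}")
```

A large t-statistic on the jump for a covariate, or a sharp imbalance between the two counts, would be a warning sign that something other than treatment changes at the cutoff.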