Dummy Variables

Motivation. Consider a treatment group vs. a non-treatment (control) group in medicine. A dummy variable in this case would be:

$$
Treatment_{i} = \begin{cases} 1 & \text{if treated group} \\ 0 & \text{if control group} \end{cases}
$$

And regression:

$$
Y_{i} = β_{0} + β_{1} Treatment_{i} + ϵ_{i}
$$

Fitted value:
- $\hat{Y}_{i} = \hat{β}_{0}$ if $Treatment_{i} = 0$ ← “average of control group”
- $\hat{Y}_{i} = \hat{β}_{0} + \hat{β}_{1}$ if $Treatment_{i} = 1$ ← “average of treatment group”
- This is equivalent to performing a difference of means test, which tests whether the means of the two groups differ.
  - The difference of means is $\hat{β}_{1}$ (= the treatment effect)
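
The equivalence between the dummy regression and the difference of means can be checked numerically. The sketch below uses invented outcome values (not data from these notes) and the closed-form OLS formulas for a single regressor; the variable names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical outcomes, invented purely for illustration.
control = [4.0, 5.0, 6.0, 5.0]
treated = [7.0, 9.0, 8.0, 8.0]

y = control + treated
d = [0] * len(control) + [1] * len(treated)  # Treatment_i dummy

# Closed-form OLS with one regressor:
#   beta1_hat = sum((d - d_bar)(y - y_bar)) / sum((d - d_bar)^2)
#   beta0_hat = y_bar - beta1_hat * d_bar
d_bar, y_bar = mean(d), mean(y)
beta1_hat = (
    sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    / sum((di - d_bar) ** 2 for di in d)
)
beta0_hat = y_bar - beta1_hat * d_bar

diff_of_means = mean(treated) - mean(control)

print("beta0_hat:", beta0_hat, "control mean:", mean(control))
print("beta1_hat:", beta1_hat, "difference of means:", diff_of_means)
```

The intercept reproduces the control-group average and the dummy coefficient reproduces the gap between the two group averages.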

Multiplicative Dummy Variables

Motivation. The gender pay gap is not just an “additive” factor but also a “multiplicative” one, as men are rewarded more per unit of experience than women. In this case we can model:

$$
Salary_{i} = β_{0} + β_{1} Experience_{i} + \underbrace{β_{2} isMale_{i}}_{\text{Additive dummy}} + \underbrace{β_{3} isMale_{i} \times Experience_{i}}_{\text{Multiplicative dummy}} + ϵ_{i}
$$

Regression Plot:

- $\hat{β}_{1}$ is the slope of the non-treatment group (= women)
- $\hat{β}_{1} + \hat{β}_{3}$ is the slope of the treatment group (= men)

Therefore the graph shape is:
1. $\hat{β}_{1}$ determines the baseline slope (upward or downward sloping)
2. $\hat{β}_{3}$ increases or decreases the slope for the treatment (men) group, depending on its sign
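
A minimal sketch of the two slopes, assuming NumPy and simulated data (the coefficients 30, 1.0, 2.0 and 0.5 are invented for illustration). It fits the model with both dummy terms and compares the implied slopes with those from separate regressions for each group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 20, n)
is_male = rng.integers(0, 2, n).astype(float)

# Simulated salaries; all coefficient values here are hypothetical.
salary = (30 + 1.0 * experience + 2.0 * is_male
          + 0.5 * is_male * experience + rng.normal(0, 1, n))

# Columns: intercept, Experience, isMale, isMale x Experience
X = np.column_stack([np.ones(n), experience, is_male, is_male * experience])
b0, b1, b2, b3 = np.linalg.lstsq(X, salary, rcond=None)[0]

# Separate simple regressions for each group (polyfit returns slope first).
women = is_male == 0
slope_women = np.polyfit(experience[women], salary[women], 1)[0]
slope_men = np.polyfit(experience[~women], salary[~women], 1)[0]

print("women: b1      =", b1, " separate fit =", slope_women)
print("men:   b1 + b3 =", b1 + b3, " separate fit =", slope_men)
```

Because the model includes both the additive and the multiplicative dummy, each group gets its own intercept and slope, so the pooled fit matches the two separate fits.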

! When using a multiplicative dummy regression, to test the statistical significance of $isMale$ you must test the additive and multiplicative parameters together, i.e. use an F-test of $H_{0} : β_{2} = β_{3} = 0$
- The effect of $isMale$ at a given experience level is $β_{2} + β_{3} Experience_{i}$; the sum $β_{2} + β_{3}$ is that effect at $Experience_{i} = 1$, and its significance is tested by $H_{0} : β_{2} + β_{3} = 0$
- As a rule of thumb, if just one of $\hat{β}_{2}$ or $\hat{β}_{3}$ is individually significant the joint test is usually significant too, but this is not guaranteed, so rely on the F-test
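
One common way to carry out this joint test is to compare the sum of squared residuals (SSR) of the full model with that of a restricted model that drops both $isMale$ terms: $F = \frac{(SSR_{r} - SSR_{u})/q}{SSR_{u}/(n - k)}$, with $q = 2$ restrictions and $k = 4$ parameters. The sketch below assumes NumPy and simulated data with invented coefficients; the helper `ssr` is a hypothetical name, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 20, n)
is_male = rng.integers(0, 2, n).astype(float)

# Simulated salaries; all coefficient values here are hypothetical.
salary = (30 + 1.0 * experience + 2.0 * is_male
          + 0.5 * is_male * experience + rng.normal(0, 1, n))


def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)


ones = np.ones(n)
X_unrestricted = np.column_stack(
    [ones, experience, is_male, is_male * experience]
)
X_restricted = np.column_stack([ones, experience])  # imposes beta2 = beta3 = 0

q = 2                        # number of restrictions under H0
k = X_unrestricted.shape[1]  # parameters in the unrestricted model
ssr_u = ssr(X_unrestricted, salary)
ssr_r = ssr(X_restricted, salary)

f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(f"F({q}, {n - k}) = {f_stat:.1f}")
```

The statistic is then compared with the critical value of the $F(q, n - k)$ distribution; a large value rejects $H_{0} : β_{2} = β_{3} = 0$.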

Applications of Dummy Variables

Dummy Independent Variables in Multivariate OLS

Model:

$$
Y_{i} = β_{0} + β_{1} Dummy_{i} + β_{2} X_{i} + ϵ_{i}
$$

→ $β_{1}$ tells us the difference between the groups $Dummy = 1$ and $Dummy = 0$, holding $X$ fixed

Example. How much of an advantage does playing on the home field give the home team?

$$
\text{Goal Differential}_{i} = β_{0} + β_{1} isHome_{i} + β_{2} \text{Opponent Quality}_{i} + ϵ_{i}
$$

Adding $Opponent Quality$ will control for opponent quality
Regression Table
Regression Plot
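At $Opponent Quality = 0$, the difference in means is $\hat{β}_{1}$. Because the model has no interaction term, the two fitted lines are parallel, so the gap is $\hat{β}_{1}$ at every other level of opponent quality as well.
$\hat{β}_{2}$ is irrelevant for this difference of means test; it is the slope relating opponent quality to goal differential.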
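
To see what the dummy coefficient means once a control is included, the sketch below fits the home-field model on simulated data (assuming NumPy; the coefficients 0.2, 0.5 and -0.8 are invented) and compares predicted goal differentials for home and away games at several opponent-quality values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
is_home = rng.integers(0, 2, n).astype(float)
opp_quality = rng.normal(0, 1, n)

# Simulated goal differentials; all coefficient values are hypothetical.
goal_diff = 0.2 + 0.5 * is_home - 0.8 * opp_quality + rng.normal(0, 1, n)

# Columns: intercept, isHome, Opponent Quality
X = np.column_stack([np.ones(n), is_home, opp_quality])
b0, b1, b2 = np.linalg.lstsq(X, goal_diff, rcond=None)[0]

# Fitted lines for home and away games over a few quality levels.
quality_grid = np.array([-1.0, 0.0, 1.0])
pred_home = b0 + b1 + b2 * quality_grid
pred_away = b0 + b2 * quality_grid
gaps = pred_home - pred_away

print("b1 (home advantage):", b1)
print("home minus away at each quality level:", gaps)
```

The home-minus-away gap equals the dummy coefficient at every quality level, which is what "controlling for opponent quality" amounts to in this additive model.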

Categorical to Dummy Variables.

Motivation. This is useful when you have categories like “1 indicates a person is from the Northeast, 2 from the Midwest, 3 from the South, and 4 from the West,” and you want a difference of means test for all of them combined. You can’t simply use 1, 2, 3, 4 as values because “location” is not a quantity but a category. You use binary dummy variables instead:

$$
Wages_{i} = β_{0} + β_{1} isNE_{i} + β_{2} isMW_{i} + β_{3} isSouth_{i} + ϵ_{i}
$$

We don’t need an $isWest$ dummy because $false$ on all three dummy variables already indicates that the datapoint is from the West. (This avoids the “dummy perfect multicollinearity trap”, where e.g. $isMale$ is perfectly negatively correlated with $isFemale$.)
- i.e. the categories are mutually exclusive: no datapoint can have two of these dummy variables both equal to one.

Interpretation, e.g. for $β_{1}$: the expected difference in wages between NE and the reference category (West).
Regression Table:
- Observe that all columns are symmetric. The model we used above is column (a); since there are other ways to choose the reference category when defining the dummy variables, the table shows that they all give the same result.
- Each row represents “how much better it is than the reference”; in (a), “how much better it is than the West”.
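
The encoding and its interpretation can be sketched as follows, assuming NumPy and simulated wages (the group means 52, 48, 45 and 50 are invented). The code builds three dummies with West as the reference, checks that the coefficients are differences from the West mean, and shows that adding a fourth dummy alongside the intercept makes the design matrix rank deficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
region = rng.integers(1, 5, n)  # 1 = NE, 2 = MW, 3 = South, 4 = West

# Simulated wages; the per-region means are hypothetical (index 0 is unused).
region_means = np.array([0.0, 52.0, 48.0, 45.0, 50.0])
wages = region_means[region] + rng.normal(0, 3, n)

# One 0/1 dummy per category.
is_ne, is_mw, is_south, is_west = (
    (region == code).astype(float) for code in (1, 2, 3, 4)
)

# West is the reference category, so isWest is left out.
X = np.column_stack([np.ones(n), is_ne, is_mw, is_south])
b = np.linalg.lstsq(X, wages, rcond=None)[0]

west_mean = wages[region == 4].mean()
ne_mean = wages[region == 1].mean()
print("b0:", b[0], "West mean:", west_mean)
print("b1:", b[1], "NE mean minus West mean:", ne_mean - west_mean)

# Including all four dummies plus the intercept: the dummy columns sum to
# the intercept column, so the matrix has 5 columns but only rank 4.
X_trap = np.column_stack([X, is_west])
rank = np.linalg.matrix_rank(X_trap)
print("columns:", X_trap.shape[1], "rank:", rank)
```

Choosing a different reference category changes which differences the coefficients report, but not the fitted group means.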

We now have an example from the textbook that incorporates these. Example. Taxation vs Male suffrage, Year, War mobilization and location

$$
Tax_{i} = β_{0} + β_{1} mSuffrage_{i} + β_{2} Year_{i} + β_{3} War_{i} + \underbrace{β_{4} isEurope_{i} + β_{5} isAsia_{i} + β_{6} isOceania_{i}}_{\text{Location Dummies}} + ϵ_{i}
$$

Regression Table:
- “Bivariate” column is the naive model: $Tax$ regressed on $mSuffrage$ alone.
- (a) only adds the year. Year is a significant endogenous factor.
- (b) adds war. War is also a significant endogenous factor.
- (c) includes location dummies, with North America as the reference (therefore the model doesn’t include $isNA$).
