For omitted variables in the regression, refer to the separate document on that topic.

Variance of Parameter Estimators

In this multivariate model the variance of $\hat\beta_1$ is:

$$\operatorname{Var}(\hat\beta_1) = \frac{\hat\sigma^2}{n \operatorname{Var}(X_1)\,(1 - R_1^2)}$$

where

  • $\hat\sigma^2$ is the variance of the regression
  • $R_1^2$ is the coefficient of determination in the auxiliary regression $X_1 = \alpha_0 + \alpha_2 X_2 + \dots + v$. It measures multicollinearity.
    • It measures how much of $X_1$ can be explained by the other regressors $X_2, \dots$
    • In perfect multicollinearity, $X_2, \dots$ will perfectly determine $X_1$, thus $X_1$ is not relevant anymore.
    • Equivalently, $R_2^2$ is the coefficient of determination from $X_2 = \alpha_0 + \alpha_1 X_1 + \dots + v$, etc. etc.
  • $\frac{1}{1 - R_1^2}$ is known as the variance inflation factor. The higher the multicollinearity, the more inflated is the variance of the parameter estimators.
  • Multicollinearity is not a problem iff:
    • it occurs only between control variables, or
    • $\operatorname{Var}(\hat\beta_1)$ is small enough anyway; i.e. $\hat\beta_1$ is statistically significant.
  • $n$ is the sample size
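For intuition, a quick numerical illustration (the numbers are hypothetical): if the auxiliary regression yields $R_1^2 = 0.9$, the variance inflation factor is

$$\frac{1}{1 - R_1^2} = \frac{1}{1 - 0.9} = 10,$$

so $\operatorname{Var}(\hat\beta_1)$ is ten times larger than it would be with no multicollinearity ($R_1^2 = 0$).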

Coefficient of Determination

In multivariate situations (as opposed to bivariate) the coefficient of determination of the whole regression ($R^2$) is more troublesome because

  1. Adding more variables will only increase $R^2$.
  2. Thus one is incentivized to increase the number of (possibly irrelevant) independent variables.
  3. To mitigate this, we report the adjusted $R^2$ (denoted $\bar R^2$) instead, with a penalty for each additional independent variable used.
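For reference, the standard form of the adjustment (with $n$ observations and $k$ regressors) is:

$$\bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1},$$

so $\bar R^2$ rises only when an added regressor improves the fit by more than the lost degree of freedom costs.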

Hypothesis Testing regarding Coefficients

Standardizing Coefficients

Example. In the model $Y = \beta_0 + \beta_1\,\mathrm{lifeexp} + \beta_2\,\mathrm{literacy} + u$, if we want to compare the effects of life expectancy and literacy, we cannot simply compare the values of $\hat\beta_1$ and $\hat\beta_2$. This is because their units are different, i.e. life expectancy is in years and literacy is in %. Thus we need to standardize them:

$$\tilde\beta_j = \hat\beta_j \cdot \frac{s_{X_j}}{s_Y}$$

which shows “how much increase, in units of $s_Y$, does a one-$s_{X_j}$ increase in $X_j$ cause?”

Remark. Only $X_1$ and $X_2$ need be standardized to compare $\hat\beta_1$ and $\hat\beta_2$’s effects; $Y$ need not be standardized.
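A minimal numpy sketch of this computation; the data and the variable names are hypothetical, not from the notes:

```python
import numpy as np

# Hypothetical data: life expectancy in years, literacy in % (illustrative only).
rng = np.random.default_rng(0)
n = 200
lifeexp = rng.normal(70, 5, size=n)
literacy = rng.normal(85, 10, size=n)
y = 1.0 + 0.3 * lifeexp + 0.1 * literacy + rng.normal(0, 2, size=n)

# OLS of y on [1, lifeexp, literacy]
X = np.column_stack([np.ones(n), lifeexp, literacy])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Standardized coefficients: beta_j * s_Xj / s_Y
std_coefs = beta[1:] * X[:, 1:].std(axis=0, ddof=1) / y.std(ddof=1)
print(std_coefs)
```

Comparing `beta[1:]` directly would mix years with percentage points; the standardized versions are unit-free and hence comparable.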

Hypothesis Testing about Coefficients

Let the model be $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$. Sometimes we may want to check if $\beta_1 = 0$, or $\beta_1 = \beta_2 = 0$. In these cases we use a hypothesis test. Let $R^2_{ur}$ be the $R^2$ of this regression. Now, before we do anything, we need to run this unrestricted regression of $Y$ on $X_1, X_2$.

Case 1: $H_0\colon \beta_1 = 0$. Then the model under the null would change to:

$$Y = \beta_0 + \beta_2 X_2 + u$$

We run a regression on this new model and get $R^2_r$. Remark. This is not equivalent to running a t-test on $\hat\beta_1$, because $X_1$ and $X_2$ may be multicollinear.

Case 2: $H_0\colon \beta_1 = \beta_2 = 0$. Then the model under the null would change to:

$$Y = \beta_0 + u$$

We run a regression on this to also get $R^2_r$.

We can observe that in both cases $R^2_r \le R^2_{ur}$, always, because “restricting” the model will only lead to less (coefficient of-) determination: the restricted model is a special case of the unrestricted one, so its least-squares fit can never have a smaller sum of squared residuals. Now, the bigger this difference is, the more likely it is that the null is false. We formalize this using the $F$-test:

def. F-Test. For the F-statistic defined as:

$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/\mathrm{df}}$$

where

  • $q$ is “how many equal signs are in the null hypothesis” (the number of restrictions)
  • $\mathrm{df} = n - k$ is the degrees of freedom ($k$ = number of coefficients, including the intercept, in the _un_restricted model)

Then:

$$F > c \implies \text{reject } H_0$$

where $c$ is the critical value. The critical values are:

  • $F^{\alpha}_{1,\,n-3}$ in case 1 ($q = 1$)
  • $F^{\alpha}_{2,\,n-3}$ in case 2 ($q = 2$)
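A minimal Python sketch of Case 2, assuming hypothetical data (the true model below has $\beta_2 = 0$); `scipy.stats.f.ppf` supplies the critical value:

```python
import numpy as np
from scipy import stats

def r_squared(y, X):
    """R^2 of an OLS regression of y on X (X must include the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Hypothetical data, purely illustrative.
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

ones = np.ones(n)
r2_ur = r_squared(y, np.column_stack([ones, x1, x2]))  # unrestricted model
r2_r = r_squared(y, ones.reshape(-1, 1))               # restricted: beta1 = beta2 = 0

q, df = 2, n - 3   # 2 restrictions; 3 coefficients in the unrestricted model
F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)
c = stats.f.ppf(0.95, q, df)  # critical value at the 5% significance level
print(F, c, F > c)            # reject H0 iff F > c
```

For Case 1 the same code applies with the restricted design $[1, X_2]$ and $q = 1$.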