class: center, middle, inverse, title-slide # PPOL 502-07: Reg. Methods for Policy Analysis ## Week 03: Multiple Regression and Gauss-Markov ### Alexander Podkul, PhD ### Spring 2022 --- ## Today's Class Outline * Problem Sets(!) and Other Notes * Bias in Estimators * Standard Error of the Regression * Standard Error of the Coefficient * Gauss-Markov Assumptions * Multiple Linear Regression * A Quick note on `\(R^2\)` * Model Specification and Omitted Variable Bias * Making Sense of Stata Output * Working with Logged Variables * Where We're Going Next * Next Week's Readings * __Break__ * Reviewing Relevant Stata Commands --- ## Problem Sets and Other Notes ### Problem Sets Problem set #1 due __tonight__ (February 2) Problem set #2 assigned __next week__ (February 9) Problem set #3 assigned __the following week__ (February 16) -- ### Other Notes * A few additional resources added to the "Resources" tab on the course website * In-Person instruction clarifications --- ## Bias in Estimators Last week we covered the standard bivariate model: `$$Y_i = \beta_0 + \beta_1X_i + \epsilon_i$$` -- From which we estimate the following: `$$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}X_i$$` -- where: * `\(\hat{Y_i}\)` is the estimate for our fitted values * `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` are estimates for `\(\beta_0\)` and `\(\beta_1\)`, respectively * `\(\hat{\epsilon}\)` is our residual term --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators Ideally: `$$E(\hat{\beta_j}) = \beta_j$$` This holds when `\(X\)` and `\(\epsilon\)` are uncorrelated, in which case `\(\hat{\beta_j}\)` is an unbiased estimator of `\(\beta_j\)`. -- However, if `\(X\)` and `\(\epsilon\)` _are_ correlated, then `\(\hat{\beta_j}\)` is a biased estimator such that: `$$E(\hat{\beta_j})=\beta_j + corr(X, \epsilon)\frac{\sigma_{\epsilon}}{\sigma_X}$$` --- ## Standard Error of the Regression The __standard error of the regression (SER)__ (also called the standard error of the estimate or root mean squared error) provides an alternative goodness-of-fit measure expressed in the units of the dependent variable: `$$\hat{\sigma} = \sqrt{\hat{\sigma^2}}$$` -- where `\(\hat{\sigma^2}\)` is the estimate of the error variance calculated by: `$$\hat{\sigma^2}=\frac{1}{n-2}\sum{\hat{u_i^2}} = \frac{SSR}{n-2}$$` where `\(n-2\)` is the degrees of freedom. Our metric of interest, `\(\hat{\sigma}\)`, is therefore an estimate of the standard deviation of the unobservables affecting `\(y\)`. --- ## Standard Error of the Coefficient After we have calculated the standard error of the regression, we can then estimate the standard error associated with `\(\hat{\beta_1}\)`: `$$se(\hat{\beta_1}) = \frac{\hat{\sigma}}{\sqrt{SST_x}} = \frac{\hat{\sigma}}{\sqrt{\sum{(x_i - \bar{x})^2}}}$$` -- Beyond being useful on its own for characterizing the sampling distribution of our estimate, it will also be used to calculate test statistics, confidence intervals, and statistical significance.
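---
## Standard Errors: A Quick Sketch in Code

To make these formulas concrete, here is a minimal R sketch on simulated data (the variables and numbers are made up for illustration). It computes `\(\hat{\sigma}\)` and `\(se(\hat{\beta_1})\)` by hand and checks them against `lm()`:

```r
# A quick sketch on simulated data (values are illustrative only)
set.seed(502)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(n, sd = 4)   # true slope = 1.5

fit  <- lm(y ~ x)
uhat <- resid(fit)

sigma_hat <- sqrt(sum(uhat^2) / (n - 2))              # SER = sqrt(SSR / (n - 2))
se_b1     <- sigma_hat / sqrt(sum((x - mean(x))^2))   # se of the slope

sigma_hat  # matches summary(fit)$sigma
se_b1      # matches summary(fit)$coefficients["x", "Std. Error"]
```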
--- ## Gauss-Markov Assumptions The Gauss-Markov Theorem establishes that the OLS estimator `\(\hat{\beta_j}\)` for `\(\beta_j\)` is BLUE * __Best__ - the estimator has the smallest variance among linear unbiased estimators -- * __Linear__ - the estimator is linear, that is, it can be expressed as a linear function of the data on the dependent variable -- * __Unbiased__ - the expectation of our estimate is the true parameter, think `\(E(\hat{\beta_j}) = \beta_j\)` -- * __Estimator__ - a statistic computed from a sample that is used to estimate a population parameter (see more summaries on pages 52 and 92) --- ## Gauss-Markov Assumptions 1. __Linear in Parameters__ * The dependent variable, `\(y\)`, is related to the independent variable `\(x\)` and the error (disturbance) term such that: * `\(y = \beta_0 + \beta_1 x + u\)` 2. __Random Sampling__ * There is random sampling of size `\(n\)` 3. __Sample Variation in the Explanatory Variable__ * The sample outcomes on `\(x\)` are not all the same value 4. __Zero Conditional Mean__ * The error has an expected value of zero given any value of the explanatory variable * `\(E(u|x) = 0\)` 5. __Homoskedasticity__ * The error has the same variance given any value of the explanatory variable * `\(Var(u|x) = \sigma^2\)` --- ### Linear in Parameters `$$y = \beta_0 + \beta_1x + u$$` We've identified the _correct_ relationship between the dependent variable and the independent variable(s) -- ### Random Sampling There is a random sample of size `\(n\)`, `\(\{(x_i, y_i): i = 1, 2, ..., n\}\)` --- ### Sample Variation in the Explanatory Variable The sample outcomes for `\(x\)` are not all the same value. -- ### Zero Conditional Mean `$$E(u|x) = 0$$` The expectation of the error term -- conditional on the independent variable(s) -- is equal to 0. (__strict exogeneity__) --- ### Homoskedasticity <img src="Week03_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Multiple Linear Regression Though bivariate regression is quite powerful, it is not always an appropriate tool for exploring real-world data, where the assumption that `\(u\)` is uncorrelated with `\(x\)` may not be met. Multiple linear regression allows us to "control" for different factors that might also influence the dependent variable. This might allow us to examine a key independent variable of interest while "controlling" for other relevant factors. In multiple linear regression we are often exploring estimates in a _ceteris paribus_ ("other things equal") setting. -- `$$y=\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + ... + \beta_kx_k + u$$` -- while adjusting our expectation of the error term such that: `$$E(u|x_1, x_2, x_3, ..., x_k) = 0$$`
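---
### Multiple Regression: A Quick Sketch

As a quick illustration of "controlling" for another factor, here is a minimal R sketch on simulated data (the variables `x1`, `x2`, and `y` and their coefficients are made up): the slope on `x1` changes once the correlated variable `x2` enters the model.

```r
# Simulated data (illustrative): x1 and x2 are correlated and both affect y
set.seed(502)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

coef(lm(y ~ x1))        # bivariate slope on x1 picks up part of x2's effect
coef(lm(y ~ x1 + x2))   # multiple regression: slope on x1 is close to 2
```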
--- ### Obtaining Estimates in Multiple Regression So how do we estimate the beta terms? If we have the following model: `$$\hat{y} = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2$$` -- We can again minimize the sum of squared residuals: `$$\sum{\hat{u}^2}$$` -- which becomes: `$$\sum(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2)^2$$` -- We can obtain the first order conditions for `\(\hat{\beta_0}\)`, `\(\hat{\beta_1}\)`, and `\(\hat{\beta_2}\)`: `$$\sum(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` `$$\sum x_1(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` `$$\sum x_2(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` --- ### Obtaining Estimates in Multiple Regression Recall last week we conceptualized the bivariate model as: <img src="tab1.png" style="display: block; margin: auto;" /> -- with multiple independent variables it becomes: <img src="tab2.png" style="display: block; margin: auto;" /> --- ### Obtaining Estimates in Multiple Regression Following a similar process to the previous slides, we can write the residuals from our regression equation as: `$$\hat{u} = y - X\hat{\beta}$$` minimize the sum of their squares, and re-arrange so that: `$$\hat{\beta}=(X'X)^{-1}X'y$$` (Note: we're going to skip the matrix math for today...) --- ### Interpreting Multiple Regression `$$\hat{y} = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2$$` Interpreting `\(\hat{\beta_0}\)`: the predicted value of `\(y\)` when `\(x_1 = x_2 = 0\)` -- Interpreting `\(\hat{\beta_j}\)`: * `\(\Delta{\hat{y}} = \hat{\beta_1}\Delta{x_1} + \hat{\beta_2}\Delta{x_2}\)` * Therefore, when `\(\Delta{x_2} = 0\)` then: `\(\Delta\hat{y}= \hat{\beta_1}\Delta{x_1}\)` (i.e. holding `\(x_2\)` fixed) * And, when `\(\Delta{x_1} = 0\)` then: `\(\Delta\hat{y}= \hat{\beta_2}\Delta{x_2}\)` (i.e. holding `\(x_1\)` fixed) --- ### Interpreting Multiple Regression: Example Borrowing from Wooldridge 3.3: If we estimate a regression equation on 401(k) pension plans such that: `$$\hat{prate} = 80.1 + 5.5mrate + 2.4age$$` * What is the intercept? * How can we interpret the coefficient estimate on `\(mrate\)`? * How can we interpret the coefficient estimate on `\(age\)`? * Can we say the effect of `\(mrate\)` on `\(prate\)` is bigger than the effect of `\(age\)` on `\(prate\)`? Why or why not? --- ### A Quick note on `\(R^2\)` Review: `$$SST=SSE+SSR$$` where: `$$\frac{SSE}{SST} + \frac{SSR}{SST} = 1$$` As we add predictors to a model, `\(R^2\)` won't decrease. Why might this be an issue? (More on correcting this soon...) --- ## Model Specification and Omitted Variable Bias Model specification often refers to the process of identifying which predictors (independent variables) to include or exclude from a linear regression equation. In one direction, we can __overspecify__ our model, which is when irrelevant variables that have no partial effect on `\(y\)` in the population are included * _Though the inclusion of these irrelevant variables does not bias the estimates, it may inflate their variance_ -- In the other direction, we can __underspecify__ our model, which is when relevant variables are excluded * _Can lead to misspecification and omitted variable bias (OVB)_
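---
### Overspecification and `\(R^2\)`: A Quick Sketch

Tying the earlier `\(R^2\)` note and overspecification together, here is a minimal R sketch on simulated data (all names and values are illustrative): adding a pure-noise regressor barely moves the slope on `x1`, yet `\(R^2\)` does not fall, while adjusted `\(R^2\)` may.

```r
# Simulated data (illustrative): "noise" has no effect on y in the population
set.seed(502)
n     <- 200
x1    <- rnorm(n)
noise <- rnorm(n)
y     <- 1 + 2 * x1 + rnorm(n)

fit1 <- lm(y ~ x1)
fit2 <- lm(y ~ x1 + noise)

c(coef(fit1)["x1"], coef(fit2)["x1"])                          # nearly identical slopes
c(summary(fit1)$r.squared, summary(fit2)$r.squared)            # R^2 never falls
c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)    # adjusted R^2 can fall
```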
--- ### Omitted Variable Bias Omitted Variable Bias (OVB) occurs when: `$$E(u_i|X_i) \neq 0$$` -- which requires two conditions to be met: 1. the omitted variable (z) correlates with the dependent variable (y) 2. the omitted variable (z) correlates with at least one independent variable in the model (x) --- ### Omitted Variable Bias Full regression (the relationship of `\(y\)` with `\(x_1\)` and `\(x_2\)`): `$$y = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2 + \hat{u}$$` -- Partial regression (the relationship of `\(y\)` with `\(x_1\)` without `\(x_2\)`): `$$y=\tilde{\beta_0} + \tilde{\beta_1}x_1 + \tilde{u}$$` -- Auxiliary regression (the relationship of `\(x_2\)` with `\(x_1\)`): `$$x_2 = \delta_0 + \delta_1x_1 + e$$` -- Therefore, the two estimates are related by: `$$\tilde{\beta_1} = \hat{\beta_1} + \delta_1\hat{\beta_2}$$` where the omitted variable bias in `\(\tilde{\beta_1}\)` is the `\(\delta_1\hat{\beta_2}\)` term. --- ### Nature of the Bias <img src="table.png" style="display: block; margin: auto;" /> --- ### Omitted Variable Bias: Example <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">Life Expectancy</th> <th style="padding-left: 5px;padding-right: 5px;">Life Expectancy</th> <th style="padding-left: 5px;padding-right: 5px;">Polio</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">(Intercept)</td> <td style="padding-left: 5px;padding-right: 5px;">42.90<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">40.20<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">38.29<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(1.59)</td> <td style="padding-left: 5px;padding-right: 5px;">(1.59)</td> <td style="padding-left: 5px;padding-right: 5px;">(7.83)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Schooling</td> <td style="padding-left: 5px;padding-right: 5px;">2.23<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">1.98<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">3.46<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.12)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.12)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.59)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Polio</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">0.07<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.67</td> <td style="padding-left: 5px;padding-right: 5px;">0.71</td> <td style="padding-left: 5px;padding-right: 5px;">0.17</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.67</td> <td style="padding-left: 5px;padding-right: 5px;">0.71</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="4"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> -- .pull-left[ Since: * `\(\hat{\beta_1} = 1.98\)` * `\(\delta_1 = 3.46\)` * `\(\hat{\beta_2} = 0.07\)` * `\(\tilde{\beta_1} = 2.23\)` ] .pull-right[ Therefore: * `\(\tilde{\beta_1} = \hat{\beta_1} + \delta_1\hat{\beta_2}\)` * `\(\tilde{\beta_1} = 1.98 + 3.46*0.07\)` * `\(\tilde{\beta_1} = 2.23\)` ]
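---
### Omitted Variable Bias: Checking the Decomposition

The same decomposition can be checked directly in code. Here is a minimal R sketch on simulated data (the variables and coefficients are made up for illustration):

```r
# Simulated data (illustrative): x2 will be the "omitted" variable
set.seed(502)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)               # x2 is correlated with x1
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)  # and x2 affects y

b_full  <- coef(lm(y  ~ x1 + x2))   # full regression:      beta1-hat, beta2-hat
b_short <- coef(lm(y  ~ x1))        # partial regression:   beta1-tilde
d_aux   <- coef(lm(x2 ~ x1))        # auxiliary regression: delta1

b_short["x1"]                              # beta1-tilde
b_full["x1"] + d_aux["x1"] * b_full["x2"]  # beta1-hat + delta1 * beta2-hat
```

The two quantities match exactly: the decomposition is an algebraic property of the OLS estimates, not just an approximation.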
--- ## Making Sense of Stata Output <img src="output_1.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_2.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_3.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_4.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables Logging variables is a simple transformation that allows the researcher to work with data in alternative units. Remember: `$$log_b(p) = x \leftrightarrow b^x = p$$` For example: `$$log_2(8)=3$$` which reads "the logarithm of 8 with base 2 is 3" --- ## Working with Logged Variables In a regression framework, we often use the natural log (base `\(e\)`, which is `\(\approx 2.718\)`), and there are four functional forms using natural logs: <img src="logta.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables In other words... * Log-level: `\(\% \Delta y = (100\beta_1)\Delta x\)` * which means a one-unit increase in x is associated with a `\(100 \cdot \beta_1\)` percent change in Y * Level-Log: `\(\Delta y = (\beta_1/100)\%\Delta x\)` * which means a one percent increase in x is associated with a `\(\frac{\beta_1}{100}\)` unit change in Y * Log-Log: `\(\%\Delta y = \beta_1 \% \Delta x\)` * which means a one percent increase in x is associated with a `\(\beta_1\)` percent change in Y Logs can also be useful for narrowing the range of our data. To quote Wooldridge: > “there are some standard rules of thumb for taking logs, although none is written in stone. When a variable is a positive dollar amount, the log is often taken... Variables such as population... often appear in logarithmic form.”
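---
## Working with Logged Variables: A Quick Check

As a quick check on the log-level interpretation, here is a minimal R sketch on simulated data (the `wage` and `educ` variables and their coefficients are hypothetical):

```r
# Simulated log-level model (illustrative): log(wage) = b0 + b1 * educ + u
set.seed(502)
n    <- 500
educ <- sample(8:20, n, replace = TRUE)
wage <- exp(1.2 + 0.08 * educ + rnorm(n, sd = 0.3))   # true b1 = 0.08

fit <- lm(log(wage) ~ educ)
coef(fit)["educ"]          # close to 0.08
100 * coef(fit)["educ"]    # ~8: one more year of educ, roughly 8 percent higher wage
```

For small coefficients the `\(100\beta_1\)` rule is a good approximation; the exact proportional change is `\(100(e^{\beta_1} - 1)\)` percent.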
--- ## Working with Logged Variables: Example <img src="Week03_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables: Example <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">GDP</th> <th style="padding-left: 5px;padding-right: 5px;">log(GDP)</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">(Intercept)</td> <td style="padding-left: 5px;padding-right: 5px;">-927.56</td> <td style="padding-left: 5px;padding-right: 5px;">6.53<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(2022.65)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.27)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BMI</td> <td style="padding-left: 5px;padding-right: 5px;">193.23<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.03<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(42.79)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.12</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.11</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">152</td> <td style="padding-left: 5px;padding-right: 5px;">152</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="3"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> --- ## Where We're Going Next * Statistical Significance * More on Goodness of Fit * Collinearity * Interaction Terms * Transformations and Quadratic Terms --- ## Next Week's Readings * Wooldridge: Chapter 4 and Section 6-3 * Hamilton: Chapter 3