class: center, middle, inverse, title-slide # PPOL 502-07: Reg. Methods for Policy Analysis ## Week 03: Multiple Regression and Gauss-Markov ### Alexander Podkul, PhD ### Spring 2022 --- ## Today's Class Outline * Problem Sets(!) and Other Notes * Bias in Estimators * Standard Error of the Regression * Standard Error of the Coefficient * Gauss-Markov Assumptions * Multiple Linear Regression * A Quick note on `\(R^2\)` * Model Specification and Omitted Variable Bias * Making Sense of Stata Output * Working with Logged Variables * Where We're Going Next * Next Week's Readings * __Break__ * Reviewing Relevant Stata Commands --- ## Problem Sets and Other Notes ### Problem Sets Problem set #1 due __tonight__ (February 2) Problem set #2 assigned __next week__ (February 9) Problem set #3 assigned __the following week__ (February 16) -- ### Other Notes * A few additional resources added to the "Resources" tab on the course website * In-Person instruction clarifications --- ## Bias in Estimators Last week we covered the standard bivariate model: `$$Y_i = \beta_0 + \beta_1X_i + \epsilon_i$$` -- From which we estimate the following: `$$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1}X_i$$` -- where: * `\(\hat{Y_i}\)` is the estimate for our fitted values * `\(\hat{\beta_0}\)` and `\(\hat{\beta_1}\)` are estimates for `\(\beta_0\)` and `\(\beta_1\)`, respectively * `\(\hat{\epsilon}\)` is our residual term --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators <img src="Week03_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> --- ## Bias in Estimators Ideally: `$$E(\hat{\beta_j}) = \beta_j$$` This holds when `\(X\)` and `\(\epsilon\)` are uncorrelated, in which case `\(\hat{\beta_j}\)` is an unbiased estimator of `\(\beta_j\)`. -- However, if `\(X\)` and `\(\epsilon\)` _are_ correlated, then `\(\hat{\beta_j}\)` is a biased estimator such that: `$$E(\hat{\beta_j})=\beta_j + corr(X, \epsilon)\frac{\sigma_{\epsilon}}{\sigma_X}$$` --- ## Standard Error of the Regression The __standard error of the regression (SER)__ (also called the standard error of the estimate or root mean squared error) provides an alternative goodness-of-fit measure expressed in the units of the dependent variable: `$$\hat{\sigma} = \sqrt{\hat{\sigma^2}}$$` -- where `\(\hat{\sigma^2}\)` is the estimate of the error variance calculated by: `$$\hat{\sigma^2}=\frac{1}{n-2}\sum{\hat{u_i^2}} = \frac{SSR}{n-2}$$` where `\(n-2\)` is the degrees of freedom. Our metric of interest, `\(\hat{\sigma}\)`, is therefore an estimate of the standard deviation of the unobservables affecting `\(y\)`. --- ## Standard Error of the Coefficient After we have calculated the standard error of the regression, we can then estimate the standard error associated with `\(\hat{\beta_1}\)`: `$$se(\hat{\beta_1}) = \frac{\hat{\sigma}}{\sqrt{SST_x}} = \frac{\hat{\sigma}}{\sqrt{\sum{(x_i - \bar{x})^2}}}$$` -- Beyond being useful on its own for characterizing the sampling distribution of our estimate, it will also be used to calculate test statistics, confidence intervals, and statistical significance.
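---
## Standard Errors: A Quick Sketch in Code

To make these formulas concrete, here is a minimal R sketch on simulated data (the variables and numbers are made up for illustration). It computes `\(\hat{\sigma}\)` and `\(se(\hat{\beta_1})\)` by hand and checks them against `lm()`:

```r
# A quick sketch on simulated data (values are illustrative only)
set.seed(502)
n <- 200
x <- rnorm(n, mean = 10, sd = 2)
y <- 3 + 1.5 * x + rnorm(n, sd = 4)   # true slope = 1.5

fit  <- lm(y ~ x)
uhat <- resid(fit)

sigma_hat <- sqrt(sum(uhat^2) / (n - 2))              # SER = sqrt(SSR / (n - 2))
se_b1     <- sigma_hat / sqrt(sum((x - mean(x))^2))   # se of the slope

sigma_hat  # matches summary(fit)$sigma
se_b1      # matches summary(fit)$coefficients["x", "Std. Error"]
```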
--- ## Gauss-Markov Assumptions The Gauss-Markov Theorem establishes that the OLS estimator `\(\hat{\beta_j}\)` for `\(\beta_j\)` is BLUE * __Best__ - the estimator has the smallest variance among linear unbiased estimators -- * __Linear__ - the estimator is linear, that is, it can be expressed as a linear function of the data on the dependent variable -- * __Unbiased__ - the expectation of our estimate is the true parameter, think `\(E(\hat{\beta_j}) = \beta_j\)` -- * __Estimator__ - a statistic computed from a sample that is used to estimate a population parameter (see more summaries on pages 52 and 92) --- ## Gauss-Markov Assumptions 1. __Linear in Parameters__ * The dependent variable, `\(y\)`, is related to the independent variable `\(x\)` and the error (disturbance) term such that: * `\(y = \beta_0 + \beta_1 x + u\)` 2. __Random Sampling__ * There is random sampling of size `\(n\)` 3. __Sample Variation in the Explanatory Variable__ * The sample outcomes on `\(x\)` are not all the same value 4. __Zero Conditional Mean__ * The error has an expected value of zero given any value of the explanatory variable * `\(E(u|x) = 0\)` 5. __Homoskedasticity__ * The error has the same variance given any value of the explanatory variable * `\(Var(u|x) = \sigma^2\)` --- ### Linear in Parameters `$$y = \beta_0 + \beta_1x + u$$` We've identified the _correct_ relationship between the dependent variable and the independent variable(s) -- ### Random Sampling There is a random sample of size `\(n\)`, `\(\{(x_i, y_i): i = 1, 2, ..., n\}\)` --- ### Sample Variation in the Explanatory Variable The sample outcomes for `\(x\)` are not all the same value. -- ### Zero Conditional Mean `$$E(u|x) = 0$$` The expectation of the error term -- conditional on the independent variable(s) -- is equal to 0. (__strict exogeneity__) --- ### Homoskedasticity <img src="Week03_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Multiple Linear Regression Though bivariate regression is quite powerful, it is not always an appropriate tool for exploring real-world data, where the assumption that `\(u\)` is uncorrelated with `\(x\)` may not be met. Multiple linear regression allows us to "control" for different factors that might also influence the dependent variable. This might allow us to examine a key independent variable of interest while "controlling" for other relevant factors. In multiple linear regression we are often exploring estimates in a _ceteris paribus_ ("other things equal") setting. -- `$$y=\beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 + ... + \beta_kx_k + u$$` -- while adjusting our expectation of the error term such that: `$$E(u|x_1, x_2, x_3, ..., x_k) = 0$$`
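---
### Multiple Regression: A Quick Sketch

As a quick illustration of "controlling" for another factor, here is a minimal R sketch on simulated data (the variables `x1`, `x2`, and `y` and their coefficients are made up): the slope on `x1` changes once the correlated variable `x2` enters the model.

```r
# Simulated data (illustrative): x1 and x2 are correlated and both affect y
set.seed(502)
n  <- 500
x2 <- rnorm(n)
x1 <- 0.6 * x2 + rnorm(n)
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)

coef(lm(y ~ x1))        # bivariate slope on x1 picks up part of x2's effect
coef(lm(y ~ x1 + x2))   # multiple regression: slope on x1 is close to 2
```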
--- ### Obtaining Estimates in Multiple Regression So how do we estimate the beta terms? If we have the following model: `$$\hat{y} = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2$$` -- We can again minimize the sum of squared residuals: `$$\sum{\hat{u}^2}$$` -- which becomes: `$$\sum(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2)^2$$` -- We can obtain the first order conditions for `\(\hat{\beta_0}\)`, `\(\hat{\beta_1}\)`, and `\(\hat{\beta_2}\)`: `$$\sum(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` `$$\sum x_1(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` `$$\sum x_2(y - \hat{\beta_0} - \hat{\beta_1}x_1 - \hat{\beta_2}x_2) = 0$$` --- ### Obtaining Estimates in Multiple Regression Recall last week we conceptualized the bivariate model as: <img src="tab1.png" style="display: block; margin: auto;" /> -- with multiple independent variables it becomes: <img src="tab2.png" style="display: block; margin: auto;" /> --- ### Obtaining Estimates in Multiple Regression Following a similar process to the previous slides, we can write the residuals from our regression equation as: `$$\hat{u} = y - X\hat{\beta}$$` minimize the sum of their squares, and re-arrange so that: `$$\hat{\beta}=(X'X)^{-1}X'y$$` (Note: we're going to skip the matrix math for today...) --- ### Interpreting Multiple Regression `$$\hat{y} = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2$$` Interpreting `\(\hat{\beta_0}\)`: the predicted value of `\(y\)` when `\(x_1 = x_2 = 0\)` -- Interpreting `\(\hat{\beta_j}\)`: * `\(\Delta{\hat{y}} = \hat{\beta_1}\Delta{x_1} + \hat{\beta_2}\Delta{x_2}\)` * Therefore, when `\(\Delta{x_2} = 0\)` then: `\(\Delta\hat{y}= \hat{\beta_1}\Delta{x_1}\)` (i.e. holding `\(x_2\)` fixed) * And, when `\(\Delta{x_1} = 0\)` then: `\(\Delta\hat{y}= \hat{\beta_2}\Delta{x_2}\)` (i.e. holding `\(x_1\)` fixed) --- ### Interpreting Multiple Regression: Example Borrowing from Wooldridge 3.3: If we estimate a regression equation on 401(k) pension plans such that: `$$\hat{prate} = 80.1 + 5.5mrate + 2.4age$$` * What is the intercept? * How can we interpret the coefficient estimate on `\(mrate\)`? * How can we interpret the coefficient estimate on `\(age\)`? * Can we say the effect of `\(mrate\)` on `\(prate\)` is bigger than the effect of `\(age\)` on `\(prate\)`? Why or why not? --- ### A Quick note on `\(R^2\)` Review: `$$SST=SSE+SSR$$` where: `$$\frac{SSE}{SST} + \frac{SSR}{SST} = 1$$` As we add predictors to a model, `\(R^2\)` won't decrease. Why might this be an issue? (More on correcting this soon...) --- ## Model Specification and Omitted Variable Bias Model specification often refers to the process of identifying which predictors (independent variables) to include or exclude from a linear regression equation. In one direction, we can __overspecify__ our model, which is when irrelevant variables that have no partial effect on `\(y\)` in the population are included * _Though the inclusion of these irrelevant variables does not bias the estimates, it may inflate their variance_ -- In the other direction, we can __underspecify__ our model, which is when relevant variables are excluded * _Can lead to misspecification and omitted variable bias (OVB)_
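---
### Overspecification and `\(R^2\)`: A Quick Sketch

Tying the earlier `\(R^2\)` note and overspecification together, here is a minimal R sketch on simulated data (all names and values are illustrative): adding a pure-noise regressor barely moves the slope on `x1`, yet `\(R^2\)` does not fall, while adjusted `\(R^2\)` may.

```r
# Simulated data (illustrative): "noise" has no effect on y in the population
set.seed(502)
n     <- 200
x1    <- rnorm(n)
noise <- rnorm(n)
y     <- 1 + 2 * x1 + rnorm(n)

fit1 <- lm(y ~ x1)
fit2 <- lm(y ~ x1 + noise)

c(coef(fit1)["x1"], coef(fit2)["x1"])                          # nearly identical slopes
c(summary(fit1)$r.squared, summary(fit2)$r.squared)            # R^2 never falls
c(summary(fit1)$adj.r.squared, summary(fit2)$adj.r.squared)    # adjusted R^2 can fall
```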
--- ### Omitted Variable Bias Omitted Variable Bias (OVB) occurs when: `$$E(u_i|X_i) \neq 0$$` -- which requires two conditions to be met: 1. the omitted variable (z) correlates with the dependent variable (y) 2. the omitted variable (z) correlates with at least one independent variable in the model (x) --- ### Omitted Variable Bias Full regression (the relationship of `\(y\)` with `\(x_1\)` and `\(x_2\)`): `$$y = \hat{\beta_0} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2 + \hat{u}$$` -- Partial regression (the relationship of `\(y\)` with `\(x_1\)` without `\(x_2\)`): `$$y=\tilde{\beta_0} + \tilde{\beta_1}x_1 + \tilde{u}$$` -- Auxiliary regression (the relationship of `\(x_2\)` with `\(x_1\)`): `$$x_2 = \delta_0 + \delta_1x_1 + e$$` -- Therefore, the two estimates are related by: `$$\tilde{\beta_1} = \hat{\beta_1} + \delta_1\hat{\beta_2}$$` where the omitted variable bias in `\(\tilde{\beta_1}\)` is the `\(\delta_1\hat{\beta_2}\)` term. --- ### Nature of the Bias <img src="table.png" style="display: block; margin: auto;" /> --- ### Omitted Variable Bias: Example <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">Life Expectancy</th> <th style="padding-left: 5px;padding-right: 5px;">Life Expectancy</th> <th style="padding-left: 5px;padding-right: 5px;">Polio</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">(Intercept)</td> <td style="padding-left: 5px;padding-right: 5px;">42.90<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">40.20<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">38.29<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(1.59)</td> <td style="padding-left: 5px;padding-right: 5px;">(1.59)</td> <td style="padding-left: 5px;padding-right: 5px;">(7.83)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Schooling</td> <td style="padding-left: 5px;padding-right: 5px;">2.23<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">1.98<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">3.46<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.12)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.12)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.59)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Polio</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">0.07<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.67</td> <td style="padding-left: 5px;padding-right: 5px;">0.71</td> <td style="padding-left: 5px;padding-right: 5px;">0.17</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.67</td> <td style="padding-left: 5px;padding-right: 5px;">0.71</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> <td style="padding-left: 5px;padding-right: 5px;">173</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="4"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> -- .pull-left[ Since: * `\(\hat{\beta_1} = 1.98\)` * `\(\delta_1 = 3.46\)` * `\(\hat{\beta_2} = 0.07\)` * `\(\tilde{\beta_1} = 2.23\)` ] .pull-right[ Therefore: * `\(\tilde{\beta_1} = \hat{\beta_1} + \delta_1\hat{\beta_2}\)` * `\(\tilde{\beta_1} = 1.98 + 3.46*0.07\)` * `\(\tilde{\beta_1} = 2.23\)` ]
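---
### Omitted Variable Bias: Checking the Decomposition

The same decomposition can be checked directly in code. Here is a minimal R sketch on simulated data (the variables and coefficients are made up for illustration):

```r
# Simulated data (illustrative): x2 will be the "omitted" variable
set.seed(502)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)               # x2 is correlated with x1
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(n)  # and x2 affects y

b_full  <- coef(lm(y  ~ x1 + x2))   # full regression:      beta1-hat, beta2-hat
b_short <- coef(lm(y  ~ x1))        # partial regression:   beta1-tilde
d_aux   <- coef(lm(x2 ~ x1))        # auxiliary regression: delta1

b_short["x1"]                              # beta1-tilde
b_full["x1"] + d_aux["x1"] * b_full["x2"]  # beta1-hat + delta1 * beta2-hat
```

The two quantities match exactly: the decomposition is an algebraic property of the OLS estimates, not just an approximation.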
--- ## Making Sense of Stata Output <img src="output_1.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_2.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_3.png" style="display: block; margin: auto;" /> --- ## Making Sense of Stata Output <img src="output_4.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables Logging variables is a simple transformation that allows the researcher to work with data in alternative units. Remember: `$$log_b(p) = x \leftrightarrow b^x = p$$` For example: `$$log_2(8)=3$$` which reads "the logarithm of 8 with base 2 is 3" --- ## Working with Logged Variables In a regression framework, we often use the natural log (base `\(e\)`, which is `\(\approx 2.718\)`), and there are four functional forms using natural logs: <img src="logta.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables In other words... * Log-level: `\(\% \Delta y = (100\beta_1)\Delta x\)` * which means a one-unit increase in x is associated with a `\(100 \cdot \beta_1\)` percent change in Y * Level-Log: `\(\Delta y = (\beta_1/100)\%\Delta x\)` * which means a one percent increase in x is associated with a `\(\frac{\beta_1}{100}\)` unit change in Y * Log-Log: `\(\%\Delta y = \beta_1 \% \Delta x\)` * which means a one percent increase in x is associated with a `\(\beta_1\)` percent change in Y Logs can also be useful for narrowing the range of our data. To quote Wooldridge: > “there are some standard rules of thumb for taking logs, although none is written in stone. When a variable is a positive dollar amount, the log is often taken... Variables such as population... often appear in logarithmic form.”
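---
## Working with Logged Variables: A Quick Check

As a quick check on the log-level interpretation, here is a minimal R sketch on simulated data (the `wage` and `educ` variables and their coefficients are hypothetical):

```r
# Simulated log-level model (illustrative): log(wage) = b0 + b1 * educ + u
set.seed(502)
n    <- 500
educ <- sample(8:20, n, replace = TRUE)
wage <- exp(1.2 + 0.08 * educ + rnorm(n, sd = 0.3))   # true b1 = 0.08

fit <- lm(log(wage) ~ educ)
coef(fit)["educ"]          # close to 0.08
100 * coef(fit)["educ"]    # ~8: one more year of educ, roughly 8 percent higher wage
```

For small coefficients the `\(100\beta_1\)` rule is a good approximation; the exact proportional change is `\(100(e^{\beta_1} - 1)\)` percent.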
--- ## Working with Logged Variables: Example <img src="Week03_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> --- ## Working with Logged Variables: Example <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">GDP</th> <th style="padding-left: 5px;padding-right: 5px;">log(GDP)</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">(Intercept)</td> <td style="padding-left: 5px;padding-right: 5px;">-927.56</td> <td style="padding-left: 5px;padding-right: 5px;">6.53<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(2022.65)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.27)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BMI</td> <td style="padding-left: 5px;padding-right: 5px;">193.23<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.03<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(42.79)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.12</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.11</td> <td style="padding-left: 5px;padding-right: 5px;">0.16</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">152</td> <td style="padding-left: 5px;padding-right: 5px;">152</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="3"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> --- ## Where We're Going Next * Statistical Significance * More on Goodness of Fit * Collinearity * Interaction Terms * Transformations and Quadratic Terms --- ## Next Week's Readings * Wooldridge: Chapter 4 and Section 6-3 * Hamilton: Chapter 3