class: center, middle, inverse, title-slide

# PPOL 502-07: Reg. Methods for Policy Analysis
## Week 10: Binary Dependent Variables (2 of 2)
### Alexander Podkul, PhD
### Spring 2022

---

## Today's Class Outline

* Course Schedule
* Reviewing Last Week
* Expanding to Probit
* More on Interpretation
* Maximum Likelihood Estimation
* Hypothesis Testing
* Goodness of Fit Metrics
* An Extended Example
* Where We're Going Next
* __Break__
* A Word on Data Project Complications
* Working in Stata

---

## Course Schedule

__Tonight (3/30)__

* Problem set #5 due

__Next Week (4/6)__

* Data Project: check-in #2 due (on Canvas)

---

## Other Course Notes

__Course survey__

* Thank you to those who took the time to fill it out!
* I'll especially take into consideration the comments about the midterm (looking ahead to the final exam) and keep you updated on any expected changes

--

__Applied Research For Policy Analysis__

* Covering applications of statistical concepts
* I'll post suggested material on Canvas (starting tonight after class)
* Will cover a few works as well as other final exam review material

---

## Reviewing Last Week: LPM

Last week, we covered a number of topics for exploring _binary dependent variables_, which can be useful for detecting the presence or absence of a particular attribute.

--

First, we spoke about the __linear probability model__, which is a simple adaptation of the standard multiple linear regression model where `\(y\)` happens to be a binary measure. Although interpretation of this model is simple (since `\(\Delta P(y=1|x) = \beta_j\Delta x_j\)`), there are two shortcomings:

--

* potentially nonsensical predictions/fitted values, where `\(\hat{y_i} <0\)` or `\(\hat{y_i} >1\)`
* by definition (since `\(y_i\)` only takes on values of 0 or 1), the model will have significant heteroskedasticity

---

## Reviewing Last Week: LVM

We then pivoted to the __latent variable model__, which models an unobserved (latent) variable representing a continuous metric for determining when `\(Y = 1\)` and `\(Y=0\)`.

--

After some manipulation, this left us with the workhorse model:

`$$P(y = 1|x) = G(\beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_kx_k)$$`

--

where `\(G(z)\)` is our _link function_

---

## Reviewing Last Week: Logit

We also discussed the __logit model__, which uses the following _link function_:

`$$G(z) = \frac{e^z}{1 + e^z} = \Lambda(z)$$`

--

this logit function, which can be expressed via a number of different equations, has the useful feature of being bounded between 0 and 1, as shown below:

<img src="Week10_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" />

... where errors are distributed according to a logistic distribution

---

## Reviewing Last Week: Logit (Cont.)

Given this non-linear relationship (in terms of odds and probability), we also discussed the following notes:

1. Not estimated via OLS (more on this tonight)
2. Shifting from t-tests to z-tests (more on this tonight)
3. Goodness of fit metrics (more on this tonight)
4. The effect of `\(x_j\)` is going to depend on the value of `\(x_j\)`
5. The effect of `\(x_j\)` is going to depend on the value of the other independent variables
--

which is (unfortunately) going to affect:

- how we conduct hypothesis tests

--

- how we consider how our model is estimated

--

- how we consider how _well_ our model is estimated

--

- how we _interpret_ our model (possible tools: PEA, AME, and observed-value, discrete differences)

---

## Introducing Probit

__Probit__ refers to another category of binary dependent variable models with a different link function for `\(G(z)\)`. In these models, `\(G(z)\)` is:

`$$G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(v) dv$$`

where `\(\phi\)` is the standard normal density:

--

`$$\phi(z) = (2\pi)^{-1/2}e^{-z^2/2}$$`

and the error is distributed according to a standard normal distribution.

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

<img src="Week10_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

## Introducing Probit

Probit models are set up and interpreted in a similar fashion to the logit model. For example, consider the following set-up:

`$$P(Y_i = 1) = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$$`

where `\(\Phi(z)\)`, representing the standard normal cumulative distribution function, is standing in for the link function `\(G(z)\)`.
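In R, for instance, `\(\Phi(z)\)` is simply `pnorm(z)`, so a probit predicted probability takes one function call once the linear index is computed. A minimal sketch (the coefficient and covariate values just mirror the worked example that follows; they are not estimates from real data):

```r
# Hypothetical probit coefficients, matching the worked example below
b0 <- 0.05; b1 <- 1; b2 <- -2

# Predicted probability at x1 = 4, x2 = 1.5
z <- b0 + b1 * 4 + b2 * 1.5   # linear index = 1.05
pnorm(z)                      # standard normal CDF; roughly 0.85
```

(`plogis()` plays the same role for the logit link.)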
-- For example, imagine the following estimated equation `$$P(Y_i = 1) = \Phi(0.05 + 1x_1 + -2 x_2)$$` -- If we want to assess the predicted probability that `\(Y_i = 1\)` when `\(x_1 = 4\)` and `\(x_2 = 1.5\)`, we can solve: `$$P(Y_i = 1) = \Phi(0.05 + 1(4) + -2(1.5))$$` -- `$$P(Y_i = 1) = \Phi(1.05)$$` -- `$$P(Y_i = 1) = .85$$` --- ## Introducing Probit: Example <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">LPM</th> <th style="padding-left: 5px;padding-right: 5px;">Logit</th> <th style="padding-left: 5px;padding-right: 5px;">Probit</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Intercept</td> <td style="padding-left: 5px;padding-right: 5px;">0.54</td> <td style="padding-left: 5px;padding-right: 5px;">0.21</td> <td style="padding-left: 5px;padding-right: 5px;">0.40</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.34)</td> <td style="padding-left: 5px;padding-right: 5px;">(1.87)</td> <td style="padding-left: 5px;padding-right: 5px;">(1.10)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Union Density</td> <td style="padding-left: 5px;padding-right: 5px;">0.04<sup>**</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.27<sup>**</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.15<sup>**</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.10)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.05)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">South</td> <td style="padding-left: 5px;padding-right: 5px;">-0.09</td> <td style="padding-left: 5px;padding-right: 5px;">-0.27</td> <td style="padding-left: 5px;padding-right: 5px;">-0.17</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.16)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.82)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.49)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Unemployment</td> <td style="padding-left: 5px;padding-right: 5px;">-0.08</td> <td style="padding-left: 5px;padding-right: 5px;">-0.53</td> <td style="padding-left: 5px;padding-right: 5px;">-0.35</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.06)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.38)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.22)</td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.28</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. 
R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.23</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">50</td> <td style="padding-left: 5px;padding-right: 5px;">50</td> <td style="padding-left: 5px;padding-right: 5px;">50</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">AIC</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">59.90</td> <td style="padding-left: 5px;padding-right: 5px;">60.14</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BIC</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">67.55</td> <td style="padding-left: 5px;padding-right: 5px;">67.79</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Log Likelihood</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">-25.95</td> <td style="padding-left: 5px;padding-right: 5px;">-26.07</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Deviance</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">51.90</td> <td style="padding-left: 5px;padding-right: 5px;">52.14</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="4"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> --- ## Introducing Probit: Example <table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> Union Density </th> <th style="text-align:left;"> South </th> <th style="text-align:right;"> Unemployment </th> <th style="text-align:right;"> Logit Pred. </th> <th style="text-align:right;"> Probit Pred. </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 10.45 </td> <td style="text-align:left;"> Nonsouth </td> <td style="text-align:right;"> 5.25 </td> <td style="text-align:right;"> 0.5593598 </td> <td style="text-align:right;"> 0.5564192 </td> </tr> <tr> <td style="text-align:right;"> 10.45 </td> <td style="text-align:left;"> South </td> <td style="text-align:right;"> 5.25 </td> <td style="text-align:right;"> 0.4918881 </td> <td style="text-align:right;"> 0.4897414 </td> </tr> <tr> <td style="text-align:right;"> 25.00 </td> <td style="text-align:left;"> South </td> <td style="text-align:right;"> 4.60 </td> <td style="text-align:right;"> 0.9852026 </td> <td style="text-align:right;"> 0.9923343 </td> </tr> </tbody> </table> --- ## Introducing Probit: Example <img src="Week10_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> --- ## Introducing Probit: Example <img src="Week10_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> --- ## Reviewing Interpretation The bad news?: Probit coefficients suffer the same problems as logit models in that they are tricky to interpret. -- The good news? We can use the same tools that we discussed last week. 
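Each of these tools starts from predicted probabilities, which `glm()` fits return directly in R. A self-contained sketch on simulated data (the variable and object names are made up; this is not the state-level data from the tables above):

```r
set.seed(1)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$y <- rbinom(200, 1, plogis(-0.5 + dat$x1 + 0.5 * dat$x2))  # simulated binary outcome

logit_mod  <- glm(y ~ x1 + x2, family = binomial(link = "logit"),  data = dat)
probit_mod <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = dat)

# Predicted probabilities on the probability (response) scale
head(predict(logit_mod,  type = "response"))
head(predict(probit_mod, type = "response"))
```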
--

To review:

* partial effect at the average
* average marginal effect
* observed-value, discrete differences

---

## Reviewing Interpretation: Probit Example

We can explore observed-value, discrete differences in the probit set-up.

--

Let's follow the same model as last week for assessing the effect of `\(South\)` by calculating the difference between our predicted probabilities for the observed values when `\(south = 0\)` and when `\(south = 1\)`, such that...

--

`$$P_{0i} = \Phi(0.40 + 0.15union_i + -0.17(0) + -0.35Unemploy_i)$$`

--

`$$P_{1i} = \Phi(0.40 + 0.15union_i + -0.17(1) + -0.35Unemploy_i)$$`

--

and by identifying the average difference between:

`$$P_{1i} - P_{0i}$$`

--

`$$-0.05$$`

---

## Maximum Likelihood Estimation

Last week, we mentioned that logit (and probit) models are no longer estimated via OLS (ordinary least squares). Instead, these models are fit using __maximum likelihood estimation__ (or, MLE). This technique identifies the coefficient estimates through an iterative process: candidate coefficient values are tried, and the estimates chosen are the ones that maximize the likelihood of the relationships we observe in the data.

--

In other words, we're trying to ascertain the probability of observing the data we observe.

--

In a very silly example, let's say we randomly speak to 3 Georgetown graduate students. If we identify that 2 of those students are from McCourt, we might ask "what is the likelihood of observing that combination?", or:

`$$L = p_{McCourt} * p_{McCourt} * (1-p_{McCourt}) = p_{McCourt}^2 - p_{McCourt}^3$$`

---

## Maximum Likelihood Estimation

To "solve" this problem, we can simply _guess_ and _maximize_ the likelihood by picking the value that produces the largest `\(L\)`.

--

<table class=" lightable-paper" style='font-family: "Arial Narrow", arial, helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> p </th> <th style="text-align:right;"> L </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 0.3 </td> <td style="text-align:right;"> 0.063 </td> </tr> <tr> <td style="text-align:right;"> 0.6 </td> <td style="text-align:right;"> 0.144 </td> </tr> <tr> <td style="text-align:right;"> 0.9 </td> <td style="text-align:right;"> 0.081 </td> </tr> </tbody> </table>

--

In this case it's quite simple, but in a multiple-parameter problem it becomes much more complicated. The same logic extends to the estimation of models that use MLE, for example:

`$$L = \Phi(\beta_0 + \beta_1x_1) * \Phi(\beta_0 + \beta_1x_2) * (1-\Phi(\beta_0 + \beta_1x_3))$$`

... and that's as far as we'll go on MLE.

---

## Hypothesis Testing

Similar to the hypothesis testing framework we discussed earlier in the semester, we can test various types of hypotheses in binary dependent variable models.

--

Remember, the general framework is:

1. Setting up a testable hypothesis
2. Identifying the significance level
3. Calculating some test-statistic (and/or corresponding p-value and/or confidence interval)
4. Making the conclusion

--

In the binary dependent variable framework, we'll talk through the minor changes that we'll make to two types of inferences: single parameter and multiple hypotheses.

---

### Single Parameter

Just like in OLS, we can estimate standard errors in our binary dependent variable model (which we're not going to cover). To test our null hypothesis -- e.g.
`\(H_0: \beta_j = 0\)` -- we shift from using the t-statistic (OLS) to using the Wald statistic (Stata will display a z-score, since the Wald statistic is asymptotically distributed as a standard normal distribution), which is calculated as:

`$$W = \frac{\hat{\beta_j} - \beta_0}{\hat{se}(\hat{\beta_j})} \sim N(0,1)$$`

... which we can compare to a critical value or use to calculate a p-value in the usual way.

---

### Multiple Hypotheses

To test multiple hypotheses (think: similar to F-tests in OLS), we can use the __likelihood ratio statistic__ (see Wooldridge 17.12 for more).

--

With the likelihood ratio statistic, we can test either:

* whether two coefficients are equal to each other, `\(H_0: \beta_1 = \beta_2\)`
* whether more than one coefficient is equal to zero, `\(H_0: \beta_1 = \beta_2 = 0\)`

--

To calculate the likelihood ratio statistic:

`$$LR = 2[log(L_{unrestricted}) - log(L_{restricted})]$$`

where:

* `\(log(L_{unrestricted})\)` - the log likelihood reported from the original (unrestricted) model
* `\(log(L_{restricted})\)` - the log likelihood reported from the adapted (restricted by the hypotheses) model

---

### Multiple Hypotheses: Likelihood Ratio Statistic

* Estimate the full model (unrestricted model).

`$$P(Y = 1) = G(\beta_0 + \beta_1x_1 + \beta_2x_2)$$`

--

* Estimate the restricted model based on the null hypotheses.
  * If we're testing whether more than one coefficient is equal to zero, `\(H_0: \beta_1 = \beta_2 = 0\)`: `$$P(Y = 1) = G(\beta_0)$$`
  * If we're testing whether coefficients are equal, `\(H_0: \beta_1 = \beta_2\)`: `$$P(Y = 1) = G(\beta_0 + \beta_1(x_1 + x_2))$$`

--

* Plug the estimated likelihoods associated with each equation into the likelihood ratio formula and find the associated test statistic, which comes from a `\(\chi^2\)` distribution where `\(df\)` equals the number of restrictions being tested

---

### Multiple Hypotheses: Likelihood Ratio Statistic

<img src="Week10_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

---

### Multiple Hypotheses: Likelihood Ratio Statistic (Ex. 1)

<img src="stata_out.png" width="1768" style="display: block; margin: auto;" />

---

### Multiple Hypotheses: Likelihood Ratio Statistic (Ex.
2) Let's say we want to estimate an equation: `$$P(Clinton Won = 1)= \Lambda(\beta_0 + \beta_1Age + \beta_2log(PCI) + \beta_3CrimeIdx)$$` and we want to test: -- `$$H_0: \beta_2 = \beta_3 = 0$$` <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">Clinton Win</th> <th style="padding-left: 5px;padding-right: 5px;">Clinton Win</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Intercept</td> <td style="padding-left: 5px;padding-right: 5px;">33.97<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.06</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(3.31)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.54)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Median Age</td> <td style="padding-left: 5px;padding-right: 5px;">-0.01</td> <td style="padding-left: 5px;padding-right: 5px;">-0.05<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.02)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.02)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">log(Per Capita Income)</td> <td style="padding-left: 5px;padding-right: 5px;">-3.72<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.36)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Crime Index</td> <td style="padding-left: 5px;padding-right: 5px;">0.00<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">AIC</td> <td style="padding-left: 5px;padding-right: 5px;">2127.86</td> <td style="padding-left: 5px;padding-right: 5px;">2248.41</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BIC</td> <td style="padding-left: 5px;padding-right: 5px;">2151.47</td> <td style="padding-left: 5px;padding-right: 5px;">2260.21</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Log Likelihood</td> <td style="padding-left: 5px;padding-right: 5px;">-1059.93</td> <td style="padding-left: 5px;padding-right: 5px;">-1122.20</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Deviance</td> <td style="padding-left: 5px;padding-right: 5px;">2119.86</td> <td style="padding-left: 5px;padding-right: 5px;">2244.41</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. 
obs.</td> <td style="padding-left: 5px;padding-right: 5px;">2704</td> <td style="padding-left: 5px;padding-right: 5px;">2704</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="3"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table>

---

### Multiple Hypotheses: Likelihood Ratio Statistic (Ex. 2)

`$$LR = 2[log(L_{unrestricted}) - log(L_{restricted})]$$`

--

`$$LR = 2[-1059.93 - (-1122.20)]$$`

--

`$$LR = 124.5$$`

--

`$$124.5 > 7.4$$`

where 7.4 represents the critical value from a `\(\chi^2(2)\)` distribution

---

## Another Goodness of Fit Metric

When dealing with OLS, we discussed a number of goodness of fit metrics. However, `\(R^2\)` cannot be computed in the same way as in OLS regression. Some researchers will report a "pseudo- `\(R^2\)`", which is interpreted in a similar way (it is presented on a 0 to 1 scale). There are a number of ways to calculate this value, each with various trade-offs.

--

Stata reports the McFadden pseudo- `\(R^2\)`, which is calculated as:

`$$\bar{R^2} = 1 - \frac{LL_{mod}}{LL_{0}}$$`

where:

* `\(LL_{mod}\)` is the log likelihood for the fitted model
* `\(LL_{0}\)` is the log likelihood for a model without covariates (only an intercept term)

---

## Another Goodness of Fit Metric: Example

<img src="full.png" width="865" style="display: block; margin: auto;" />

---

## Another Goodness of Fit Metric: Example

<img src="intercept.png" width="869" style="display: block; margin: auto;" />

--

`$$\bar{R^2} = 1 - \frac{LL_{mod}}{LL_{0}}$$`

--

`$$\bar{R^2} = 1 - \frac{-11.56}{-33.87}$$`

--

`$$\bar{R^2} = .6587$$`

---

## Extended Example

Let's walk through a longer example exploring what we've covered. Imagine we are trying to understand the predictors associated with ideological conservatism among Americans.
We can consider the following (overly simplified, improperly specified) models: -- __Linear Probability Model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \beta_0 + \beta_1 Age + \beta_2 White + \beta_3 Educ + \beta_4 Attend$$` -- __Logit Model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \Lambda(\beta_0 + \beta_1 Age + \beta_2 White + \beta_3 Educ + \beta_4 Attend)$$` -- __Probit Model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \Phi(\beta_0 + \beta_1 Age + \beta_2 White + \beta_3 Educ + \beta_4 Attend)$$` --- ### Data and Measures We can explore this question using data from the General Social Survey (GSS) from 2012 with the following measurements: * `\(Cons\)` - a binary variable indicating that a respondent identifies as Conservative or Extremely Conservative * `\(Age\)` - age in years (continuous) * `\(White\)` - a binary variable indicating the respondent identifies as white * `\(Educ\)` - education in years (continuous) * `\(Attend\)` - a binary variable indicating the respondent identifies as attending religious services "Nearly Every Week" or more often --- ### Estimation <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">LPM</th> <th style="padding-left: 5px;padding-right: 5px;">Logit</th> <th style="padding-left: 5px;padding-right: 5px;">Probit</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Intercept</td> <td style="padding-left: 5px;padding-right: 5px;">-0.00</td> <td style="padding-left: 5px;padding-right: 5px;">-2.95<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">-1.66<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.06)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.42)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.23)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Age (yrs.)</td> <td style="padding-left: 5px;padding-right: 5px;">0.00<sup>**</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.01<sup>**</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.01<sup>**</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">White</td> <td style="padding-left: 5px;padding-right: 5px;">0.11<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.88<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.47<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.02)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.20)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.11)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Educ (yrs.)</td> <td style="padding-left: 5px;padding-right: 5px;">-0.00</td> <td style="padding-left: 5px;padding-right: 5px;">-0.01</td> <td style="padding-left: 5px;padding-right: 5px;">-0.01</td> </tr> <tr> <td style="padding-left: 
5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.02)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Attend</td> <td style="padding-left: 5px;padding-right: 5px;">0.15<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.95<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.54<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.02)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.13)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.07)</td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.05</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Adj. R<sup>2</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.05</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">1772</td> <td style="padding-left: 5px;padding-right: 5px;">1772</td> <td style="padding-left: 5px;padding-right: 5px;">1772</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">AIC</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">1622.80</td> <td style="padding-left: 5px;padding-right: 5px;">1622.72</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BIC</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">1650.20</td> <td style="padding-left: 5px;padding-right: 5px;">1650.12</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Log Likelihood</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">-806.40</td> <td style="padding-left: 5px;padding-right: 5px;">-806.36</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Deviance</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">1612.80</td> <td style="padding-left: 5px;padding-right: 5px;">1612.72</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="4"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> --- ### Goodness of Fit Although Stata will report the pseudo `\(R^2\)` value, let's calculate it for the Logit and Probit Models. 
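(As an aside: in R this is a one-line helper once both models are fit. A hedged sketch, with placeholder argument names that are not from the slides:)

```r
# McFadden pseudo-R^2 from two fitted glm objects:
#   full_mod - the fitted logit or probit model
#   null_mod - the same model with only an intercept, e.g. glm(y ~ 1, family = binomial(), data = ...)
pseudo_r2 <- function(full_mod, null_mod) {
  1 - as.numeric(logLik(full_mod)) / as.numeric(logLik(null_mod))
}
```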
`$$\bar{R^2} = 1 - \frac{LL_{mod}}{LL_{0}}$$`

--

__Logit__

* Identify the log likelihood from the full model, `\(LL_{mod}\)`
* Estimate an intercept-only model, and find the log likelihood, `\(LL_0\)`

`$$\bar{R^2} = 1 - \frac{-806.4}{-937.6}$$`

`$$\bar{R^2} = 0.14$$`

--

__Probit__

* Identify the log likelihood from the full model, `\(LL_{mod}\)`
* Estimate an intercept-only model, and find the log likelihood, `\(LL_0\)`

`$$\bar{R^2} = 0.14$$`

---

### Hypothesis Testing - Single Parameter

Let's now turn to looking at the statistical significance of our coefficients (let's use the probit model for now) and test, e.g., the following hypothesis:

`$$H_0: \beta_{Attend} = 0$$`

--

Let's test our hypothesis at the `\(\alpha = 0.05\)` significance level (i.e., 95% confidence).

--

Next, let's calculate our test statistic.

a) Find the test statistic (the Wald statistic)

`$$W = \frac{\hat{\beta_j} - \beta_0}{\hat{se}(\hat{\beta_j})} \sim N(0,1)$$`

--

`$$W = \frac{0.54 - 0}{0.07} = 7.71$$`

--

and compare to the critical value (the 0.975 quantile of the standard normal distribution)

`$$7.71 > 1.96$$`

---

### Hypothesis Testing - Single Parameter

b) Find the p-value

The p-value associated with 7.71 is: `\(<0.001\)`

--

c) Find the confidence interval:

`$$CI: \hat{\beta_j} \pm c*se(\hat{\beta_j})$$`

--

`$$CI: 0.54 \pm 1.96*0.07$$`

--

`$$CI: [0.40, 0.68]$$`

--

Using each of these (redundant) metrics, we __reject the null hypothesis__ and find that Attend is statistically significant (i.e. distinguishable from 0).

---

### Hypothesis Testing - Multiple Parameters

Now let's turn to exploring a multiple parameter hypothesis and test the following null hypothesis (again, using our probit model):

`$$H_0: \beta_{Attend} = \beta_{Age}= 0$$`

--

Let's again test our hypothesis at the `\(\alpha = 0.05\)` significance level.

--

We now need to estimate our test statistic, which in this case is the likelihood ratio statistic.
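Mechanically, the test amounts to fitting both models and differencing their log likelihoods. A generic R sketch on simulated data (not the GSS models, which are spelled out next):

```r
# Generic likelihood ratio test sketch (simulated data, generic variable names)
set.seed(2)
d <- data.frame(x1 = rnorm(300), x2 = rnorm(300), x3 = rnorm(300))
d$y <- rbinom(300, 1, pnorm(-0.5 + 0.8 * d$x1))

unrestricted <- glm(y ~ x1 + x2 + x3, family = binomial(link = "probit"), data = d)
restricted   <- glm(y ~ x1,           family = binomial(link = "probit"), data = d)  # H0: two coefficients are 0

lr <- as.numeric(2 * (logLik(unrestricted) - logLik(restricted)))
pchisq(lr, df = 2, lower.tail = FALSE)   # p-value; df = number of restrictions
```

(With real data, `anova(restricted, unrestricted, test = "Chisq")` gives an equivalent test.)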
-- __The unrestricted model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \Phi(\beta_0 + \beta_1 Age + \beta_2 White + \beta_3 Educ + \beta_4 Attend)$$` __The restricted model__ `$$P(Cons = 1|White, Educ) = \Phi(\beta_0 + \beta_2 White + \beta_3 Educ)$$` --- ### Hypothesis Testing - Multiple Parameters <table class="texreg" style="margin: 10px auto;border-collapse: collapse;border-spacing: 0px;caption-side: bottom;color: #000000;border-top: 2px solid #000000;"> <caption>Statistical models</caption> <thead> <tr> <th style="padding-left: 5px;padding-right: 5px;"> </th> <th style="padding-left: 5px;padding-right: 5px;">Unrestr.</th> <th style="padding-left: 5px;padding-right: 5px;">Restr.</th> </tr> </thead> <tbody> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Intercept</td> <td style="padding-left: 5px;padding-right: 5px;">-1.66<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">-1.14<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.23)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.20)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Age (yrs.)</td> <td style="padding-left: 5px;padding-right: 5px;">0.01<sup>**</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.00)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">White</td> <td style="padding-left: 5px;padding-right: 5px;">0.47<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;">0.42<sup>***</sup></td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.11)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.10)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Educ (yrs.)</td> <td style="padding-left: 5px;padding-right: 5px;">-0.01</td> <td style="padding-left: 5px;padding-right: 5px;">-0.01</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> <td style="padding-left: 5px;padding-right: 5px;">(0.01)</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Attend</td> <td style="padding-left: 5px;padding-right: 5px;">0.54<sup>***</sup></td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;"> </td> <td style="padding-left: 5px;padding-right: 5px;">(0.07)</td> <td style="padding-left: 5px;padding-right: 5px;"> </td> </tr> <tr style="border-top: 1px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">AIC</td> <td style="padding-left: 5px;padding-right: 5px;">1622.72</td> <td style="padding-left: 5px;padding-right: 5px;">1699.53</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">BIC</td> <td style="padding-left: 5px;padding-right: 5px;">1650.12</td> <td style="padding-left: 5px;padding-right: 5px;">1715.98</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Log Likelihood</td> <td style="padding-left: 5px;padding-right: 5px;">-806.36</td> <td style="padding-left: 5px;padding-right: 5px;">-846.76</td> </tr> <tr> <td style="padding-left: 5px;padding-right: 5px;">Deviance</td> <td style="padding-left: 5px;padding-right: 5px;">1612.72</td> <td 
style="padding-left: 5px;padding-right: 5px;">1693.53</td> </tr> <tr style="border-bottom: 2px solid #000000;"> <td style="padding-left: 5px;padding-right: 5px;">Num. obs.</td> <td style="padding-left: 5px;padding-right: 5px;">1772</td> <td style="padding-left: 5px;padding-right: 5px;">1776</td> </tr> </tbody> <tfoot> <tr> <td style="font-size: 0.8em;" colspan="3"><sup>***</sup>p < 0.001; <sup>**</sup>p < 0.01; <sup>*</sup>p < 0.05</td> </tr> </tfoot> </table> --- ### Hypothesis Testing - Multiple Parameters `$$LR = 2[log(L_{unrestricted}) - log(L_{restricted})]$$` -- `$$LR = 2[-806 - -846.76]$$` -- `$$LR = 81.5$$` ... which we can explore in relation to a critical value from a `\(\chi^2\)` distribution with 2 degrees of freedom --- ### Hypothesis Testing - Multiple Parameters <img src="Week10_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> --- ### Hypothesis Testing - Multiple Parameters <img src="Week10_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> --- ### Predictions We can also examine various predictions for our estimated model! Let's say we want to predict the following respondent: 35 years old, non-religious White respondent with 16 years of schooling. __Logit Model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \Lambda(-2.95+ 0.01 Age + 0.88 White + -0.01 Educ + 0.95 Attend)$$` -- `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = \Lambda(-2.95+ 0.01 (35) + 0.88 (1) + -0.01 (16) + 0.95 (0))$$` -- `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = \Lambda(-1.87)$$` -- `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = 0.13$$` -- __Probit Model__ `$$P(Cons = 1|Age, White, Educ, Attend) = \Phi(-1.66+ 0.01 Age + 0.47 White + -0.01 Educ + 0.54 Attend)$$` `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = \Phi(-1.66+ 0.01 (35) + 0.47 (1) + -0.01 (16) + 0.54 (0))$$` `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = \Phi(-1.11)$$` `$$P(Cons = 1|Age = 35, White = 1, Educ = 16, Attend = 0) = 0.13$$` --- ### Interpreting Coefficients Finally, let's explore interpreting our coefficients (let's just use the probit model here, and aim to explore to the effect of being White on being Conservative) -- __Observed Value, Discrete Differences__ For _each_ observation in our dataset, let's calculate the predicted probability from the probit model, except assume that each observation is non-white. `$$P_{0i} = \Phi(-1.66+ 0.01 Age + 0.47 (0) + -0.01 Educ + 0.54 Attend)$$` -- Then, for _each_ observation in our dataset, let's calculate the predicted probability from the probit model, except assume that each observation is white. `$$P_{0i} = \Phi(-1.66+ 0.01 Age + 0.47 (1) + -0.01 Educ + 0.54 Attend)$$` -- This will produce hypothetical predictions for each observation in our data, and we can then explore the difference differences between these values. --- ### Interpreting Coefficients
---

### Interpreting Coefficients

<img src="Week10_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" />

---

## Where We're Going Next

Switching gears and thinking about sampling and weighting

__Reading__

* Hamilton, pp. 107–119
* Additional reading on the course website