---
title: 'Homework #5 possible solutions'
output: html_document
font-family: 'Corbel'
---
### Econ B2000, MA Econometrics
### Kevin R Foster, CCNY
1. What are the names of the people in your study group?
2. Using the CPS data, construct some interesting regressions on wage and salary (you might use the same subgroup as I did or you might change it up). Estimate linear, quadratic, cubic and quartic specifications of log wage on age. Don't just give me raw output! Make a nice table, like stargazer or in Stock and Watson (e.g. Chapter 9, Table 9.2). Make nice graphs and tests of groups of coefficients like I showed in class. Explain your regressions and what you learn.
3. Look at averages by educational attainment and discuss how these relate to the results from your previous regression. What does OLS contribute; how does it change the simple results from differences in means?
4. Consider the following table of numbers of people (from CPS data) who make under or over $15/hr in wage, a level that some politicians want to set as the new minimum wage. (This is a particular subset; don't bother trying to replicate it, since the numbers given here should be sufficient.)

Number of people | Native, under $15/hr | Immigrant, under $15/hr | Native, over $15/hr | Immigrant, over $15/hr
--------------- | ------ | --------- | ------ | ---------
HS diploma or more | 14235 | 3113 | 33150 | 5296
No HS diploma | 1062 | 1824 | 662 | 567

a. Is the fraction of immigrants making less than $15/hr different from the fraction making more than $15/hr? In a statistical test of the difference, what are the t-statistic, p-value, and confidence interval?
```{r echo=FALSE}
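# Immigrants by wage group (HS or more + no HS diploma)
Nimmig_lt15 <- 3113 + 1824   # making less than $15/hr
Nimmig_gt15 <- 5296 + 567    # making more than $15/hr
Ntotal <- Nimmig_gt15 + Nimmig_lt15
# Sample fraction making less than $15/hr and its standard error, sqrt(p(1-p)/n)
p_lt15 <- Nimmig_lt15/Ntotal
se_p1 <- sqrt((p_lt15*(1 - p_lt15))/Ntotal)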
```
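There are `r Nimmig_lt15` immigrants making less than $15/hr and `r Nimmig_gt15` making more, so the fractions are `r round(p_lt15, 3)` and `r round(1 - p_lt15, 3)`. The standard error of the sample proportion is sqrt(p(1-p)/n), here `r round(se_p1, 4)`. Each fraction differs from 50% by many standard errors: the t-statistic is `r round((0.5 - p_lt15)/se_p1, 2)` and the p-value is far below 1% (`r format.pval(2*pt(-abs((0.5 - p_lt15)/se_p1), df = Ntotal - 1), eps = 0.001)`). The 95% confidence interval for the fraction making less than $15/hr is `r round(p_lt15 - 1.96*se_p1, 3)` to `r round(p_lt15 + 1.96*se_p1, 3)`, which excludes 0.5.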
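
As a cross-check on the normal approximation, base R's `binom.test()` runs an exact test of whether the fraction of immigrants making less than $15/hr equals 0.5 and reports a Clopper-Pearson confidence interval. This is a minimal sketch using only the counts in the table; the variable names are illustrative, not part of the assignment.

```r
# Counts of immigrants from the table (HS or more + no HS diploma)
n_under <- 3113 + 1824   # making less than $15/hr
n_over  <- 5296 + 567    # making more than $15/hr
n       <- n_under + n_over

# Normal-approximation t-statistic with the estimated SE, sqrt(p(1-p)/n)
p_hat  <- n_under / n
se     <- sqrt(p_hat * (1 - p_hat) / n)
t_stat <- (p_hat - 0.5) / se

# Exact binomial test of H0: p = 0.5, with a 95% Clopper-Pearson interval
exact <- binom.test(n_under, n, p = 0.5)

print(round(t_stat, 2))
print(exact$p.value)
print(round(exact$conf.int, 3))
```

At a sample this large the exact interval and the normal-approximation interval should be nearly identical; the exact test matters more when counts are small.
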
b. Is the fraction of people without a HS diploma making less than $15/hr different from the fraction making more than $15/hr? In a statistical test of the difference, what are the t-statistic, p-value, and confidence interval?
```{r echo=FALSE}
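# People without a HS diploma by wage group (native + immigrant)
N_ltHS_lt15 <- 1062+1824   # making less than $15/hr
N_ltHS_gt15 <- 662+567     # making more than $15/hr
N_total <- N_ltHS_gt15 + N_ltHS_lt15
# Sample fraction making less than $15/hr; se_p1 is overwritten with this part's SE
p_ltHS <- N_ltHS_lt15 / N_total
se_p1 <- sqrt((p_ltHS*(1 - p_ltHS))/N_total)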
```
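There are `r N_ltHS_lt15` people without a HS diploma making less than $15/hr and `r N_ltHS_gt15` making more, so the fractions are `r round(p_ltHS, 3)` and `r round(1 - p_ltHS, 3)`. The standard error of the sample proportion is sqrt(p(1-p)/n), here `r round(se_p1, 4)`. Again each fraction differs from 50% by many standard errors: the t-statistic is `r round((p_ltHS - 0.5)/se_p1, 2)` and the p-value is far below 1% (`r format.pval(2*pt(-abs((p_ltHS - 0.5)/se_p1), df = N_total - 1), eps = 0.001)`). The 95% confidence interval for the fraction making less than $15/hr is `r round(p_ltHS - 1.96*se_p1, 3)` to `r round(p_ltHS + 1.96*se_p1, 3)`, which excludes 0.5.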
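
The same test can be run with base R's `prop.test()`. Without the continuity correction, its chi-squared statistic for H0: p = 0.5 equals the square of a z-statistic that uses the standard error implied by the null, sqrt(0.5 × 0.5 / n), rather than the estimated sqrt(p(1-p)/n) used above. This sketch, with illustrative variable names, computes both versions for the no-diploma counts; they differ noticeably here because the estimated fraction is far from 0.5.

```r
# People without a HS diploma, from the table (native + immigrant)
n_under <- 1062 + 1824   # making less than $15/hr
n_over  <- 662 + 567     # making more than $15/hr
n       <- n_under + n_over
p_hat   <- n_under / n

# z-statistic with the estimated SE (as in the text) and with the null SE
z_est  <- (p_hat - 0.5) / sqrt(p_hat * (1 - p_hat) / n)
z_null <- (p_hat - 0.5) / sqrt(0.5 * 0.5 / n)

# One-sample proportion test; correct = FALSE drops the Yates continuity correction
pt_res <- prop.test(n_under, n, p = 0.5, correct = FALSE)

print(round(c(z_est = z_est, z_null = z_null), 2))
print(unname(pt_res$statistic))   # equals z_null^2
print(round(pt_res$conf.int, 3))  # Wilson score interval
```

Either way the conclusion is the same: the fraction is many standard errors away from 0.5.
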
c. In the population of people making less than $15/hr, what fraction are immigrants without a HS diploma?
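Of the `r 14235 + 3113 + 1062 + 1824` people making less than $15/hr, 1824 are immigrants without a HS diploma, so the fraction is `r round(1824/(14235 + 3113 + 1062 + 1824), 3)`.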
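
The same fraction can be read off an array version of the table, which avoids retyping the denominator by hand. This sketch stores the counts as education × nativity × wage group; the dimension names are illustrative labels, not CPS variable names. It then divides the immigrant, no-diploma cell by the total of the under-$15/hr slice.

```r
# The table from question 4 as an array: education x nativity x wage group
# (filled column-major, so education varies fastest)
counts <- array(
  c(14235, 1062,   # native, under $15/hr
    3113,  1824,   # immigrant, under $15/hr
    33150, 662,    # native, over $15/hr
    5296,  567),   # immigrant, over $15/hr
  dim = c(2, 2, 2),
  dimnames = list(educ     = c("HS or more", "no HS diploma"),
                  nativity = c("Native", "Immigrant"),
                  wage     = c("under15", "over15"))
)

# Everyone making less than $15/hr
under15 <- counts[, , "under15"]
frac_imm_noHS <- under15["no HS diploma", "Immigrant"] / sum(under15)
print(round(frac_imm_noHS, 3))
```
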
d. What is the conditional probability of finding an immigrant without a HS degree, given that the person is an immigrant and is making less than $15/hr?
The conditional probability is `r 1824/(1824 + 3113)`.
e. What is the conditional probability of finding an immigrant without a HS degree, given that the person is an immigrant and is making more than $15/hr?
The conditional probability is `r 567/(567 + 5296)`.
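
Parts (d) and (e) condition on being an immigrant in a given wage group, which amounts to normalizing each immigrant column of the table so it sums to one. In base R, `prop.table()` with `margin = 2` does this column by column. A minimal sketch, with illustrative row and column labels:

```r
# Immigrant columns from the table: rows are education, columns are wage groups
immig <- matrix(c(3113, 1824,    # under $15/hr
                  5296, 567),    # over $15/hr
                nrow = 2,
                dimnames = list(educ = c("HS or more", "no HS diploma"),
                                wage = c("under15", "over15")))

# P(education | immigrant, wage group): each column sums to 1
cond <- prop.table(immig, margin = 2)
print(round(cond, 3))

# The no-diploma row gives the answers to (d) and (e)
p_d <- cond["no HS diploma", "under15"]
p_e <- cond["no HS diploma", "over15"]
```

Using `margin = 1` instead would condition on education rather than on the wage group.
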