Homework #3

Kevin R Foster, CCNY

Each student should submit a separate assignment, even if the file is identical to the one submitted by the rest of your study group. When submitting assignments, please include your name and the assignment number in the filename. Please write the names of your study group members at the beginning of your homework.

What are the names of the people in your study group?
Using the PUMS data, consider some statistical tests:

What is the average age of people in Brooklyn? In Queens? Is there a statistically significant difference?
Create confidence intervals for each, as well as the difference. Explain.
Did you snip off the top-coded people? Re-do the test without those people. How does the p-value change?
Now suppose you normalized all of the ages to the [0,1] interval, as with this function,

norm_varb <- function(X_in) { (X_in-min(X_in,na.rm = TRUE))/abs(max(X_in,na.rm = TRUE)-min(X_in, na.rm =TRUE)) }

Is there a statistically significant difference? How does the p-value change? Explain how you dealt with the top-coding.
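
As a quick check of what the normalization does, here is a minimal sketch applying the function to a small made-up vector. The ages in `age_example` are hypothetical and are not PUMS values; the `NA` stands in for a missing observation.

```r
# Same function as in the assignment, laid out over several lines.
norm_varb <- function(X_in) {
  (X_in - min(X_in, na.rm = TRUE)) /
    abs(max(X_in, na.rm = TRUE) - min(X_in, na.rm = TRUE))
}

# Hypothetical ages (not PUMS data); NA represents a missing value.
age_example <- c(0, 18, 35, 62, NA, 90)
age_norm <- norm_varb(age_example)

print(age_norm)                 # NA stays NA; other values are rescaled
range(age_norm, na.rm = TRUE)   # smallest and largest normalized values
```

Note that the minimum and maximum are taken over whatever vector is passed in, so normalizing both boroughs together is not the same operation as normalizing each borough separately.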

Based on your knowledge of those boroughs, can you explain the results? Can you break out the differences if you use age ranges? What are the fractions of children in each borough? Of older people? Are these differences statistically significant?
Going more granular, can you look at all of these differences by neighborhood within each borough? At what point does this get into p-hacking?
What would be a good way to show all of these differences graphically?
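
The mechanics of the tests above might look like the following sketch. It runs on simulated data, not PUMS: the data frame `sim`, its columns `age` and `boro`, and the choice of the sample maximum as the top-code value are all assumptions made for illustration, so substitute your own variable names and the top-code that applies to your extract.

```r
set.seed(3)  # make the simulated data reproducible

# Simulated stand-in for the PUMS extract; column names are hypothetical.
sim <- data.frame(
  age  = round(runif(1000, 0, 90)),
  boro = rep(c("Brooklyn", "Queens"), each = 500)
)

age_bk <- sim$age[sim$boro == "Brooklyn"]
age_qn <- sim$age[sim$boro == "Queens"]

# 95% confidence interval for each borough's mean age (one-sample t-test).
ci_bk <- t.test(age_bk)$conf.int
ci_qn <- t.test(age_qn)$conf.int

# Welch two-sample t-test for the difference in mean age.
diff_test <- t.test(age_bk, age_qn)
diff_test$conf.int  # confidence interval for the difference in means
diff_test$p.value   # p-value for the null hypothesis of equal means

# Re-run without the top-coded observations.
# Assumption: the top-code is the largest age in the sample.
top_code <- max(sim$age)
trim_test <- t.test(age_bk[age_bk < top_code], age_qn[age_qn < top_code])
trim_test$p.value
```

Because the simulated ages are drawn from the same distribution for both boroughs, the numbers it prints say nothing about the real data; only the sequence of calls carries over.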

I used the PUMS data to look at wages and commute type, getting this table for people in the City: (you can answer parts a-c without R)

Wage	bus	car	subway
Wage below $25,000	1501	2394	3704
Wage above $75,000	385	1825	2194
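
Although the first parts can be done by hand, the table above can also be entered in R to check your arithmetic. This is a sketch under my own naming: the object names and the row and column labels are not from the PUMS data, only the six counts are.

```r
# Counts copied from the table: rows are wage groups, columns are commute types.
commute <- matrix(
  c(1501, 2394, 3704,
     385, 1825, 2194),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    wage    = c("below_25k", "above_75k"),
    commute = c("bus", "car", "subway")
  )
)

# margin = 2 divides each column by its total: P(wage group | commute type).
p_wage_given_commute <- prop.table(commute, margin = 2)

# margin = 1 divides each row by its total: P(commute type | wage group).
p_commute_given_wage <- prop.table(commute, margin = 1)

print(round(p_wage_given_commute, 3))
print(round(p_commute_given_wage, 3))
```

The choice of `margin` is the whole question: conditioning on commute type means dividing by column totals, while conditioning on wage group means dividing by row totals.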

Given that someone takes the bus to work, what is the probability that they’re making wages above $75,000?
Given that someone takes the subway to work, what is the probability that they make wages below $25,000?
Given that someone has wage above $75,000, what is the probability that they drive a car to work?
Using the PUMS data, can you narrow this further - what are the socioeconomics of bus/subway in the various boroughs? What is the wealthiest PUMA area and how do the people living there tend to commute? Can you find interesting patterns?
Try the machine learning K-nearest-neighbor algorithm on the PUMS data to get another view of the commuting pattern from above. How good a classification of commute type can you get? Explain what you believe are important variables in this classification. You might explore the “caret” package.
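
One possible starting point for the K-nearest-neighbor question is sketched below. It uses `knn()` from the `class` package (a recommended package normally installed with R) rather than caret, and it runs on simulated data: the columns `age`, `income`, and `commute`, the rule that generates `commute`, the 80/20 split, and `k = 5` are all assumptions chosen for illustration.

```r
library(class)  # provides knn()

set.seed(3)
n <- 300

# Simulated stand-in for PUMS; every column and relationship here is made up.
sim <- data.frame(
  age    = runif(n, 18, 70),
  income = runif(n, 10000, 150000)
)
sim$commute <- factor(
  ifelse(sim$income + rnorm(n, 0, 20000) > 80000, "car", "subway")
)

# Rescale predictors to [0, 1] so that no variable dominates the distance.
norm_varb <- function(X_in) {
  (X_in - min(X_in, na.rm = TRUE)) /
    abs(max(X_in, na.rm = TRUE) - min(X_in, na.rm = TRUE))
}
X <- data.frame(age = norm_varb(sim$age), income = norm_varb(sim$income))

# Hold out 20% of the rows to measure out-of-sample accuracy.
train_idx <- sample(n, size = 0.8 * n)
pred <- knn(train = X[train_idx, ], test = X[-train_idx, ],
            cl = sim$commute[train_idx], k = 5)

# Confusion table and share of held-out rows classified correctly.
print(table(predicted = pred, actual = sim$commute[-train_idx]))
accuracy <- mean(pred == sim$commute[-train_idx])
accuracy
```

Trying several values of `k` and different sets of predictors, and comparing held-out accuracy each time, is one way to judge which variables matter for the classification.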


Due 8am EST Wednesday Oct 3, 2018

Econ B2000, MA Econometrics
