Homework #3

Kevin R Foster, CCNY

Each student should submit a separate assignment, even if the file is identical to those of the rest of your study group. When submitting assignments, please include your name and the assignment number as part of the filename. Please write the names of your study group members at the beginning of your homework.

What are the names of the people in your study group?
Consider the PUMS data for people in NY that we’ve been using in class. For now, restrict attention to working people (explain how you define “working”).
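One way to make the definition of “working” explicit is to write it as a filter. The Python sketch below (the class uses R) runs on a few made-up records. The field names `EMPSTAT` and `UHRSWORK`, the coding `EMPSTAT == 1` for “employed”, and the helper `is_working` are all hypothetical, modeled on IPUMS-style variables, so check them against the codebook for your data file.

```python
# Made-up records standing in for PUMS rows. The field names and the
# EMPSTAT coding (1 = employed) are assumptions, not taken from the class file.
people = [
    {"AGE": 34, "EMPSTAT": 1, "UHRSWORK": 40},
    {"AGE": 71, "EMPSTAT": 3, "UHRSWORK": 0},
    {"AGE": 19, "EMPSTAT": 1, "UHRSWORK": 12},
    {"AGE": 45, "EMPSTAT": 2, "UHRSWORK": 0},
]


def is_working(person, min_hours=1):
    """Hypothetical definition: employed and usually working at least
    `min_hours` hours per week."""
    return person["EMPSTAT"] == 1 and person["UHRSWORK"] >= min_hours


workers = [p for p in people if is_working(p)]
full_time = [p for p in people if is_working(p, min_hours=35)]
print(len(workers), len(full_time))  # 2 1
```

Raising `min_hours` (or adding an age range) changes who counts as working, which is why the definition should be stated alongside the results.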

Do a statistical test of the difference in average age between working people in the Bronx and working people in Brooklyn. What is the 95% confidence interval for the difference in means?
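The mechanics of a two-sample test can be sketched as follows: take the difference in sample means, divide by the Welch (unequal-variance) standard error, and build the interval as difference ± critical value × standard error. This Python sketch (the class uses R) takes its critical value from the normal distribution, an approximation to the t distribution that is close for PUMS-sized samples but poor for the eight made-up ages used here, which only illustrate the arithmetic. The function name `diff_in_means` is hypothetical. In R, `t.test(x, y)` uses the Welch version by default and takes its critical value from the t distribution.

```python
import math
from statistics import NormalDist, mean, variance


def diff_in_means(x, y, level=0.95):
    """Compare mean(x) - mean(y) using Welch's unequal-variance standard error.

    The critical value comes from the normal distribution, an approximation
    to the t distribution that is close for samples in the thousands.
    Returns (difference, test statistic, two-sided p-value, (low, high)).
    """
    diff = mean(x) - mean(y)
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    stat = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    return diff, stat, p_value, (diff - crit * se, diff + crit * se)


# Made-up ages for illustration only, not PUMS values.
bronx = [23, 35, 41, 29, 52, 38, 44, 31]
brooklyn = [27, 39, 46, 33, 58, 42, 50, 36]
diff, stat, p_value, ci = diff_in_means(bronx, brooklyn)
print(diff, stat, p_value, ci)
```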
What if you used the Age data normalized so that the min is zero and the max is one [recall my function, (X_in - min(X_in, na.rm = TRUE)) / abs(max(X_in, na.rm = TRUE) - min(X_in, na.rm = TRUE))]? Would the statistical test come out the same? Why or why not?
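A quick numerical check is to compute the test statistic before and after rescaling. The Python sketch below (the class uses R) translates the rescaling formula into a hypothetical `min_max_rescale`, skipping missing values the way `na.rm = TRUE` does, and applies it to made-up ages. It rescales the pooled sample with a single min and max. It is worth also trying what happens if each borough is rescaled separately with its own min and max.

```python
import math
from statistics import mean, variance


def min_max_rescale(x_in):
    """Python version of the R rescaling in the question:
    (X_in - min) / |max - min|, skipping missing values (None)."""
    vals = [v for v in x_in if v is not None]
    lo, hi = min(vals), max(vals)
    return [None if v is None else (v - lo) / abs(hi - lo) for v in x_in]


def welch_stat(x, y):
    """Difference in means divided by Welch's standard error."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Made-up ages for illustration only, not PUMS values.
bronx = [23, 35, 41, 29, 52, 38, 44, 31]
brooklyn = [27, 39, 46, 33, 58, 42, 50, 36]

# Rescale the pooled ages with one min and max, then split back by borough.
pooled = min_max_rescale(bronx + brooklyn)
bronx_scaled, brooklyn_scaled = pooled[: len(bronx)], pooled[len(bronx):]

print(welch_stat(bronx, brooklyn))
print(welch_stat(bronx_scaled, brooklyn_scaled))
```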

I used the PUMS data to cross-tabulate wages and commute type for people in New York City and got the table below (you can answer parts a-c without R):

                       bus     car    subway
Wage below $25,000    1501    2394     3704
Wage above $75,000     385    1825     2194

a. Given that someone takes the bus to work, what is the probability that they earn wages above $75,000?
b. Given that someone takes the subway to work, what is the probability that they earn wages below $25,000?
c. Given that someone has wages above $75,000, what is the probability that they drive a car to work?
d. Using the PUMS data, can you narrow this further: what are the socioeconomics of bus and subway commuters in the various boroughs? Which PUMA is the wealthiest, and how do the people living there tend to commute? Can you find other interesting patterns?
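Parts a-c only need the definition of conditional probability, P(A | B) = P(A and B) / P(B). For a table of counts this is a cell divided by its column total (conditioning on commute mode) or its row total (conditioning on wage band). The Python sketch below (the class uses R) does that arithmetic with the counts from the wage-by-commute table; the helper names are hypothetical. Because the table shows only two wage bands and three commute modes, every total is computed within the table, so “given bus” here means given bus among the people counted in it.

```python
# Cell counts from the wage-by-commute table above.
counts = {
    ("below_25k", "bus"): 1501,
    ("below_25k", "car"): 2394,
    ("below_25k", "subway"): 3704,
    ("above_75k", "bus"): 385,
    ("above_75k", "car"): 1825,
    ("above_75k", "subway"): 2194,
}


def p_wage_given_mode(wage, mode):
    """P(wage band | commute mode): cell count over its column total."""
    column_total = sum(n for (_, m), n in counts.items() if m == mode)
    return counts[(wage, mode)] / column_total


def p_mode_given_wage(mode, wage):
    """P(commute mode | wage band): cell count over its row total."""
    row_total = sum(n for (w, _), n in counts.items() if w == wage)
    return counts[(wage, mode)] / row_total


print(round(p_wage_given_mode("above_75k", "bus"), 4))
print(round(p_wage_given_mode("below_25k", "subway"), 4))
print(round(p_mode_given_wage("car", "above_75k"), 4))
```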

Try the k-nearest-neighbors (k-NN) machine learning algorithm on the PUMS data to get another view of the “interesting pattern” you found above. (As usual, step one is to replicate my code, then gradually morph it into your own.) How accurate a classification can you get?
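A hand-rolled classifier shows what k-NN does: for each new point, find the k closest training points and take a majority vote of their labels. The Python sketch below uses made-up features and commute labels, and the names `knn_predict` and `accuracy` are hypothetical. In R, the `knn()` function in the `class` package does the same job, though the code from class may set it up differently. Because k-NN works on raw distances, variables on large scales (such as income in dollars) will dominate unless they are rescaled first, for example with a min-max rescaling like the one in the age question.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points.

    Distance is Euclidean. On a tied vote, Counter.most_common returns the
    label that appears first in distance order.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Share of held-out points whose predicted label matches the true one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Made-up, already-rescaled features (e.g. income and age on 0-1 scales)
# with commute labels; illustrative only, not PUMS values.
train_X = [(0.10, 0.30), (0.20, 0.25), (0.15, 0.40), (0.80, 0.45), (0.90, 0.50), (0.75, 0.35)]
train_y = ["subway", "subway", "subway", "car", "car", "car"]
test_X = [(0.18, 0.30), (0.85, 0.40)]
test_y = ["subway", "car"]

print(accuracy(train_X, train_y, test_X, test_y, k=3))  # 1.0
```

With real data, a reasonable approach is to hold out a random test subset, try several values of k, and compare the accuracy against simply guessing the most common commute mode.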

Due 8am EST Wednesday Sept 27, 2017

Econ B2000, MA Econometrics