Homework #7 Solutions

What are the names of the people in your study group?
Each person in the group should find 2 academic articles related to your current choice of final project. Write a short paragraph on each, concentrating on what data is used (and whether it is accessible), what econometric techniques, and what questions are addressed. Answers will vary.
I used a particular subsample of the BRFSS data (no need for you to do it yourself, not yet) to estimate a logit model where the dependent variable is whether the person’s BMI would classify them as “overweight” or “obese”. The model includes a quadratic in age, with each age term interacted with a female dummy. The results are:

	Coefficient	Std Error	t stat
Constant	-2.355	0.430	-5.48
Age	0.197	0.008	24.86
Age^2	-0.0020	0.0001	-21.31
Female	1.360	0.206	6.61
Age*Female	-0.100	0.010	-9.66
Age^2*Female	0.0011	0.0001	8.94

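Are these coefficient estimates each statistically significant? Calculate t-statistics and p-values for each (there are again 105409 df). Yes: every t statistic in the table exceeds 5 in absolute value, so with 105409 df each p-value is below 0.0001 and every coefficient is significant at conventional levels.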
What is the predicted probability of being overweight for a 35-year-old male? For a female of the same age?
Men: Z = -2.355 + 0.197*35 - 0.002*35^2; the predicted probability is 1/(1 + e^(-Z)), so 0.886.
Women: Z = (-2.355 + 1.36) + (0.197 - 0.1)*35 - (0.002 - 0.0011)*35^2; the predicted probability is 1/(1 + e^(-Z)), so 0.78.
At what age does male probability of being overweight peak? Female? At what levels for each?

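Z is quadratic in age, so the predicted probability peaks where the derivative of Z with respect to age is zero: Age = -(coefficient on Age) / (2 * coefficient on Age^2), with the female interaction terms added in for women. That gives 0.197/(2*0.002) = 49.3 for men and 0.097/(2*0.0009) = 53.9 for women. At those ages they are respectively 91.7% and 82.3% likely to be overweight.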
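
The calculations for this question can be sketched in a few lines of Python. This is only an illustration: the names (COEF, two_sided_p, prob, peak_age) are made up here, the p-values use the standard normal approximation (reasonable with 105409 df), and the coefficients are the rounded ones from the table. Because of that rounding, the probabilities it prints can differ in the second decimal from the ones quoted above, which were presumably computed from the unrounded estimates.

```python
import math

# Rounded estimates copied from the table above: name -> (coefficient, t stat).
COEF = {
    "const": (-2.355, -5.48),
    "age": (0.197, 24.86),
    "age2": (-0.0020, -21.31),
    "female": (1.360, 6.61),
    "age_female": (-0.100, -9.66),
    "age2_female": (0.0011, 8.94),
}


def two_sided_p(t):
    """Two-sided p-value, using the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


def slopes(female):
    """Intercept, linear and quadratic age coefficients for men (0) or women (1)."""
    b = {name: value[0] for name, value in COEF.items()}
    const = b["const"] + b["female"] * female
    lin = b["age"] + b["age_female"] * female
    quad = b["age2"] + b["age2_female"] * female
    return const, lin, quad


def prob(age, female):
    """Predicted probability of being overweight: 1 / (1 + e^(-Z))."""
    const, lin, quad = slopes(female)
    z = const + lin * age + quad * age ** 2
    return 1.0 / (1.0 + math.exp(-z))


def peak_age(female):
    """Age at which Z (and so the probability) is highest: -lin / (2 * quad)."""
    _, lin, quad = slopes(female)
    return -lin / (2.0 * quad)


if __name__ == "__main__":
    for name, (_, t) in COEF.items():
        print(f"{name:12s} t = {t:7.2f}  p = {two_sided_p(t):.3g}")
    for label, female in (("men", 0), ("women", 1)):
        age = peak_age(female)
        print(f"{label}: P(age 35) = {prob(35, female):.3f}, "
              f"peak at age {age:.1f} with P = {prob(age, female):.3f}")
```

Writing the male and female coefficients through one helper (slopes) keeps the interaction arithmetic in a single place, so the age-35 prediction and the peak calculation cannot drift apart.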

Next, download the BRFSS data and do some of your own estimations. BMI is a person’s weight in kg divided by their squared height in m; a value over 25 is classified as overweight. The data include a continuous variable (BMI_measure), a 0/1 dummy for overweight (d_overweight; includes overweight and obese), and a 4-category classification (X_BMI5CAT). Compare results.

Start with some basic statistics: how does the tendency to be overweight vary among educational groups? Are these differences statistically significant? Compare results from each of the 3 classifications above. Discuss.
Next estimate a linear model of the continuous measure. Explain which variables are important to include. Discuss the results.
Next estimate a logit and a probit model of the 0/1 dummy measure. Explain which variables ought to be included or excluded. Discuss the results of each model.
Now use the 4-category variable and estimate a logit or probit model only of whether the person is obese (BMI over 30). How do the results change? Discuss each model and the extent to which the different specifications give different results.

Answers will vary.
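
For the first step (whether the overweight share differs across educational groups), one simple option is a two-proportion z test on the 0/1 dummy. The sketch below is only an illustration under stated assumptions: the function name two_proportion_z is made up, the counts are hypothetical placeholders and NOT taken from BRFSS, and the p-value uses the normal approximation.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: the two proportions are equal.

    x1, x2 are counts of overweight people; n1, n2 are the group sizes.
    Uses the pooled standard error, which is appropriate under H0.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for two education groups (placeholders, not BRFSS data).
    z, p = two_proportion_z(640, 1000, 580, 1000)
    print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With the real data you would replace the placeholder counts with the number of people with d_overweight equal to 1 and the group size for each pair of educational groups being compared.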