1. What are the names of the people in your study group?

  2. Each person in the group should find 2 academic articles related to your current choice of final project. Write a short paragraph on each, concentrating on what data are used (and whether they are accessible), what econometric techniques are applied, and what questions are addressed.

Answers will vary.

  3. I used a particular subsample of the BRFSS data (no need for you to do it yourself, not yet) to estimate a logit model where the dependent variable is whether the person’s BMI would classify them as “overweight” or “obese”. The model includes a quadratic in age, with each age term interacted with a female dummy. The results are:

Variable       Coefficient  Std Error  t stat  p val
Constant            -2.355      0.430   -5.48  0.000
Age                  0.197      0.008   24.86  0.000
Age^2              -0.0020     0.0001  -21.31  0.000
Female               1.360      0.206    6.61  0.000
Age*Female          -0.100      0.010   -9.66  0.000
Age^2*Female        0.0011     0.0001    8.94  0.000
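
The t statistics and p-values in the table can be checked from the coefficients and standard errors. The following is a minimal sketch, assuming a two-sided test and treating the t distribution with 105409 df as standard normal (a very close approximation at that sample size). Because the printed coefficients and standard errors are rounded, the recomputed t statistics will not match the table exactly. The names `table`, `two_sided_p`, and `results` are illustrative only.

```python
import math

# (coefficient, standard error) as printed in the table; these are rounded values
table = {
    "Constant": (-2.355, 0.430),
    "Age": (0.197, 0.008),
    "Age^2": (-0.0020, 0.0001),
    "Female": (1.360, 0.206),
    "Age*Female": (-0.100, 0.010),
    "Age^2*Female": (0.0011, 0.0001),
}


def two_sided_p(t):
    """Two-sided p-value under the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2))


# t statistic = coefficient / standard error
results = {}
for name, (coef, se) in table.items():
    t = coef / se
    results[name] = (t, two_sided_p(t))
    print(f"{name:14s} t = {t:7.2f}   p = {results[name][1]:.3f}")
```

Every |t| is far above the usual critical value of about 1.96, so each estimate is statistically significant at conventional levels.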
  1. Are these coefficient estimates each statistically significant? Calculate t-statistics and p-values for each (there are again 105409 df).
  2. What is the predicted probability of being overweight for a 35-year-old male? For a female of the same age?

Men: Z = -2.355 + 0.197*35 - 0.002*35^2; the predicted probability is 1/(1 + e^(-Z)), so 0.886.

Women: Z = (-2.355 + 1.36) + (0.197 - 0.1)*35 - (0.002 - 0.0011)*35^2; the predicted probability is 1/(1 + e^(-Z)), so 0.78.

  3. At what age does male probability of being overweight peak? Female? At what levels for each?

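The predicted probability rises with Z, so it peaks where the quadratic in age peaks: Age = -(coefficient on Age)/(2 * coefficient on Age^2), using the combined coefficients for women. The probability peaks at age 49.3 for men and 53.9 for women, when they are respectively 91.7% and 82.3% likely to be overweight.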
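
The predicted probabilities and peak ages can be reproduced with a short script. This is a sketch using the rounded coefficients printed in the table, so its output will be close to, but not exactly equal to, the figures quoted above. The names `b`, `z_index`, `prob_overweight`, and `peak_age` are illustrative only.

```python
import math

# Rounded logit coefficients as printed in the results table
b = {
    "const": -2.355, "age": 0.197, "age2": -0.0020,
    "female": 1.360, "age_f": -0.100, "age2_f": 0.0011,
}


def z_index(age, female):
    """Linear index Z of the logit; female is 0 for men, 1 for women."""
    return (b["const"] + female * b["female"]
            + (b["age"] + female * b["age_f"]) * age
            + (b["age2"] + female * b["age2_f"]) * age ** 2)


def prob_overweight(age, female):
    """Logistic transformation 1 / (1 + e^(-Z))."""
    return 1.0 / (1.0 + math.exp(-z_index(age, female)))


def peak_age(female):
    """Age where dZ/dAge = 0; the probability peaks there as well,
    because the logistic function is increasing in Z."""
    linear = b["age"] + female * b["age_f"]
    quadratic = b["age2"] + female * b["age2_f"]
    return -linear / (2.0 * quadratic)


for label, female in (("men", 0), ("women", 1)):
    a = peak_age(female)
    print(f"{label}: P(age 35) = {prob_overweight(35, female):.3f}, "
          f"peak age = {a:.1f}, P(peak) = {prob_overweight(a, female):.3f}")
```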

  4. Next, download the BRFSS data and do some of your own estimations. BMI is a person’s weight in kg divided by the square of their height in m; a BMI over 25 is classified as overweight. The data include a continuous variable (BMI_measure), a 0/1 dummy for overweight (d_overweight; includes overweight and obese), and a 4-category classification (X_BMI5CAT). Compare results across these measures.
  1. Start with some basic statistics: how does the tendency to be overweight vary among educational groups? Are these differences statistically significant? Compare results from each of the 3 classifications above. Discuss.
  2. Next estimate a linear model of the continuous measure. Explain which variables are important to include. Discuss the results.
  3. Next estimate logit and probit models of the 0/1 dummy measure. Explain which variables ought to be included or excluded. Discuss the results of each model.
  4. Use the 4-category variable to estimate a logit or probit model of whether the person is obese (BMI over 30). How do the results change? Discuss each model and the extent to which the different specifications give different results.

Answers will vary.
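
One possible workflow for the estimations above is sketched below in Python with statsmodels; the choice of software is an assumption, and any econometrics package would do. The data here are simulated purely so the script runs on its own, and they are not BRFSS results. `BMI_measure` and `d_overweight` are the variable names given in the assignment; `Age`, `female`, `educ_yrs`, and `d_obese` are hypothetical names. Since the coding of `X_BMI5CAT` is not described here, the sketch builds the obese dummy from BMI over 30 instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the BRFSS extract (all values are made up).
rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "Age": rng.integers(18, 80, size=n),        # hypothetical column name
    "female": rng.integers(0, 2, size=n),       # hypothetical column name
    "educ_yrs": rng.integers(8, 21, size=n),    # hypothetical column name
})
df["BMI_measure"] = (
    20.0 + 0.30 * df["Age"] - 0.003 * df["Age"] ** 2
    - 0.8 * df["female"] - 0.1 * df["educ_yrs"]
    + rng.normal(0.0, 4.0, size=n)
)
df["d_overweight"] = (df["BMI_measure"] > 25).astype(int)
df["d_obese"] = (df["BMI_measure"] > 30).astype(int)  # hypothetical dummy

# Same right-hand side in every model so the specifications are comparable
rhs = "Age + I(Age**2) + female + educ_yrs"
ols = smf.ols("BMI_measure ~ " + rhs, data=df).fit()
logit = smf.logit("d_overweight ~ " + rhs, data=df).fit(disp=0)
probit = smf.probit("d_overweight ~ " + rhs, data=df).fit(disp=0)
logit_obese = smf.logit("d_obese ~ " + rhs, data=df).fit(disp=0)

# Side-by-side coefficients; only signs and significance compare directly
comparison = pd.DataFrame({
    "OLS (BMI)": ols.params,
    "Logit (overweight)": logit.params,
    "Probit (overweight)": probit.params,
    "Logit (obese)": logit_obese.params,
})
print(comparison.round(4))

# Logit and probit coefficients differ in scale, so compare fitted probabilities
fit_corr = np.corrcoef(logit.predict(), probit.predict())[0, 1]
print("Correlation of logit and probit fitted probabilities:", round(fit_corr, 4))
```

Logit and probit coefficients are on different scales, so the comparison across models is better made on fitted probabilities or marginal effects than on the raw coefficients.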