Possible Solutions for Exam 1

K Foster, Statistics and Introduction to Econometrics, Eco B2000, CCNY, Fall 2011

1.       (15 points) You might find it useful to sketch the distributions.

a.      For a Standard Normal Distribution, what is the area closer to the mean than 1.45? Area to the left of -1.45 is .0735 so the area in both tails farther than 1.45 is .1471; subtract this from one to find the area in the middle, 0.8529.

b.      For a Standard Normal Distribution, what is the area to the right of 2?  0.0228

c.       For a Normal Distribution with mean 5 and standard deviation 7.6, what is the area to the right of 14.1?  This is equivalent to the area to the right of the standardized value $(14.1 - 5)/7.6 = 1.197$, which is .1156.

d.      For a Normal Distribution with mean 1 and standard deviation 7.8, what is the area in both tails farther from the mean than 11?  Equivalent to the area for a standard normal farther from zero than $(11 - 1)/7.8 = 1.2821$, so 2*0.0999 = .1998.

e.      For a Normal Distribution with mean -5 and standard deviation 1.6, what is the area in both tails farther from the mean than -2.6?  Equivalent to $(-2.6 - (-5))/1.6 = 1.5$, so 2*0.0668 = 0.1336.

f.        For a Normal Distribution with mean -1 and standard deviation 9.8, what values leave probability 0.157 in both tails?  A standardized value of ±1.4152 leaves .0785 probability in each tail (so .157 in both); this standardized value is equivalent to (-14.87, 12.87).
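The table lookups in question 1 can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `two_tail_area` is only illustrative and is not part of the exam.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_tail_area(z):
    """Area in both tails farther from zero than |z|."""
    return 2 * (1 - std_normal.cdf(abs(z)))


# a. area within 1.45 of the mean
area_a = 1 - two_tail_area(1.45)
# b. area to the right of 2
area_b = 1 - std_normal.cdf(2)
# c. N(5, 7.6): area to the right of 14.1
z_c = (14.1 - 5) / 7.6
area_c = 1 - std_normal.cdf(z_c)
# d. N(1, 7.8): both tails farther from the mean than 11
z_d = (11 - 1) / 7.8
area_d = two_tail_area(z_d)
# e. N(-5, 1.6): both tails farther from the mean than -2.6
z_e = (-2.6 - (-5)) / 1.6
area_e = two_tail_area(z_e)
# f. N(-1, 9.8): cutoffs leaving 0.157 in both tails combined
z_f = std_normal.inv_cdf(1 - 0.157 / 2)
cutoffs_f = (-1 - z_f * 9.8, -1 + z_f * 9.8)

for label, value in [("a", area_a), ("b", area_b), ("c", area_c),
                     ("d", area_d), ("e", area_e)]:
    print(label, round(value, 4))
print("f", round(z_f, 4), tuple(round(x, 2) for x in cutoffs_f))
```

Each part standardizes first and then uses only the standard normal, which mirrors the sketch-the-distribution approach suggested in the question.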

 

2.      (15 points) In a medical study, people were randomly assigned to use either antibacterial products or regular soap.  In total 592 people used antibacterial soap; 586 used regular soap.  It was found that 33.1% of people using antibacterial products got a cold; 32.3% of people using regular soap got colds.

a.      Test the null hypothesis that there is no difference in the rates of sickness for people using regular or antibacterial soap. (What is the p-value?)   The standard error for the difference uses the variance of the first, which is 0.331(1 - .331) = .2214 and the variance of the second, .323(1 - .323) = .2187, so the standard error of the difference is $\sqrt{.2214/592 + .2187/586} = 0.0273$.  The difference of 0.331 - 0.323 = 0.008 is 0.2927 standard errors from zero, so the p-value is 0.7698: there is almost a 77% chance that, if there were actually no difference, we might still measure a difference at least this large.

b.      Create a 95% confidence interval for the difference in sickness rates.  What is the 90% confidence interval?  The 99% interval?  A 95% confidence interval for the difference is 0.008 ± 1.96*.027 = .008 ± .054 = (-.046, .062).  A 90% confidence interval uses 1.64 instead of 1.96 so (-.037, .053); a 99% confidence interval uses 2.58 so (-.062, .078).

c.       Every other study has found similar results.  Why do you think people would pay more for antibacterial soaps? Answers will vary.  Many students missed the basic finding, that people washing with antibacterial soap got sick MORE often!
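The test and intervals in question 2 can be reproduced with a short script. This is a sketch only, assuming the unpooled standard error used in the solution above and the normal approximation; the function names `two_proportion_test` and `confidence_interval` are illustrative, not from the course materials.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(p1, n1, p2, n2):
    """Return (difference, standard error, z, two-sided p-value)."""
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p_value


def confidence_interval(estimate, se, level):
    """Normal-approximation interval at the given confidence level."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se


# Antibacterial: 33.1% of 592; regular soap: 32.3% of 586
diff, se, z, p_value = two_proportion_test(0.331, 592, 0.323, 586)
print(round(se, 4), round(z, 4), round(p_value, 4))
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(diff, se, level)
    print(level, round(low, 3), round(high, 3))
```

Because the critical values come from `inv_cdf` rather than the rounded 1.64, 1.96 and 2.58, the endpoints may differ from the hand calculation in the third decimal place.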

 

3.       (15 points) A study of workers and managers asked both how much management listened to workers' suggestions (on a scale of 1-7 where "1" indicates that they paid great attention).  Managers averaged a 2.50 (standard deviation of 0.55); workers answered an average 2.08 (standard deviation of 0.76) – managers ignore their workers even more often than the employees realize.  There were 137 workers and 14 managers answering.

a.      Test the null hypothesis that there was no difference between workers and managers: how likely is it that there is actually no difference in average response? (What is the p-value?)  The difference is 2.50 - 2.08 = 0.42.  The standard error of the difference is $\sqrt{0.55^2/14 + 0.76^2/137} = 0.16$.  So the difference is .42/.1607 = 2.61 standard errors from zero, so there is only a small chance that this is just random; the p-value is about 0.009, under 1%.

b.      Create a 95% confidence interval for the difference between workers and managers.  What is the 90% confidence interval?  The 99% interval?  The 95% confidence interval for the difference is 0.42 ± 1.96*.16 = .42 ± .31 = (.11, .73).  The 90% CI uses 1.64 so (.16, .68); the 99% CI uses 2.58 so (.01, .83).
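The arithmetic for question 3 can be laid out step by step. The sketch below assumes the normal approximation used in the solution; with only 14 managers a t distribution would give somewhat wider intervals, which is not pursued here. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the question
mgr_mean, mgr_sd, mgr_n = 2.50, 0.55, 14
wrk_mean, wrk_sd, wrk_n = 2.08, 0.76, 137

diff = mgr_mean - wrk_mean
# Each group's variance is divided by its own sample size
se = sqrt(mgr_sd**2 / mgr_n + wrk_sd**2 / wrk_n)
z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"diff={diff:.2f} se={se:.4f} z={z:.2f} p={p_value:.4f}")

# Critical values and intervals for the three requested levels
for level in (0.90, 0.95, 0.99):
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: z*={z_crit:.3f} "
          f"({diff - z_crit * se:.2f}, {diff + z_crit * se:.2f})")
```

The small manager group contributes most of the standard error, since 0.55²/14 is about five times as large as 0.76²/137.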

 

4.      (15 points) A recent survey by Intel showed that 53% of parents (561 were surveyed) were uncomfortable talking with their children about math & science.  Previous surveys found that 57% of parents talked with their kids about sex & drugs. 

a.      Test the null hypothesis that parents are as comfortable talking about math & science as sex & drugs; that the true value of parents uncomfortable with math and science is not different from 57%.  What is the p-value? The standard error of the survey is $\sqrt{.53(1 - .53)/561} = .021$, so the difference of 0.57 - 0.53 = 0.04 is 1.9 standard errors from zero.  The p-value is .058.

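b.      Create a 95% confidence interval for the true fraction of parents who are uncomfortable with math & science.  What is the 90% confidence interval?  The 99% interval?  For the fraction itself, the 95% CI is .53 ± 1.96*.021 = (.489, .571); 90% is (.495, .565); 99% is (.476, .584).  Equivalently, for the difference from 57%, the 95% CI is .04 ± 1.96*.021 = (-.001, .081); 90% is (.005, .075); 99% is (-.014, .094).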
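Question 4 compares one sample proportion with a fixed benchmark, so only one standard error term is needed. The following sketch assumes, as the solution does, that the standard error is computed from the sample proportion of 53% rather than from the null value of 57%; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, p_null = 0.53, 561, 0.57

# Standard error from the sample proportion, as in the solution above
se = sqrt(p_hat * (1 - p_hat) / n)
z = (p_null - p_hat) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(se, 4), round(z, 2), round(p_value, 3))

# 95% interval centred on the sample proportion itself
z95 = NormalDist().inv_cdf(0.975)
ci_fraction = (p_hat - z95 * se, p_hat + z95 * se)
print(tuple(round(x, 3) for x in ci_fraction))

# 57% falls just inside the interval, consistent with a p-value
# slightly above 0.05
print(ci_fraction[0] < p_null < ci_fraction[1])
```

The test and the interval tell the same story: the benchmark of 57% is near the edge of the 95% interval, so the null hypothesis is not quite rejected at the 5% level.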

 

5.      (15 points) The New York Times reported on educational companies that over-sell their products and gave the example of "Cognitive Tutor" (CT) that helps math students.  The CT students improved by 17.41 (standard error of 5.82); the regular students improved by 15.28 (standard error of 5.33).  There were 153 students in the new program and 102 regular students.

a.      Test the null hypothesis that there is no difference between regular students and those in the CT group.  What is the p-value for this difference? The difference is (17.41 - 15.28) = 2.13.  Treating the reported 5.82 and 5.33 as standard deviations, the standard error of the difference is $\sqrt{5.82^2/153 + 5.33^2/102} = .707$, so the difference is just over 3 standard errors from zero.  The p-value is .003.

b.      Create a 95% confidence interval for the difference between regular and CT students.  What is the 90% confidence interval?  The 99% interval? Each interval is 2.13 ± z*.707 with z = 1.64, 1.96 and 2.58, so the CIs are 90%: (.97, 3.29); 95%: (.74, 3.52); 99%: (.31, 3.95).
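The result in question 5 depends on how the reported 5.82 and 5.33 are read. The sketch below first reproduces the solution's calculation, which divides each squared figure by its group size (that is, treats the figures as standard deviations). It then shows, purely as an assumption for comparison and not as part of the posted solution, what the literal wording "standard error" would imply. The helper name `z_and_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_and_p(diff, se):
    """z-statistic and two-sided normal p-value for a difference."""
    z = diff / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


diff = 17.41 - 15.28

# Reading used in the solution: 5.82 and 5.33 are standard deviations,
# so each variance is divided by its group size.
se_sd_reading = sqrt(5.82**2 / 153 + 5.33**2 / 102)
z_sd, p_sd = z_and_p(diff, se_sd_reading)

# Literal reading (an assumption, not the posted solution): the figures
# are already standard errors of each group mean, so no division by n.
se_literal = sqrt(5.82**2 + 5.33**2)
z_literal, p_literal = z_and_p(diff, se_literal)

print(f"SD reading:      se={se_sd_reading:.3f} z={z_sd:.2f} p={p_sd:.4f}")
print(f"literal reading: se={se_literal:.3f} z={z_literal:.2f} p={p_literal:.4f}")
```

Under the first reading the difference is clearly significant; under the second it would be well within one standard error of zero, so it is worth stating which reading an answer uses.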

 

6.      (20 points) Use the ATUS data (available from Blackboard) on the time that people spend in different activities.

a.      Among households with kids, what is the average time spent on activities related to kids? 

There are 47,298 people in households with kids, spending an average of 66.74 minutes, with standard deviation of 99.98 minutes.

b.      Among households with kids, how much time do men and women spend on activities related to kids?  Form a hypothesis test for whether there is a statistically significant difference between the time that men and women spend with kids.  What is the p-value for the hypothesis of no difference?  What is a 95% confidence interval for the difference in time?

There are 19,945 men and 27,353 women.  The men average 41.05 minutes with kids (std dev 77.45); the women average 85.46 minutes (std dev 109.90).  The difference is 85.46 - 41.05 = 44.41; the standard error of this difference is $\sqrt{77.45^2/19945 + 109.90^2/27353} = 0.86$, so the difference is more than 50 standard errors from zero and the p-value is essentially zero.  A 95% confidence interval for the difference is 44.41 ± 0.86*1.96 = 44.41 ± 1.69 = (42.72, 46.10).

c.       Why do you think that we would find these results?  Explain (perhaps with some further empirical results from the same data set).

Answers will vary.
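The summary statistics reported for question 6 can be checked against each other without reopening the ATUS file. This is a sketch working only from the rounded figures quoted above, so small discrepancies in the last digit are expected; variable names are illustrative.

```python
from math import sqrt

# Group summaries reported in part (b)
men_n, men_mean, men_sd = 19_945, 41.05, 77.45
women_n, women_mean, women_sd = 27_353, 85.46, 109.90

# Part (a) cross-check: the overall mean is the size-weighted group mean
total_n = men_n + women_n
overall_mean = (men_n * men_mean + women_n * women_mean) / total_n

# Part (b): difference, its standard error, and a 95% interval
diff = women_mean - men_mean
se = sqrt(men_sd**2 / men_n + women_sd**2 / women_n)
z = diff / se
ci_95 = (diff - 1.96 * se, diff + 1.96 * se)

print(total_n, round(overall_mean, 2))
print(round(se, 2), round(z, 1), tuple(round(x, 2) for x in ci_95))
```

The group sizes add up to the 47,298 people in part (a), and the weighted mean lands within rounding of the reported 66.74 minutes, which is a quick way to confirm that the same subsample was used in both parts.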

 

7.       (25 points) Use the PUMS data (available from Blackboard) on the residents of NYC.  Consider the time (in minutes) spent by people to travel to work; this variable has name JWMNP.

a.      How many men and women answered this question?  What variables do you think would be relevant, in trying to explain the variation in commuting times?

There are 66,256 men and 66,087 women answering; their answers are very close – 39.34 for men and 39.44 for women.  Answers will vary for which are the relevant variables.

b.      Form a linear regression with the dependent variable, "JWMNP Travel Time to Work," and relevant independent variables.

Answers will vary.  Example is below.

c.       Which independent variables have coefficients that are statistically significantly different from zero?

Answers will vary.  Example is below.

 

Here is the SPSS output for a linear regression explaining commuting time.  At the 5% level, all of the coefficients are statistically significantly different from zero except those on Native American, Hispanic, kids under 6, all of the educational measures, and all of the poverty measures.

 

 

 

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .437(a)   .191       .190                23.223

a. Predictors: (Constant), commute_subway, if family income less than 200 percent of poverty line, nativeamerican, educ_collassoc, asianamerican, female, boro_bk, educ_somecoll, kids_under6, boro_si, raceother, Age, educ_hs, commute_bus, boro_bx, foreign_born, educ_adv, africanamerican, if family income less than 100 percent of poverty line, kids_under17, Hispanic, boro_qns, commute_car, educ_coll, if family income less than 150 percent of poverty line

 

ANOVA(b)

Model 1      Sum of Squares   df      Mean Square   F         Sig.
Regression   9135976.012      25      365439.040    677.593   .000(a)
Residual     3.878E7          71905   539.319
Total        4.792E7          71930

a. Predictors: (Constant) and the same 25 variables listed in note a of the Model Summary.

b. Dependent Variable: Travel time to work
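The entries of the ANOVA table are tied together by a few identities: each mean square is a sum of squares divided by its degrees of freedom, F is the ratio of the two mean squares, R Square is the regression share of the total sum of squares, and the standard error of the estimate is the square root of the residual mean square. The sketch below checks these using the printed values; because SPSS prints the residual sum of squares only as 3.878E7, it is rebuilt here from the mean square, which is an approximation.

```python
from math import sqrt

# Values copied from the ANOVA table (residual SS is printed rounded)
ss_regression, df_regression = 9135976.012, 25
ms_residual, df_residual = 539.319, 71905

ms_regression = ss_regression / df_regression
ss_residual = ms_residual * df_residual  # about 3.878E7
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
std_error_estimate = sqrt(ms_residual)

print(round(ms_regression, 2), round(f_stat, 2))
print(round(r_squared, 3), round(std_error_estimate, 3))
```

These reproduce the F of 677.593 in this table and the R Square of .191 and standard error of 23.223 in the Model Summary, up to rounding.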

 

 

Coefficients(a)

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

Variable                                                B        Std. Error   Beta    t         Sig.
(Constant)                                              11.611   .557                 20.844    .000
Age                                                     .116     .007         .057    15.637    .000
female                                                  -.795    .180         -.015   -4.422    .000
africanamerican                                         5.539    .244         .089    22.667    .000
nativeamerican                                          2.137    1.527        .005    1.399     .162
asianamerican                                           2.800    .308         .035    9.098     .000
raceother                                               2.226    .344         .028    6.464     .000
Hispanic                                                .037     .290         .001    .127      .899
kids_under6                                             .419     .298         .006    1.404     .160
kids_under17                                            .729     .232         .014    3.146     .002
foreign_born                                            1.810    .200         .035    9.062     .000
educ_hs                                                 -.564    .331         -.009   -1.704    .088
educ_somecoll                                           .416     .355         .006    1.174     .240
educ_collassoc                                          .681     .430         .007    1.581     .114
educ_coll                                               .585     .346         .010    1.692     .091
educ_adv                                                -.519    .359         -.008   -1.446    .148
boro_bx                                                 9.924    .328         .131    30.244    .000
boro_si                                                 20.007   .431         .181    46.468    .000
boro_bk                                                 9.493    .258         .167    36.844    .000
boro_qns                                                12.302   .270         .212    45.582    .000
if family income less than 100 percent of poverty line  -.014    .475         .000    -.030     .976
if family income less than 150 percent of poverty line  -.613    .481         -.008   -1.273    .203
if family income less than 200 percent of poverty line  -.406    .360         -.006   -1.126    .260
commute_car                                             1.023    .286         .018    3.581     .000
commute_bus                                             19.409   .337         .248    57.555    .000
commute_subway                                          20.466   .261         .390    78.546    .000

a. Dependent Variable: Travel time to work
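Each t statistic in the Coefficients output is the coefficient divided by its standard error, and the Sig. column is the corresponding two-sided p-value. The sketch below recomputes both for a handful of rows copied from the output. It assumes a normal approximation in place of the t distribution, which is reasonable with 71,905 residual degrees of freedom; ratios rebuilt from the rounded B and Std. Error values will differ slightly from the t values SPSS prints.

```python
from statistics import NormalDist

# (B, Std. Error) pairs copied from the Coefficients output for a few rows
coefficients = {
    "female": (-0.795, 0.180),
    "nativeamerican": (2.137, 1.527),
    "Hispanic": (0.037, 0.290),
    "kids_under17": (0.729, 0.232),
    "educ_hs": (-0.564, 0.331),
    "commute_subway": (20.466, 0.261),
}

results = {}
for name, (b, se) in coefficients.items():
    t_stat = b / se
    # Normal approximation to the t distribution (very large df)
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    significant = p_value < 0.05
    results[name] = (t_stat, p_value, significant)
    print(f"{name:16s} t={t_stat:8.3f} p={p_value:.3f} "
          f"significant={significant}")
```

The same rule of thumb used throughout the exam applies here: a coefficient roughly two or more standard errors from zero is significant at the 5% level, which is why educ_hs (t near -1.7) falls short while kids_under17 (t near 3.1) does not.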