This set has answers
Consider a normal pdf with mean of 3 and standard deviation of 4.
* Find the area under the normal pdf between 3 and 7.
* Find the area under the normal pdf between 7 and 11.
* What is the probability of finding a value as far away from the mean as 7 if it truly has a normal distribution?
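A minimal R sketch of how these areas could be computed with `pnorm`, the normal CDF in base R. The variable names are illustrative, and the last quantity reads "as far away from the mean as 7" as a two-sided probability, which is an assumption about the intended interpretation.

```r
# Normal distribution with mean 3 and standard deviation 4
mu <- 3
sigma <- 4

# Area between 3 and 7 (from the mean to one standard deviation above it)
area_3_7 <- pnorm(7, mean = mu, sd = sigma) - pnorm(3, mean = mu, sd = sigma)

# Area between 7 and 11 (from one to two standard deviations above the mean)
area_7_11 <- pnorm(11, mean = mu, sd = sigma) - pnorm(7, mean = mu, sd = sigma)

# Two-sided probability of landing at least |7 - 3| = 4 away from the mean
# (assumption: "as far away" counts both tails)
p_two_sided <- 2 * pnorm(-abs(7 - mu) / sigma)

round(c(area_3_7, area_7_11, p_two_sided), 4)
```

Standardizing first (subtract the mean, divide by the standard deviation) and calling `pnorm` with its defaults gives the same numbers.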
A random variable is distributed with mean of 8 and standard deviation of 4.
What is the probability, if the true distribution is a Standard Normal, of seeing a deviation as large (in absolute value) as 1.2?
What is the probability, if the true distribution has mean 0.5 and standard deviation of 0.3, of seeing a deviation as large (in absolute value) as zero?
Consider an election with generic candidates X vs Y (which we could interpret as chromosomes but not necessarily!).
Suppose that a particular medical treatment already improves patient outcomes by 20 (don’t worry about the units for now) and it is established that the standard deviation for the population is 8. There is an improved treatment that is expected to deliver a further 10% improvement.
Consider the following table of numbers of people (from CPS data) who make under or over $15/hr in wage - a level that some politicians want to set as the new minimum wage. (This is a particular subset, don’t bother trying to replicate, the numbers given here should be sufficient.)
Education | Wage less than $15/hr: Native | Wage less than $15/hr: Immigrant | Wage greater than $15/hr: Native | Wage greater than $15/hr: Immigrant |
---|---|---|---|---|
Educ HS or more | 14235 | 3113 | 33150 | 5296 |
no HS diploma | 1062 | 1824 | 662 | 567 |
We consider the co-movement of unemployment and industrial production, comparing the change in the unemployment rate (UR) for months when industrial production (IP) rose against the change in the unemployment rate when industrial production fell, for months since 1975. (Note that UR is measured in percentage points, so a change of .01 means UR went from 5% to 5.01%.)
Months | mean change in UR | std. dev. of UR change | N |
---|---|---|---|
months when IP rose | -.0498 | .1513 | 325 |
months when IP fell | .0815 | .1836 | 162 |
Months | mean change in UR | std. dev. of UR change | N |
---|---|---|---|
months when IP rose | -.0349 | .1431 | 112 |
months when IP fell | .0667 | .1840 | 75 |
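```r
library(quantmod)  # provides getSymbols() and xts time-series support
# Download industrial production (INDPRO) and the unemployment rate (UNRATE) from FRED
getSymbols(c('INDPRO','UNRATE'),src='FRED')
# Keep observations from 1965 onward
ip_1 <- INDPRO["1965::"]
ur_1 <- UNRATE["1965::"]
# Month-to-month changes; na.trim() drops the leading NA created by lag()
d_ip <- na.trim(ip_1 - lag(ip_1))
d_ur <- na.trim(ur_1 - lag(ur_1))
```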
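Given only the summary statistics in the first table above, a test of the difference in mean UR changes can be sketched directly from the means, standard deviations, and counts. The variable names are illustrative, and the p-value uses a normal approximation, which is an assumption that seems reasonable with several hundred observations.

```r
# Summary statistics copied from the first table (months when IP rose vs. fell)
m_rose <- -0.0498; s_rose <- 0.1513; n_rose <- 325
m_fell <-  0.0815; s_fell <- 0.1836; n_fell <- 162

# Difference in means and its standard error (unequal variances)
diff_means <- m_rose - m_fell
se_diff <- sqrt(s_rose^2 / n_rose + s_fell^2 / n_fell)

# Test statistic for the null hypothesis of no difference
t_stat <- diff_means / se_diff

# Two-sided p-value, normal approximation (assumption: large samples)
p_value <- 2 * pnorm(-abs(t_stat))

c(diff = diff_means, se = se_diff, t = t_stat, p = p_value)
```

The same lines apply to the second table by swapping in its means, standard deviations, and counts.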
One of the first notes in class emphasized “know your data.” You’ve done a few homework assignments using Consumer Expenditure Survey data, tell me about that data. How do they calculate expenditure on food away from home?
A recent research paper, looking at how much attractiveness and personal grooming affects wages, used data from The National Longitudinal Study of Adolescent Health in 2001-2.
Grooming | 4 Very Attractive | 3 Attractive | 2 Average | 1 Less Attractive |
---|---|---|---|---|
4 Very well groomed | 297 | 199 | 57 | 30 |
3 Well groomed | 290 | 1169 | 607 | 54 |
2 Average grooming | 75 | 788 | 2013 | 167 |
1 Less than average grooming | 1 | 25 | 164 | 138 |
Grooming | 4 Very Attractive | 3 Attractive | 2 Average | 1 Less Attractive |
---|---|---|---|---|
4 Very well groomed | 326 | 171 | 60 | 26 |
3 Well groomed | 416 | 1186 | 467 | 51 |
2 Average grooming | 212 | 966 | 1729 | 136 |
1 Less than average grooming | 11 | 49 | 184 | 84 |
PK Robins, JF Homer, MT French (2011). “Beauty and the Labor Market: Accounting for the Additional Effects of Personality and Grooming,” Labour, 25(2), pp 228-251.
You know that a random variable has a normal distribution with standard deviation of 16. After 10 draws, the average is -12.
You know that a random variable has a normal distribution with standard deviation of 25. After 10 draws, the average is -10. a. What is the standard error of the average estimate? b. If the true mean were -10, what is the probability that we could observe a value between -10.5 and -9.5
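Parts (a) and (b) rest on the fact that the average of n independent draws has standard deviation sigma/sqrt(n). A short R sketch, with illustrative variable names:

```r
sigma <- 25   # known standard deviation of each draw
n <- 10       # number of draws

# (a) standard error of the sample average
se_mean <- sigma / sqrt(n)

# (b) P(-10.5 < average < -9.5) when the true mean is -10
p_between <- pnorm(-9.5, mean = -10, sd = se_mean) -
  pnorm(-10.5, mean = -10, sd = se_mean)

c(se = se_mean, prob = p_between)
```

With a standard error this large relative to the half-point band, the probability in (b) is small.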
I tracked down this reference from a sign on the bus, from Tobacco Free NY. A survey of 1681 adolescents (age 11-14) in California asked if they had tried smoking and how often they went to convenience, liquor, or small grocery stores. The study finds that 452 kids rarely went to these stores and 81 had tried cigarettes; 458 kids visited these stores often (more than twice a week) and 133 had tried cigarettes. The authors assert that visiting these stores exposed the kids to more tobacco advertising.
Henriksen, L, N C Schleicher, E C Feighery, and S P Fortmann, (2010). “Longitudinal Study of Exposure to Retail Cigarette Advertising and Smoking Initiation,” Pediatrics.
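One way to compare the two smoking rates is a test of the difference in proportions. The sketch below uses the counts quoted in the question; the variable names are illustrative, and the unpooled standard error is one common choice rather than the only one. Base R's `prop.test` is used as a cross-check with a pooled variance.

```r
# Counts from the survey description
tried <- c(rare = 81, often = 133)    # had tried cigarettes
n     <- c(rare = 452, often = 458)   # kids in each group

p_hat <- tried / n

# Unpooled standard error of the difference in proportions
se_diff <- sqrt(sum(p_hat * (1 - p_hat) / n))
z_stat  <- (p_hat["often"] - p_hat["rare"]) / se_diff
p_value <- 2 * pnorm(-abs(z_stat))

# Cross-check: pooled two-sample test without continuity correction
check <- prop.test(x = tried, n = n, correct = FALSE)

round(c(p_hat, z = unname(z_stat), p = unname(p_value)), 4)
```

A significant difference says nothing by itself about whether store visits cause smoking, which is the part of the authors' claim the question asks you to think about.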
You might have missed this in the news about Alice Munro winning the Nobel, but there was a study done, showing that reading literature such as Munro and Chekhov tended to make people score higher on psychological tests of Affective Theory of Mind. Consider the difference between two groups of people: either they read from a selection of literary fiction or they read non-fiction articles about non-human subjects (e.g. potatoes). They were all given a test to determine how well they could identify emotion from a picture of a person’s eyes. (I’m making up some of these numbers.) The Fiction group tests at 25.6 with standard deviation of 4.38; the Non-fiction group tests at 23.5 with standard deviation of 5.17. There were 41 people in the first group and 45 in the second group.
Kidd, D C, and E Castano, (2013). “Reading Literary Fiction Improves Theory of Mind,” Science.
You are comparing two groups: the first has X=0 and Y=1 and Y=3; the second has X=10 and Y=9 and Y=7. [So there are four data points: (0,1), (0,3), (10,9), (10,7).]
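As a sketch of how the four points could be fit by OLS in R (the object names are illustrative): because X takes only two values, the fitted line passes through the two group means.

```r
# The four data points: (0,1), (0,3), (10,9), (10,7)
x <- c(0, 0, 10, 10)
y <- c(1, 3, 9, 7)

# OLS fit of y on x
fit <- lm(y ~ x)
coef(fit)

# Group means of y at each value of x
group_means <- tapply(y, x, mean)
group_means
```

The intercept equals the mean of Y in the X=0 group, and the slope times 10 equals the difference between the two group means.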
A (joking) study in the New England Journal of Medicine linked a country’s per capita consumption of chocolate with the number of Nobel Prizes. It reported a regression coefficient but I got the data and did my own analysis. Five countries with the highest consumption of chocolate (UK, Belgium, Germany, Norway, Switzerland) had 19.02 Nobel Prizes per 10m people (std dev 9.0); the next five countries (USA, Finland, Denmark, Austria, Ireland) with lower chocolate consumption had 16.13 prizes (std dev 8.1).
A (not joking) report from Morgan Stanley reported that the bank’s positions were 23% safer because the bank chose to measure risk with the standard deviation of stock returns from the past one year rather than more years as it had done previously. Over the past one year, the average daily returns on the S&P500 (expressed at monthly rate) were 1.7% with a standard deviation of 3.6% (252 observations). For the four years before, the average return was -0.6% with a standard deviation of 6.5% (1008 observations). Test the null hypothesis that the returns for the past year are the same as the returns for the previous four years. What is the standard error of the difference? What is the test statistic? What is the p-value? Discuss. Why might Morgan Stanley have chosen that particular data? (Note that riskiness is a cost so reduces profits.) The Excel file with this data is on Blackboard although you do not need to use it.
A survey from eFinancialCareers found that, despite predictions from NY State that the Wall St bonus pool would drop by about 35%, a full 48% of the 911 respondents believed that their own bonuses would rise.
Dan Ariely and co-authors report a study that asks participants to solve complicated addition tasks but gives them an opportunity to cheat: they self-report how many problems they correctly solve. Every participant got a pair of fashion sunglasses but some were told that the sunglasses were counterfeit. Forty-two people were told they got counterfeit sunglasses and 30 of them cheated; 43 people were told that they got authentic sunglasses and 13 of them cheated.
F Gino, M I Norton, D Ariely (2010). “The Counterfeit Self: the Deceptive Costs of Faking It,” Psychological Science 21:712.
An audit study emailed professors to ask for an appointment but the names of the ‘students’ were randomly varied to be typically male or female; white, African-American, Hispanic, Chinese, or Indian. White men were 26% more likely to get an appointment than minority women. Suppose you wanted to do a replication study for CUNY faculty. The original study emailed 6500 professors, you would like to study fewer.
K L Milkman, M Akinola, D Chugh, 2012. “Temporal Distance and Discrimination: An Audit Study in Academia,” Psychological Science 23:7.
In recent news a study of adolescent girls compared those who had received a vaccination against HPV (a sexually transmitted virus that is linked to certain cancers) with those who had not received the vaccine. Some parents had been reluctant to get their children vaccinated because they believed this would encourage sexual activity. The study compared 493 who got the vaccine against 905 who did not. Of the girls who got the vaccine, 61 got any of testing, diagnosis or counseling for pregnancy/sexually transmitted disease; of those who did not get the vaccine, 76 got testing, diagnosis, or counseling.
R A Bednarczyk, R Davis, K Ault, W Orenstein, S B Omer (2012). “Sexual Activity-Related Outcomes After Human Papillomavirus Vaccination of 11- to 12-Year-Olds,” Pediatrics.
In a medical study (reference below), people were randomly assigned to use either antibacterial products or regular soap. In total 592 people used antibacterial soap; 586 used regular soap. It was found that 33.1% of people using antibacterial products got a cold; 32.3% of people using regular soap got colds.
E.L.Larson, S.X. Lin, C. Gomez-Pichardo, P. Della-Latta, (2004). “Effect of Antibacterial Home Cleaning and Handwashing Products on Infectious Disease Symptoms: A Randomized Double-Blind Trial,” Ann Intern Med, 140(5), 321-329.
A study of workers and managers asked both how much management listened to workers’ suggestions (on a scale of 1-7 where “1” indicates that they paid great attention). Managers averaged a 2.50 (standard deviation of 0.55); workers answered an average 2.08 (standard deviation of 0.76) - managers ignore their workers even more often than the employees realize. There were 137 workers and 14 managers answering.
A recent survey by Intel showed that 53% of parents (561 were surveyed) were uncomfortable talking with their children about math & science. Previous surveys found that 57% of parents talked with their kids about sex & drugs.
The New York Times reported on educational companies that over-sell their products and gave the example of “Cognitive Tutor” (CT) that helps math students. The CT students improved by 17.41 (standard deviation of 5.82); the regular students improved by 15.28 (standard deviation of 5.33). There were 153 students in the new program and 102 regular students.
You are in charge of polling for a political campaign. You have commissioned a poll of 300 likely voters. Since voters are divided into three distinct geographical groups, the poll is subdivided into three groups with 100 people each. The poll results are as follows:
Measure | total | A | B | C |
---|---|---|---|---|
number in favor of candidate | 170 | 58 | 57 | 55 |
number total | 300 | 100 | 100 | 100 |
std. dev. of poll | 0.4956 | 0.4936 | 0.4951 | 0.4975 |
Note that the standard deviation of the sample (not the standard error of the average) is given.
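Since the table reports the sample standard deviation, the standard error of each average has to be computed by dividing by the square root of the group size. A sketch in R, with illustrative names and a 95% confidence level chosen as an assumption:

```r
# Values copied from the poll table
in_favor <- c(total = 170, A = 58, B = 57, C = 55)
n        <- c(total = 300, A = 100, B = 100, C = 100)
s        <- c(total = 0.4956, A = 0.4936, B = 0.4951, C = 0.4975)

p_hat <- in_favor / n
se    <- s / sqrt(n)          # standard error of each average

# 95% confidence intervals (assumed confidence level)
ci_low  <- p_hat - 1.96 * se
ci_high <- p_hat + 1.96 * se

round(cbind(p_hat, se, ci_low, ci_high), 4)
```

The pooled sample gives a narrower interval than any single region because its standard error is divided by the square root of 300 rather than 100.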
Using data from the NHIS, we find the fraction of children who are female, who are Hispanic, and who are African-American, for two separate groups: those with and those without health insurance. Compute tests of whether the differences in the means are significant; explain what the tests tell us. (Note that the numbers in parentheses are the standard deviations.)
Characteristic | with health insurance | without health insurance |
---|---|---|
female | 0.4905 (0.49994), N=7865 | 0.4811 (0.49990), N=950 |
Hispanic | 0.2587 (0.43797), N=7865 | 0.5411 (0.49857), N=950 |
African American | 0.1785 (0.38297), N=7865 | 0.1516 (0.35880), N=950 |
A paper by Chiappori, Levitt, and Groseclose (2002) looked at the strategies of penalty kickers and goalies in soccer. Because of the speed of the play, the kicker and goalie must make their decisions simultaneously (a Nash equilibrium in mixed strategies). For example, if the goalie moves to the left when the kick also goes to the left, the kick scores 63.2% of the time; if the goalie goes left while the kick goes right, then the kick scores 89.5% of the time. In the sample there were 117 occurrences when both players went to the left and 95 when the goalie went left while the kick went right. What is the p-value for a test that the probability of scoring is different? What advice, if any, would you give to kickers, based on these results? Why or why not?
A paper by Claudia Goldin and Cecilia Rouse (1997) discusses the fraction of men and women who are hired by major orchestras after auditions. Some orchestras had applicants perform from behind a screen (so that the gender of the applicant was unknown) while other orchestras did not use a screen and so were able to see the gender of the applicant. Their data show that, of 445 women who auditioned from behind a screen, a fraction 0.027 were “hired”. Of the 599 women who auditioned without a screen, 0.017 were hired. Assume that these are Bernoulli random variables. Is there a statistically significant difference between the two samples? What is the p-value? Explain the possible significance of this study.
Another paper, by Kristin Butcher and Anne Piehl (1998), compared the rates of institutionalization (in jail, prison, or mental hospitals) among immigrants and natives. In 1990, 7.54% of the institutionalized population (or 20,933 in the sample) were immigrants. The standard error of the fraction of institutionalized immigrants is 0.18. What is a 95% confidence interval for the fraction of the entire population who are immigrants? If you know that 10.63% of the general population at the time are immigrants, what conclusions can be made? Explain.
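A sketch of the confidence interval calculation in R. It assumes the standard error of 0.18 is expressed in percentage points, like the 7.54% estimate; the variable names are illustrative.

```r
share_imm <- 7.54    # percent of the institutionalized population who are immigrants
se_share  <- 0.18    # standard error (assumed to be in percentage points)
pop_share <- 10.63   # percent of the general population who are immigrants

# 95% confidence interval: estimate plus or minus 1.96 standard errors
ci <- share_imm + c(-1, 1) * qnorm(0.975) * se_share

# How many standard errors separate the estimate from the population share?
z_vs_pop <- (share_imm - pop_share) / se_share

round(c(lower = ci[1], upper = ci[2], z = z_vs_pop), 3)
```

Whether the general-population share falls inside or outside this interval is what drives the conclusion the question asks for.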
You are consulting for a polling organization. They want to know how many people they need to sample, when predicting the results of the gubernatorial election.
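A common starting point for this kind of question is the formula n = z² p(1−p) / m², where m is the desired margin of error. The helper below is hypothetical, and the 95% confidence level, the worst-case p = 0.5, and the example margins are all assumptions rather than values given in the question.

```r
# Hypothetical helper: respondents needed for a given margin of error
poll_sample_size <- function(margin, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)          # critical value, about 1.96 for 95%
  ceiling(z^2 * p * (1 - p) / margin^2)   # round up to a whole respondent
}

n_3pt <- poll_sample_size(0.03)   # margin of plus or minus 3 percentage points
n_1pt <- poll_sample_size(0.01)   # margin of plus or minus 1 percentage point
c(n_3pt, n_1pt)
```

Cutting the margin of error to a third requires about nine times as many respondents, because the sample size scales with the inverse square of the margin.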
A recent report asserted that people who worked more hours also tended to be fatter (among those in certain occupations). (The paper doesn’t give precise numbers so I’ll make them up - don’t bother with Google.) The paper did much more econometric analysis of course. Nevertheless, suppose that, of the 7219 women working non-strenuous occupations, 23% are working more than 40 hours/week. Of those women in non-strenuous occupations working more than 40 hours/week, 27.3% were obese; of those women in non-strenuous occupations working less than 40 hours/week, 24.6% were obese. There were also 714 women in strenuous occupations with 21% working more than 40 hours/week. Of the women in strenuous occupations working more than 40 hours/week, 28.1% were obese while 37.4% were obese among those working fewer hours. Does it seem likely that overtime makes certain groups more likely to be obese?
J Abramowitz, “Working Hours, Body Mass Index, and Health Status: A Time Use Analysis”
To investigate a hypothesis proposed by a student, I got data, for 102 of the world’s major countries, on the fraction of the population who are religious as well as the income per capita and the enrollment rate of boys and girls in primary school. The hypothesis to be investigated is whether more religious societies tend to hold back women. I ran two separate models: Model 1 uses girls’ enrollment rate as the dependent; Model 2 uses the ratio of girls’ to boys’ enrollment rates as the dependent. The results are below (standard errors in italics and parentheses below each coefficient):
Variable | Model 1 | Model 2 | t-stat | p-value |
---|---|---|---|---|
Intercept | 137 | 1.12 | ___ | ___ |
(std. error) | (18) | (0.09) | ___ | ___ |
Religiosity | -0.585 | -0.0018 | ___ | ___ |
(std. error) | (0.189) | (0.0009) | ___ | ___ |
GDP per capita | 0.00056 | 0.0000016 | ___ | ___ |
(std. error) | (0.00015) | (0.0000007) | ___ | ___ |
Peter Gordon, in his talk at CCNY, presented results from linear regressions to explain the growth of metropolitan areas. He begins with a simple model to explain population growth from 1990-2000:
##### Log Population Growth 1990-2000
Variable | Coefficient | t-stat | p-value |
---|---|---|---|
Constant term | -0.0229 | -0.12 | ___ |
Population in 1990 (log) | 0.0192 | 1.33 | ___ |
Pop. Density in 1990 | -0.0504 | -1.65 | ___ |
% in manufacturing | -0.0028 | -1.63 | ___ |
R² | 0.57 | | |

He also includes dummy variables for Census Regions (New England, Mid Atlantic, etc.). There are 79 observations and 67 degrees of freedom.
Variable | Average | Standard deviation |
---|---|---|
Population in 1990 (log) | 14.52 | 14.89 |
Pop. Density in 1990 | 1.80 | 1.02 |
% in manufacturing | 18.69 | 7.75 |
Fill in the blanks in the following table showing SPSS regression output. The model has the dependent variable as time spent working at main job.
Coefficients(a)
Model | B (unstandardized) | Std. Error | t | Sig. |
---|---|---|---|---|
1 (Constant) | 198.987 | 7.556 | 26.336 | .000 |
Female | -65.559 | 4.031 | _____ | _____ |
African-American | -9.190 | 6.190 | _____ | _____ |
Hispanic | 17.283 | 6.387 | _____ | _____ |
Asian | 1.157 | 12.137 | _____ | _____ |
Native American/Alaskan Native | -28.354 | 14.018 | -2.023 | .043 |
Education: High School Diploma | _____ | 6.296 | 11.706 | .000 |
Education: Some College | _____ | 6.308 | 14.651 | .000 |
Education: 4-year College Degree | 110.064 | _____ | 16.015 | .000 |
Education: Advanced degree | 126.543 | _____ | 15.714 | .000 |
Age | -1.907 | _____ | -16.428 | .000 |
a Dependent Variable: Time Working at main job
You want to examine the impact of higher crude oil prices on American driving habits during the past oil price spike. A regression of US gasoline purchases on the price of crude oil as well as oil futures gives the coefficients below. Critique the regression and explain whether the necessary basic assumptions hold. Interpret each coefficient; explain its meaning and significance.
Coefficients(a)
Model | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
---|---|---|---|---|---|
1 (Constant) | .252 | .167 | | 1.507 | .134 |
return on crude futures, 1 month ahead | .961 | .099 | .961 | 9.706 | .000 |
return on crude futures, 2 months ahead | -.172 | .369 | -.159 | -.466 | .642 |
return on crude futures, 3 months ahead | .578 | .668 | .509 | .864 | .389 |
return on crude futures, 4 months ahead | -.397 | .403 | -.333 | -.986 | .326 |
US gasoline consumption | -.178 | .117 | -.036 | -1.515 | .132 |
Spot Price Crude Oil Cushing, OK WTI FOB* | 4.23E-005 | .000 | .042 | 1.771 | .079 |
a Dependent Variable: return on crude spot price
* Dollars per barrel
Using the American Time Use Study (ATUS) we measure the amount of time that each person reported that they slept. We run a regression to attempt to determine the important factors, particularly to understand whether richer people sleep more (is sleep a normal or inferior good) and how sleep is affected by labor force participation. The SPSS output is below.
Coefficients(a)
Model | B | Std. Error | t | Sig. |
---|---|---|---|---|
1 (Constant) | -4.0717 | 4.6121 | -0.883 | 0.377 |
female | 23.6886 | 1.1551 | 20.508 | 0.000 |
African-American | -8.5701 | 1.7136 | -5.001 | 0.000 |
Hispanic | 10.1015 | 1.7763 | 5.687 | 0.000 |
Asian | -1.9768 | 3.3509 | -0.590 | 0.555 |
Native American/Alaskan Native | -3.5777 | 3.8695 | -0.925 | 0.355 |
Education: High School Diploma | 2.5587 | 1.8529 | 1.381 | 0.167 |
Education: Some College | -0.3234 | 1.8760 | -0.172 | 0.863 |
Education: 4-year College Degree | -1.3564 | 2.0997 | -0.646 | 0.518 |
Education: Advanced degree | -3.3303 | 2.4595 | -1.354 | 0.176 |
Weekly Earnings | -0.000003 | 0.000012 | -0.246 | 0.806 |
Number of children under 18 | 2.0776 | 0.5317 | 3.907 | 0.000 |
person is in the labor force | -11.6706 | 1.7120 | -6.817 | 0.000 |
has multiple jobs | 0.4750 | 2.2325 | 0.213 | 0.832 |
works part time | 4.2267 | 1.8135 | 2.331 | 0.020 |
in school | -5.4641 | 2.2993 | -2.376 | 0.017 |
Age | 1.1549 | 0.1974 | 5.850 | 0.000 |
Age-squared | -0.0123 | 0.0020 | -6.181 | 0.000 |
A paper by Farber examined the choice of how many hours a taxi driver would work, depending on a number of variables. In his output, “Driver Effects” are fixed effects for the 21 different drivers.
A study by Mehran and Tracy examined the relationship between stock option grants and measures of the company’s performance. They estimated the following specification: \[ Options = \beta_0+\beta_1(Return\ on\ Assets)+\beta_2(Employment)+\beta_3(Assets)+\beta_4(Loss)+u \] where the variable (Loss) is a dummy variable for whether the firm had negative profits. They estimated the following coefficients:
Variable | Coefficient | Standard Error |
---|---|---|
Return on Assets | -34.4 | 4.7 |
Employment | 3.3 | 15.5 |
Assets | 343.1 | 221.8 |
Loss Dummy | 24.2 | 5.0 |
Which estimate has the highest t-statistic (in absolute value)? Which has the lowest p-value? Show your calculations. How would you explain the estimate on the “Loss” dummy variable?
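The calculation the question asks for can be sketched in R by dividing each coefficient by its standard error. The short labels are illustrative, and because the sample size is not given here the p-values use a normal approximation, which is an assumption.

```r
# Coefficients and standard errors from the table (short labels are illustrative)
est <- c(ROA = -34.4, Employment = 3.3, Assets = 343.1, Loss = 24.2)
se  <- c(ROA = 4.7,   Employment = 15.5, Assets = 221.8, Loss = 5.0)

# t-statistic for the null hypothesis that each coefficient is zero
t_stat <- est / se

# Two-sided p-values, normal approximation (assumption: large sample)
p_val <- 2 * pnorm(-abs(t_stat))

round(cbind(est, se, t_stat, p_val), 4)

# Largest |t| and smallest p-value pick out the same coefficient
names(which.max(abs(t_stat)))
names(which.min(p_val))
```

The largest absolute t-statistic and the smallest p-value always coincide, since the two-sided p-value is a decreasing function of |t|.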
Below is some SPSS output from a regression from the ATUS. The data encompass only the group of people who report that they spent non-zero time in education-related activities such as going to class or doing homework for class. The regression examines the degree to which education-time crowds out TV-watching time. The dependent is time spent watching TV. The independents are time spent on all Education-related activities as well as the usual demographic variables. Fill in the blanks.
Coefficients(a)
Model | B | Std. Error | t | Sig. |
---|---|---|---|---|
1 (Constant) | 160.531 | 14.658 | 10.952 | .000 |
time spent on Education related activities | -.137 | .023 | ? | ? |
female | -26.604 | 7.852 | ? | ? |
African-American | -4.498 | ? | -.417 | .677 |
Hispanic | ? | 12.181 | -.681 | .496 |
Asian | -7.881 | 19.291 | ? | ? |
Native American/Alaskan Native | -4.335 | 28.633 | -.151 | ? |
Education: High School Diploma | 1.461 | 13.415 | .109 | ? |
Education: Some College | 3.186 | ? | .311 | .756 |
Education: 4-year College Degree | -47.769 | 13.471 | -3.546 | ? |
Education: Advanced degree | ? | 18.212 | -3.379 | .001 |
Age | ? | .276 | 2.839 | .005 |
Weekly earnings [2 implied decimals] | .000 | .000 | -.990 | .322 |
In the Labor Force | -25.210 | 10.794 | ? | .020 |
Has multiple jobs | .918 | 15.299 | ? | .952 |
Works part time | 3.816 | 10.427 | .366 | .714 |
a Dependent Variable: watching TV (not religious)
Using the same SPSS output from the regression above, explain clearly which variables are statistically significant. Provide an interpretation for each of the observed signs. What about the magnitude of the coefficients? What additional variables (that are in the dataset) should be included? What results are surprising to you? (Note your answer should be a well-written few paragraphs, not just terse answers to the above questions.)
A colleague proposes the following fitted line. Explain how or if his model could be an OLS regression. There are 100 observations of pairs \((x_i,y_i)\), \(i = 1, \ldots, 100\), and for simplicity assume \(x_i>0\), \(y_i>0\) for all \(i = 1, \ldots, 100\). For the first 99 observations, the fitted value, \(\hat{y}_i\), is equal to the actual value: \(\hat{y}_i = y_i\) for \(i = 1, \ldots, 99\). But for the 100th observation the fitted value missed the true value by 2, so \(y_{100} - \hat{y}_{100} = 2\). If the fitted values do not come from an OLS regression, how should the fitted model be changed?
Consider a simple regression where hours worked are regressed on a dummy for whether the household is in a rural area and the omitted category is that the person is in a more urban area. (This is a particular subsample of the CEX but I’m not asking you to reestimate, you can figure the answer from the information given here; there are 3595 degrees of freedom). You get the following regression results.
Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | _____ | 0.1422 | 311.677 | _____ |
Rural_DummyVariable | 0.7169 | _____ | 1.063 | _____ |
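The blanks follow from the identity t = estimate / standard error, rearranged as needed, together with the t distribution with the stated 3595 degrees of freedom. A sketch in R with illustrative variable names:

```r
df <- 3595                                # degrees of freedom given in the question

se_int    <- 0.1422; t_int   <- 311.677   # intercept row: estimate missing
est_rural <- 0.7169; t_rural <- 1.063     # rural dummy row: std. error missing

est_int  <- t_int * se_int                # estimate = t * std. error
se_rural <- est_rural / t_rural           # std. error = estimate / t

# Two-sided p-values from the t distribution
p_int   <- 2 * pt(-abs(t_int), df)
p_rural <- 2 * pt(-abs(t_rural), df)

c(est_int = est_int, se_rural = se_rural, p_int = p_int, p_rural = p_rural)
```

With this many degrees of freedom the t distribution is nearly normal, so `pnorm` would give almost the same p-values.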
We use the most recent data to assess the relation between changes in GDP and changes in the unemployment rate (so-called Okun’s Law), comparing the relation in the entire period since 1948 with the relation in the period since 1990. Data are from FRED Stats. A regression has the dependent variable as the quarterly change in the unemployment rate (denoted \(\Delta UR\)). The independent variable is the quarterly percent growth rate of nominal GDP (denoted \(\% \Delta Y\)). The estimated regression is \(\Delta UR = \beta_0 + \beta_1\% \Delta Y + u\).
Using a subsample of the taxi data, I find that on weekends there were 193750 rides paid with credit cards and 187694 rides paid with cash.
ATUS records the numbers of minutes in a typical day that people spend on various activities. ACT_WORK is the number of minutes spent working; I’ll define more than 420 minutes (7 hours) as fulltime.
number | Fulltime | Parttime |
---|---|---|
Educ HS diploma | 8219 | 11879 |
Educ some college | 6412 | 9989 |
Educ Bachelor | 8065 | 12521 |
Consider the CEX data; estimate some models to explain APPARPQ, expenditure on apparel in the previous quarter (includes MENBOYPQ, WOMGRLPQ, and FOOTWRPQ - expenditure on apparel for Men and boys; Women and girls; footwear). How important is educational attainment on this expenditure category?
Consider the PUMS data for people in NY, that we’ve been using in class. For now restrict attention to just working people (explain how you might define that).
```r
# Rescale X_in to the [0, 1] interval, ignoring missing values
normalize <- function(X_in) {
  min_X_in <- min(X_in, na.rm = TRUE)
  max_X_in <- max(X_in, na.rm = TRUE)
  (X_in - min_X_in) / abs(max_X_in - min_X_in)
}
```
Would the statistical test come out the same? Why or why not?
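One way to explore this question is to run the same test on raw and on normalized values. The sketch below assumes the test in question is a two-sample comparison of means and uses simulated data; the numbers, group labels, and variable names are all made up for illustration and are not PUMS values.

```r
# Same rescaling function as above, repeated so the example is self-contained
normalize <- function(X_in) {
  min_X_in <- min(X_in, na.rm = TRUE)
  max_X_in <- max(X_in, na.rm = TRUE)
  (X_in - min_X_in) / abs(max_X_in - min_X_in)
}

# Simulated data for two groups (assumption: purely illustrative values)
set.seed(1)
x <- c(rnorm(30, mean = 50, sd = 10), rnorm(30, mean = 55, sd = 10))
g <- rep(c("a", "b"), each = 30)
x_norm <- normalize(x)   # both groups rescaled with the same min and max

# Two-sample t-test on raw and on normalized values
t_raw  <- t.test(x ~ g)$statistic
t_norm <- t.test(x_norm ~ g)$statistic

same_t <- isTRUE(all.equal(unname(t_raw), unname(t_norm)))
same_t
```

Subtracting a constant and dividing by a positive constant rescales the difference in means and its standard error by the same factor, so their ratio is unchanged. This holds only when both groups are normalized together with a common minimum and maximum.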
I used the PUMS data to look at wages and commute type, getting this table for people in the City: (you can answer parts a-c without R)
Wage | Bus | Car | Subway |
---|---|---|---|
Wage below $25,000 | 1501 | 2394 | 3704 |
Wage above $75,000 | 385 | 1825 | 2194 |
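The table can be entered by hand as a matrix, which is enough to compute joint, conditional, and marginal proportions. The sketch below also runs a chi-square test of independence; that particular test and the object names are assumptions, since the sub-questions are not reproduced here.

```r
# Counts from the commute table
commute <- matrix(c(1501, 2394, 3704,
                     385, 1825, 2194),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(wage = c("below 25k", "above 75k"),
                                  mode = c("bus", "car", "subway")))

joint       <- prop.table(commute)               # joint proportions
by_wage     <- prop.table(commute, margin = 1)   # commute mode given wage group
margin_mode <- colSums(commute) / sum(commute)   # marginal share of each mode

# Chi-square test of independence between wage group and commute mode
test <- chisq.test(commute)

round(by_wage, 3)
test$p.value
```

Comparing the rows of `by_wage` shows where the two wage groups differ most, which is more informative than the p-value alone.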
Use the CEX data that I provided and consider the fraction spent on entertainment, ENTERTPQ/TOTEXPPQ.
Using the ATUS data, describe the time spent working (ACT_WORK). Do people with more education work more or fewer hours than people with less education? What other factors are important? You should choose a variety of methods (perhaps including comparison of means, linear regression, nearest neighbor) that demonstrate your econometric virtuosity. Carefully specify the statistical tests that you perform, including the null hypothesis and test statistics including t-stat and p-value.
For the ATUS dataset, use “Analyze > Descriptive Statistics > Crosstabs” to create a joint probability table showing the fractions of males/females about the amount of time spent on the computer vs watching TV (if either or both are above average). Find and interpret the joint probabilities and marginal probabilities. Do this for age groups as well.
I used the CEX data to look at the fraction of spending going to health insurance. I get the following table, grouped by education of the reference person:
%Insurance | No HS | HS diploma | Some college, no degree | Assoc degree | Bach degree | Adv degree |
---|---|---|---|---|---|---|
less than 10% | 467 | 1385 | 1191 | 615 | 1181 | 521 |
11% - 20% | 82 | 231 | 157 | 71 | 122 | 58 |
21% - 30% | 21 | 65 | 27 | 10 | 32 | 7 |
more than 30% | 8 | 18 | 14 | 1 | 3 | 2 |
After the Nobel Prize awards to Fama, Hansen, and Shiller, we look at predictability of stock returns, using data on stocks in the S&P500. There are some days where many of these companies’ shares have negative returns; other days where many have positive. In 2012, more than 70% of the companies had positive returns on about 25% of the days; on another 25% of the days fewer than 30% had “up” returns. On the days following “70% up” days, the average return was .06 percent, with standard deviation of 1.72; on days following “30% up,” the average return was .10 percent, with standard deviation of 1.66. There were 65 days of 70% or more up; there were 59 days of 30% or fewer up.
With the NSA spying revelations, we return to questions of whether there is wage discrimination against people with ancestry from the Middle East or North Africa (MENA). I’ve created a program in SPSS syntax and R that you can run, which will define MENA_ANC if the person’s ancestry is from MENA (except Israel) or MENA_BPL if the person’s birthplace is MENA. You should consider whether there are differences in wages and incomes between people from MENA and others; of course one decision to make is who is a relevant comparison group. Calculate averages between groups, considering also things like education; which are statistically significant? Explain in detail.
Use the ATUS data (available from Blackboard) on the time that people spend in different activities.
Using the ATUS dataset that we’ve been using in class, form a comparison of the mean amount of TV time watched by two groups of people (you can define your own groups, based on any of race, ethnicity, gender, age, education, income, or other of your choice).
Use the Fed SCF 2010 data (available from Blackboard). This is the Survey of Consumer Finances, which is not representative (without using the weights, which you need not do for now) - it intentionally oversamples rich people to find out about their finances. Concentrate for now on the variable “SAVING” (about the 100th variable in the list) which is the amount that people have in their savings accounts.
Use the ATUS data (available from Blackboard) on the time that people spend in different activities. Construct a linear regression explaining the time that people spend on enjoyable activities (t_enjoy which includes most of the T12 items). Restrict the data to include only those people spending a non-zero amount of time on such activities.
Use the PUMS data (available from Blackboard) on the residents of NYC. Consider the time (in minutes) spent by people to travel to work; this variable has name JWMNP.
Use the SPSS dataset, atus_tv from Blackboard, which is a subset of the American Time Use survey. This time we want to find out which factors are important in explaining whether people spend time watching TV. There are a wide number of possible factors that influence this choice.
Estimate the following regression: \(S\&P100\ returns = \beta_0 + \beta_1(lag\ S\&P100\ returns) + \beta_2(lag\ interest\ rates) + \epsilon\) using the dataset, financials.sav. Explain which coefficients (if any) are significant and interpret them.
I will consider a simple question of the relation of employment to production - relevant both for questions of “jobless recovery” and worker productivity. In the R dataset, `macro_data1.Rdata`, I give monthly data for the US on payroll (total nonfarm), the unemployment rate, and an index of industrial production for the period from February 1948 to August 2014. There is also a dummy variable for when the US was in a recession (as defined by NBER). The dataset has both the level of each of these (denoted `lvl_`) and log difference (denoted `ld_`), where \(ld_z(t) = \log(lvl_z(t)) - \log(lvl_z(t - 1))\). You can use the command `load("macro_data1.RData")` to get the data in. I estimate the following regression for the period from 2000-date:
\[
\frac{dPayroll}{Payroll} = 0.000739 + 0.0512 \frac{dProduction}{Production} - 0.00270\,Recession
\]
The intercept coefficient has standard error of .00011, the slope coefficient on percent change in production has standard error of .0161, and the Recession dummy has standard error of .0003. The R-squared is 0.4943.
Use the CPS dataset (available from Blackboard) to do a regression. Explain why your dependent variable might be caused by your independent variable(s). What additional variables (that are in the dataset) might be included? Why did you exclude those? Next examine the regression coefficients. Which ones are significant? Do the signs match what would be predicted by theory? Are the magnitudes reasonable? (Note your answer should be a well-written few paragraphs, not just terse answers to the above questions. No SPSS output dumps either!)
The questions are worth 120 points. You have 120 minutes to do the exam, one point per minute.
All answers should be submitted electronically. Please submit all relevant computer files. Please no “pages” files, save as pdf or rtf. I prefer .Rmd files. No need to put your name, just last digits of ID to identify yourself, so grading is blind. You may refer to your books, notes, calculator, computer, or astrology table. The exam is “open book.” However, you must not refer to anyone else, either in person or electronically, during the exam time!
You must do all work on your own. Cheating is harshly penalized. Please silence all electronic noisemakers such as mobile phones. Good luck. Stay cool.
I created a dataset using the American Community Survey that includes info on college major (for those who completed college) and look at the different wages of those who majored in Business (largest single group) and those in Psychology (6th largest).