Heteroskedasticity-consistent errors in SPSS

Kevin R Foster CCNY

Fall 2011

The Stock and Watson textbook uses heteroskedasticity-consistent standard errors (sometimes called Eicker-Huber-White errors, after the authors who figured out how to calculate them). SPSS, however, has no built-in option on a drop-down list to compute them. With just a bit more work we can still produce the desired output.

How can we get heteroskedasticity-consistent standard errors? Google (our goddess). I found an SPSS macro by Andrew F. Hayes at Ohio State University, who wrote the code and provided documentation. Download the macro, hcreg.sps (from InYourClass, in the "Kevin Foster SPSS" Group), and start up SPSS. Before you do the regressions, click "File" then "Open" then "Syntax…". Find the file that you downloaded (hcreg.sps) and open it. This will open the SPSS Syntax Editor. All you need to do is choose "Run" from the top menu, then "All". There should not be any errors. You need to run this macro each time you start up SPSS; once run, it stays in memory until you close SPSS.
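If you would rather not open and run the macro file by hand each session, SPSS syntax has an INSERT command that runs another syntax file. Here is a minimal sketch; the folder shown is a made-up placeholder (an assumption, not where the file actually lives), so substitute the location where you saved hcreg.sps:

```spss
* Load the HCREG macro for this session.
* The path below is a hypothetical placeholder and must be edited to match your own folder.
INSERT FILE='C:\temp\hcreg.sps'.
```

Putting this line at the top of a saved syntax file means the macro is defined before any HCREG command further down is run.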

The macro does not add extra options to the menus, however. To use the new functionality we need to write a bit of SPSS syntax ourselves. For example, suppose we are using the PUMS dataset and want to regress commute time (JWMNP) on other important variables, such as Age, gender, race/ethnicity, education, and borough.

We will have to use the "Name" of each variable rather than its label. This is inconvenient but not a terrible challenge. Age conveniently has name "Age" but the gender dummy has name "female"; the race/ethnicity variables are "africanamerican", "nativeamerican", "asianamerican", "raceother" and "Hispanic"; education is "educ_hs", "educ_somecoll", "educ_collassoc", "educ_coll" and "educ_adv"; boroughs are "boro_bx", "boro_si", "boro_bk" and "boro_qns". (Note that we leave one category out for education and for borough; the omitted category is the base group against which the others are compared.)
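If you are unsure of a variable's Name, one option is to have SPSS list the names next to their labels. A minimal sketch, using the standard DISPLAY command on whatever dataset is currently open:

```spss
* List every variable name together with its label.
DISPLAY LABELS.
```

The names in the first column of that listing are what goes into the hcreg variable lists.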

Go back to the SPSS Syntax Editor: from the Data View choose "File" "New" "Syntax". This will re-open the editor on a blank page. Type:

HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.

Then go to "Run" on the top menu and choose "All" and watch it spit out the output.

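Your output should look like this (the macro truncates variable names to eight characters, so "educ_collassoc" shows up as "educ_col" and "educ_coll" as "educ_c_1"):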

Run MATRIX procedure:

HC Method

Criterion Variable
 JWMNP

Model Fit:
        R-sq           F         df1         df2           p
       .0475    491.2978     16.0000  132326.000       .0000

Heteroscedasticity-Consistent Regression Results
               Coeff    SE(HC)         t     P>|t|
Constant     26.7397     .3700   72.2637     .0000
Age            .0450     .0054    8.3550     .0000
female        -.2820     .1404   -2.0085     .0446
africana      7.9424     .1999   39.7312     .0000
nativeam      4.2621    1.3060    3.2635     .0011
asianame      5.2494     .2270   23.1237     .0000
raceothe      3.5011     .2720   12.8696     .0000
Hispanic      1.9585     .2269    8.6317     .0000
educ_hs      -1.1125     .2701   -4.1192     .0000
educ_som      -.7601     .2856   -2.6611     .0078
educ_col       .2148     .3495     .6145     .5389
educ_c_1      1.1293     .2720    4.1517     .0000
educ_adv     -1.3747     .2847   -4.8281     .0000
boro_bx       8.3718     .2564   32.6485     .0000
boro_si      12.7391     .3643   34.9712     .0000
boro_bk       9.6316     .1882   51.1675     .0000
boro_qns     10.2350     .1932   52.9754     .0000

------ END MATRIX -----

Did that seem like a pain? OK, here's an easier way that also adds some more error-checking and so is more robust.

First do a regular OLS regression with the drop-down menus in SPSS. Do the same regression as above, with travel time as the dependent variable and the other variables as independent, and note that just before the output you'll see something like this:

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT JWMNP
  /METHOD=ENTER Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.

This is the SPSS code that your drop-down menus created. You can ignore most of it but realize that it gives a list of all of the variable names (after "/METHOD=ENTER ") so you can do this regression and just copy-and-paste that generated list into the hcreg syntax.

The other advantage of doing it this way first is that it will point out any errors you make. If you put in too many dummy variables then SPSS will take one out (and note that in "Variables Removed" at the beginning of the output). If that happens, remove the same variable from the list you give to hcreg, or else the macro will produce errors. If the SPSS regression finds other errors then those must be fixed before using the hcreg syntax.

The general template for this command is "HCREG", the name of the macro, then "DV = " with the name of the Dependent Variable, "IV = " with the names of the Independent Variables, and then a period to mark the end of a command line.

The macro actually allows some more fanciness. It contains 4 different methods of computing the heteroskedasticity-consistent standard errors. If you follow the "IV = " list with "/method = " and a number from 1 to 5 then you will get slightly different standard errors. The default is method 3. If you type "/method = 5" then it will give the homoskedastic standard errors (the same results as if you did the ordinary regression with the SPSS menus).

The macro additionally allows you to set the constant term equal to zero by adding "/constant = 0"; "/covmat = 1" to print the entire covariance matrix; or "/test = q" to test if the last q variables all have coefficients equal to zero. Prof. Hayes did a very nice job, didn't he? Go to his web page for complete documentation.
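As an illustration of these extra options, here is a sketch of two commands for the commute-time regression. It assumes the option names exactly as described in this handout (check Prof. Hayes's documentation for the definitive spelling). Because the four borough dummies come last in the "IV = " list, "/test = 4" tests whether all four borough coefficients are zero:

```spss
* Joint test that the last 4 predictors (the borough dummies) all have zero coefficients.
HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns
 /test = 4.

* Same regression, this time also printing the full covariance matrix of the coefficients.
HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns
 /covmat = 1.
```

To test a different group of variables, reorder the "IV = " list so that the group you care about comes last.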

The Syntax Editor can be useful for particular tasks, especially those that are repetitive. Many of the drop-down commands offer a choice of "Paste Syntax" which will show the syntax for the command that you just implicitly created with the menus, which allows you to begin to learn some of the commands. The Syntax Editor also allows you to save the list of commands if you're doing them repeatedly.
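For instance, choosing "Paste" in the Descriptives dialog produces a short command along these lines. This is a sketch of typical pasted syntax for two of our variables; the exact subcommands that SPSS pastes can vary with the options you tick:

```spss
* Summary statistics for commute time and age, as the Descriptives dialog would roughly paste them.
DESCRIPTIVES VARIABLES=JWMNP Age
  /STATISTICS=MEAN STDDEV MIN MAX.
```

Saving a file of such commands lets you rerun the whole analysis later with "Run" then "All".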

The syntax to run the regression with the default method and again with method 5 is

HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.

HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican asianamerican raceother Hispanic educ_hs educ_somecoll educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns
 /method = 5.

Run those in SPSS, along with the regression from the drop-down menus for comparison. You will see that the results from the homoskedastic method 5 and from the drop-down menus are identical. More precisely, the coefficient estimates are the same in every version, but the standard errors (and therefore the t statistics and the p-values, which SPSS labels "Sig.") differ between the two hcreg versions; hcreg with method 5 delivers the same results as SPSS's drop-down menus.