Heteroskedasticity-consistent errors in SPSS
Kevin R Foster, CCNY, Fall 2011
The Stock and Watson textbook uses heteroskedasticity-consistent errors (sometimes called Eicker-Huber-White errors, after the authors who figured out how to calculate them). SPSS does not have an internal option on a drop-down list to compute heteroskedasticity-consistent standard errors, but with just a bit more work we can still produce the desired output.
How can we get heteroskedasticity-consistent standard errors? Google (our goddess). I found an SPSS macro, written by Andrew F.
Hayes at
The macro does
not add extra options to the menus, however.
To use the new functionality we need to write a bit of SPSS syntax
ourselves. For example, suppose we are
using the PUMS dataset and want to regress commute time (JWMNP) on other
important variables, such as Age, gender, race/ethnicity, education, and
borough.
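Before any HCREG command will run, the macro definition has to be executed once in the current SPSS session. A minimal sketch of one way to do that, assuming you have saved the macro's syntax file to your own disk (the folder and file name below are placeholders, not the actual download location):

```spss
* Run the saved macro definition once per SPSS session.
* The path and file name are hypothetical, so substitute your own.
INSERT FILE='C:\mysyntax\hcreg.sps'.
```

Opening the saved file in the Syntax Editor and choosing "Run" then "All" should accomplish the same thing.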
We will have to use the "Name" of the variable rather than the label. This is inconvenient but not a terrible challenge. Age conveniently has name "Age" but the gender dummy has name "female"; the race/ethnicity variables are "africanamerican", "nativeamerican", "asianamerican", "raceother", and "Hispanic"; education is "educ_hs", "educ_somecoll", "educ_collassoc", "educ_coll", and "educ_adv"; boroughs are "boro_bx", "boro_si", "boro_bk", and "boro_qns". (Note that we leave one category out for education and one for borough, so that the omitted category serves as the comparison group.)
Go back to the
SPSS Syntax Editor: from the Data View choose "File" "New" "Syntax".
This will re-open the editor on a blank page. Type:
HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican
  asianamerican raceother Hispanic educ_hs educ_somecoll
  educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.
Then go to
"Run" on the top menu and choose "All" and watch it spit out the
output.
Your output
should look like this,
Run MATRIX procedure:

HC Method
 3

Criterion Variable
 JWMNP

Model Fit:
        R-sq           F         df1         df2           p
       .0475    491.2978     16.0000  132326.000       .0000

Heteroscedasticity-Consistent Regression Results
               Coeff    SE(HC)         t     P>|t|
Constant     26.7397     .3700   72.2637     .0000
Age            .0450     .0054    8.3550     .0000
female        -.2820     .1404   -2.0085     .0446
africana      7.9424     .1999   39.7312     .0000
nativeam      4.2621    1.3060    3.2635     .0011
asianame      5.2494     .2270   23.1237     .0000
raceothe      3.5011     .2720   12.8696     .0000
Hispanic      1.9585     .2269    8.6317     .0000
educ_hs      -1.1125     .2701   -4.1192     .0000
educ_som      -.7601     .2856   -2.6611     .0078
educ_col       .2148     .3495     .6145     .5389
educ_c_1      1.1293     .2720    4.1517     .0000
educ_adv     -1.3747     .2847   -4.8281     .0000
boro_bx       8.3718     .2564   32.6485     .0000
boro_si      12.7391     .3643   34.9712     .0000
boro_bk       9.6316     .1882   51.1675     .0000
boro_qns     10.2350     .1932   52.9754     .0000

------ END MATRIX -----
Did that seem like a pain? OK, here's an easier way that also adds some error-checking and so is more robust.
First do a regular OLS regression with the drop-down menus in SPSS. Do the same regression as above, with travel time as the dependent variable and the other variables as independent, and note that just before the output you'll see something like this,
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT JWMNP
  /METHOD=ENTER Age female africanamerican nativeamerican
    asianamerican raceother Hispanic educ_hs educ_somecoll
    educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.
This is the
SPSS code that your drop-down menus created.
You can ignore most of it but realize that it gives a list of all of the
variable names (after "/METHOD=ENTER ") so you can do this regression and just copy-and-paste that
generated list into the hcreg syntax.
The other advantage of doing it this way first is that this will point out any errors you make. If you put in too many dummy variables then SPSS will take one out (and note that in "Variables Removed" at the beginning of the output). If that happens, take that variable out of the list you give to hcreg, or else it will cause errors. If the SPSS regression finds other errors, those must be fixed before using the hcreg syntax.
The general
template for this command is "HCREG", the name of the macro, then "DV = " with the name of the Dependent Variable, "IV = " with the
names of the Independent Variables, and then a period to mark
the end of a command line.
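As a minimal sketch of that template, with made-up variable names (y, x1, x2 and x3 are placeholders, not variables in the PUMS data):

```spss
* Template: macro name, then dv = the dependent variable,
  then iv = the list of independent variables, then a period.
* The names y, x1, x2 and x3 are hypothetical placeholders.
HCREG dv = y/iv = x1 x2 x3.
```

The slash separates the dependent variable from the "iv = " list, just as in the commute-time example.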
The macro actually allows some more fanciness. It contains 4 different methods of computing the heteroskedasticity-consistent errors, plus a fifth setting for ordinary homoskedastic errors. If you follow the "IV = " list with "/method = " and a number from 1 to 5 then you will get slightly different standard errors. The default is method 3, which is why the output above reports "HC Method 3". If you type "/method = 5" then it will give the homoskedastic errors (the same results as if you did the ordinary regression with the SPSS menus).
The macro additionally allows you to add "/constant = 0" to set the constant term equal to zero; "/covmat = 1" to print the entire covariance matrix; or "/test = q" to test whether the last q variables in the list all have coefficients equal to zero.
Prof. Hayes did a very nice job, didn't he? Go to his web page for complete
documentation.
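To illustrate these options, here is a sketch that reruns the commute-time regression with the default method and adds two of them. The option spellings are copied from the description above and should be checked against Prof. Hayes's documentation. Because the four borough dummies are the last four variables in the list, "/test = 4" asks whether borough matters at all.

```spss
* Same regression as before, default method, with two options added.
* The covmat option prints the covariance matrix of the estimates.
* The test option checks the last 4 variables listed (the borough dummies).
HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican
  asianamerican raceother Hispanic educ_hs educ_somecoll
  educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns
  /covmat = 1
  /test = 4.
```

To test a different group of variables jointly, such as the education dummies, reorder the "iv = " list so that group comes last.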
The Syntax
Editor can be useful for particular tasks, especially those that are
repetitive. Many of the drop-down
commands offer a choice of "Paste Syntax" which will show the syntax
for the command that you just implicitly created with the menus, which allows
you to begin to learn some of the commands.
The Syntax Editor also allows you to save the list of commands if you're
doing them repeatedly.
This syntax,
to perform the regressions, is
HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican
  asianamerican raceother Hispanic educ_hs educ_somecoll
  educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns.

HCREG dv = JWMNP/iv = Age female africanamerican nativeamerican
  asianamerican raceother Hispanic educ_hs educ_somecoll
  educ_collassoc educ_coll educ_adv boro_bx boro_si boro_bk boro_qns
  /method = 5.
Run those in SPSS, along with the regression from the drop-down menus, for comparison. You will see that the results from the homoskedastic method=5 and from the drop-down menus are identical. More precisely, the coefficient estimates are the same in every version, but the standard errors (and therefore the t statistics and the p-values, which SPSS labels "Sig") differ between the two hcreg versions, while hcreg method 5 delivers the same results as SPSS's drop-down menus.