Jordan Adamson found an error in the code that Solomon Hsiang developed to compute Conley standard errors in Stata. Unfortunately, we transcribed this error when we implemented Hsiang’s code in C++ and R. These errors happen, and Hsiang clearly warns users at the top of his code.

The problem is a single misplaced parenthesis in the line calculating the weight for the Bartlett kernel when correcting for temporal auto-correlation: weight = (1:-abs(time1[t,1] :- time1))/(lag_cutoff+1) (line 430 in the original ado file, version dated 4/29/2013).

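Per Newey and West (1987), the Bartlett kernel weight for two observations of the same unit in periods $t$ and $s$, with lag cutoff $L$ (lag_cutoff in the code), is

$$w(t, s) = 1 - \frac{|t - s|}{L + 1}$$

However, the line above instead computes:

$$\tilde{w}(t, s) = \frac{1 - |t - s|}{L + 1}$$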

The fix is simple: the third parenthesis needs to be moved to the end of the line. Unfortunately, the fix is also consequential, as the uncorrected code can deliver negative weights and lead to standard errors that are too small when there is positive temporal auto-correlation.
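
To make the consequence concrete, here is a minimal C++ sketch of the two weight formulas. The function names are ours for illustration only; this is not the code in the ado file or in our repo. The uncorrected expression gives 1/(cutoff + 1) at lag 0, zero at lag 1, and negative weights at every lag of 2 or more, whereas the Bartlett weight starts at 1 and declines linearly while staying positive up to the cutoff.

```cpp
#include <cmath>

// Weight as originally coded: the whole difference (1 - |lag|) is divided
// by (lag_cutoff + 1). Hypothetical name, for illustration only.
double bartlett_weight_buggy(double lag, double lag_cutoff) {
    return (1.0 - std::fabs(lag)) / (lag_cutoff + 1.0);
}

// Bartlett weight per Newey and West (1987): only |lag| is divided
// by (lag_cutoff + 1). Hypothetical name, for illustration only.
double bartlett_weight_fixed(double lag, double lag_cutoff) {
    return 1.0 - std::fabs(lag) / (lag_cutoff + 1.0);
}
```

With a cutoff of 5 and a lag of 3, for example, the corrected weight is 1 - 3/6 = 0.5, while the uncorrected expression yields (1 - 3)/6 = -1/3.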

Our old and new code are now posted in a public GitHub repo.

Original Code

Here’s the original Stata implementation.

This code delivers the following standard errors:

And our original C++/R implementation:

This matches the standard errors from the Stata output.

Corrected Code

Jordan caught the transcribed error on line 183 of our C++ code. Per Newey and West (1987), we correct (1 - t_diff[j]) / (cutoff + 1) to (1 - t_diff[j] / (cutoff + 1)) and recompute the standard errors.

As is apparent from the final column, correcting the error meaningfully changes the standard errors. Thiemo’s data is a bit unusual; in other applications with positive temporal auto-correlation, we find that the standard errors tend to increase with the corrected code.

tl;dr: Fast computation of standard errors that allows for serial and spatial auto-correlation.

Economists and political scientists often employ panel data that track units (e.g., firms or villages) over time. When estimating regression models using such data, we often need to be concerned about two forms of auto-correlation: serial (within units over time) and spatial (across nearby units). As Cameron and Miller (2013) note in their excellent guide to cluster-robust inference, failure to account for such dependence can lead to incorrect conclusions: “[f]ailure to control for within-cluster error correlation can lead to very misleadingly small standard errors…” (p. 4).

Conley (1999, 2008) develops one commonly employed solution. His approach allows for serial correlation over all (or a specified number of) time periods, as well as spatial correlation among units that fall within a certain distance of each other. For example, we can account for correlated disturbances within a particular village over time, as well as between that village and every other village within one hundred kilometers.

We provide a new function that allows R users to more easily estimate these corrected standard errors. (Solomon Hsiang (2010) provides code for STATA, which we used to test our estimates and benchmark speed.) Moreover, using the excellent lfe, Rcpp, and RcppArmadillo packages (and Tony Fischetti’s Haversine distance function), our function is roughly 20 times faster than the STATA equivalent and can scale to handle panels with more units. (We have used it on panel data with over 100,000 units observed over 6 years.)

This demonstration employs data from Fetzer (2014), who uses a panel of U.S. counties from 1999-2012. The data and code can be downloaded here.

STATA Code:

We first use Hsiang’s STATA code to compute the corrected standard errors (spatHAC in the output below).

Using the same data and options as the STATA code, we then estimate the adjusted standard errors using our new R function. This requires us to first estimate our regression model using the felm function from the lfe package.

We use felm() from the lfe package to estimate the model with year and county fixed effects.

Two important points:

We specify our latitude and longitude coordinates as the cluster variables, so that they are included in the output (m).

We specify keepCX = TRUE, so that the centered data is included in the output (m).

We then feed this model to our function, as well as the cross-sectional unit (county FIPS codes), time unit (year), geo-coordinates (lat and lon), the cutoff for serial correlation (5 years), the cutoff for spatial correlation (500 km), and the number of cores to use.

Estimating the model and computing the standard errors requires under two seconds, making it many times faster than the comparable STATA routine.

R Using Multiple Cores:

Even with a single core, we realize significant speed improvements. However, the gains are even more dramatic when we employ multiple cores. Using 4 cores, we can cut the estimation of the standard errors down to around 0.4 seconds. (These replications employ the Haversine distance formula, which is more time-consuming to compute.)
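
For readers unfamiliar with it, the sketch below shows the standard Haversine formula in C++. It is an illustration only and not necessarily the implementation used in our function; the name haversine_km and the mean Earth radius of 6371 km are assumptions. The several trigonometric calls per pair of units are what make each distance evaluation comparatively expensive.

```cpp
#include <cmath>

// Great-circle distance in kilometers between two points whose latitude and
// longitude are given in decimal degrees. Hypothetical helper, for illustration.
double haversine_km(double lat1, double lon1, double lat2, double lon2) {
    const double kEarthRadiusKm = 6371.0;  // assumed mean Earth radius
    const double kDegToRad = 3.14159265358979323846 / 180.0;

    const double dlat = (lat2 - lat1) * kDegToRad;
    const double dlon = (lon2 - lon1) * kDegToRad;
    const double sin_dlat = std::sin(dlat / 2.0);
    const double sin_dlon = std::sin(dlon / 2.0);

    // Haversine of the central angle between the two points.
    const double a = sin_dlat * sin_dlat +
                     std::cos(lat1 * kDegToRad) * std::cos(lat2 * kDegToRad) *
                         sin_dlon * sin_dlon;

    // Clamp to 1 to guard against rounding error before taking asin.
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(std::fmin(1.0, a)));
}
```

Under a 500 km spatial cutoff, two units would be treated as neighbors whenever haversine_km returns a value of at most 500.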

Given the prevalence of panel data that exhibits both serial and spatial dependence, we hope this function will be a useful tool for applied econometricians working in R.

Feedback Appreciated: Memory vs. Speed Tradeoff

This was Darin’s first foray into C++, so we welcome feedback on how to improve the code. In particular, we would appreciate thoughts on how to overcome a memory vs. speed tradeoff we encountered. (You can email Darin at darinc[at]luskin.ucla.edu)

The most computationally intensive chunk of our code computes the distance from each unit to every other unit. To cut down on the number of distance calculations, we can fill the upper triangle of the distance matrix and then copy it to the lower triangle. With $N$ units, this requires only $N(N-1)/2$ distance calculations.

However, as the number of units grows, this distance matrix becomes too large to store in memory, especially when executing the code in parallel. (We tried to use a sparse matrix, but this was extremely slow to fill.) To overcome this memory issue, we can avoid constructing a distance matrix altogether. Instead, for each unit, we compute the vector of distances from that unit to every other unit. We then only need to store that vector in memory. While that cuts down on memory use, it requires us to make twice as many ($N(N-1)$) distance calculations.

As the number of units grows, we are forced to perform more duplicate distance calculations to avoid memory constraints – an unfortunate tradeoff. (See the functions XeeXhC and XeeXhC_Lg in ConleySE.cpp.)
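
The tradeoff can be sketched in a few lines of C++. This is a simplified illustration, not the code in XeeXhC or XeeXhC_Lg: the Point struct, the planar distance, and both function names are hypothetical, and each function simply counts how many other units fall within the cutoff of each unit. The first stores the full matrix and fills only the upper triangle; the second keeps a single row in memory at the cost of computing every pair twice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x; double y; };  // hypothetical planar coordinates

struct Result {
    std::vector<std::size_t> neighbors;  // units within the cutoff of each unit
    std::size_t distance_calls;          // number of distance evaluations
};

// Planar distance, used only to keep the sketch short.
double planar_distance(const Point& a, const Point& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Strategy 1: O(N^2) memory, N(N-1)/2 distance evaluations.
Result neighbors_full_matrix(const std::vector<Point>& pts, double cutoff) {
    const std::size_t n = pts.size();
    std::vector<double> d(n * n, 0.0);  // full N x N distance matrix
    Result r{std::vector<std::size_t>(n, 0), 0};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            d[i * n + j] = planar_distance(pts[i], pts[j]);
            d[j * n + i] = d[i * n + j];  // mirror into the lower triangle
            ++r.distance_calls;
        }
    }
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j && d[i * n + j] <= cutoff) ++r.neighbors[i];
    return r;
}

// Strategy 2: O(N) memory, N(N-1) distance evaluations.
Result neighbors_row_by_row(const std::vector<Point>& pts, double cutoff) {
    const std::size_t n = pts.size();
    std::vector<double> row(n, 0.0);  // distances from one unit only
    Result r{std::vector<std::size_t>(n, 0), 0};
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (j == i) continue;
            row[j] = planar_distance(pts[i], pts[j]);
            ++r.distance_calls;
        }
        for (std::size_t j = 0; j < n; ++j)
            if (j != i && row[j] <= cutoff) ++r.neighbors[i];
    }
    return r;
}
```

Both functions return the same neighbor counts; only distance_calls and peak memory differ. The row-by-row version also parallelizes naturally over units, since each row is independent of the others.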