In Canay, Romano, and Shaikh (2017) we extended the applicability of randomization tests to settings where such tests cannot be justified in finite samples but where their asymptotic validity can be established under fairly mild conditions. This led to what we call Approximate Randomization Tests (ARTs). One setting in which these tests have proved particularly useful is one in which the data can be grouped into a small number of clusters.
While ARTs are not particularly difficult to implement from a computational standpoint, the principal goal of Canay, Romano, and Shaikh (2017) was to develop the general theory of ARTs rather than to work out the details of their implementation. In Cai, Canay, Kim, and Shaikh (2021) we now provide a user's guide to that general theory, specialized to linear regressions with clustered data. Such regressions include settings in which the data are naturally grouped into clusters, such as villages or repeated observations over time on individual units, as well as settings with weak temporal dependence, in which pseudo-clusters may be formed from blocks of consecutive observations. An important feature of the methodology is that it applies to settings in which the number of clusters is small, even as small as five.
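The idea of forming pseudo-clusters from blocks of consecutive observations can be illustrated with a short sketch. It assumes a simple regression of y on a constant and one regressor, splits a time series into q consecutive blocks of nearly equal size, and computes one OLS slope per block. The function names `make_blocks`, `ols_slope`, and `blockwise_slopes` are hypothetical and are not part of the companion Stata package; the block-splitting rule is likewise an illustrative assumption rather than the paper's prescription.

```python
from typing import List, Sequence


def make_blocks(n: int, q: int) -> List[range]:
    """Split indices 0, ..., n-1 into q consecutive, nearly equal-sized blocks."""
    if not 1 <= q <= n:
        raise ValueError("need 1 <= q <= n")
    base, extra = divmod(n, q)
    blocks, start = [], 0
    for j in range(q):
        size = base + (1 if j < extra else 0)  # the first `extra` blocks get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope coefficient from an OLS regression of y on a constant and x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        raise ValueError("x does not vary within this block")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def blockwise_slopes(x: Sequence[float], y: Sequence[float], q: int) -> List[float]:
    """One slope estimate per pseudo-cluster of consecutive observations."""
    return [
        ols_slope([x[i] for i in b], [y[i] for i in b])
        for b in make_blocks(len(x), q)
    ]
```

For example, `make_blocks(10, 3)` yields blocks of sizes 4, 3, and 3, and each resulting block-level estimate plays the role of a cluster-level estimate in the test.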
Cai, Canay, Kim, and Shaikh (2021) provides a step-by-step algorithmic description of how to implement ARTs and how to construct confidence intervals for the parameters of interest. We also articulate the main requirements underlying the test, with particular emphasis on common pitfalls that researchers may encounter. Finally, we illustrate the methodology with two applications that further elucidate these points: a linear regression with clustered data based on Meng et al. (2015), and a linear regression with temporally dependent data based on Munyo and Rossi (2015).
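As a rough illustration of the mechanics, the sketch below implements a sign-change randomization test on q cluster-level estimates and inverts it over a grid to obtain a confidence set. It assumes the studentized statistic |mean| / standard deviation of the recentered estimates, one natural choice in the spirit of Canay, Romano, and Shaikh (2017), and uses the non-randomized rejection rule based on the ordered sign-changed statistics. The names `t_stat`, `art_reject`, and `art_confidence_set` are hypothetical, and grid inversion is an assumed, generic way of building the interval, not necessarily the exact procedure in the paper or the Stata package.

```python
import itertools
import math
from typing import List, Sequence


def t_stat(s: Sequence[float]) -> float:
    """|mean| divided by the sample standard deviation of the values in s."""
    q = len(s)
    mean = sum(s) / q
    var = sum((v - mean) ** 2 for v in s) / (q - 1)
    if var == 0.0:
        return math.inf if mean != 0.0 else 0.0
    return abs(mean) / math.sqrt(var)


def art_reject(estimates: Sequence[float], beta0: float, alpha: float = 0.10) -> bool:
    """Sign-change test of H0: beta = beta0 using one estimate per cluster.

    Rejects when the observed statistic exceeds the k-th smallest of the
    2**q sign-changed statistics, with k = ceil((1 - alpha) * 2**q).
    """
    q = len(estimates)
    if q < 2:
        raise ValueError("need at least two clusters")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    s = [b - beta0 for b in estimates]  # recenter at the hypothesized value
    observed = t_stat(s)
    stats = sorted(
        t_stat([g * v for g, v in zip(signs, s)])
        for signs in itertools.product((1, -1), repeat=q)
    )
    k = math.ceil((1 - alpha) * len(stats))
    return observed > stats[k - 1]


def art_confidence_set(
    estimates: Sequence[float], grid: Sequence[float], alpha: float = 0.10
) -> List[float]:
    """Grid values of beta0 that the test does not reject (test inversion)."""
    return [b0 for b0 in grid if not art_reject(estimates, b0, alpha)]
```

With five clusters there are only 2**5 = 32 sign changes, so attainable levels are coarse. Because a sign vector and its negation always give the same statistic, the largest value fills the top two ordered positions; for this non-randomized sketch with q = 5, a level of alpha = 0.05 gives k = 31 and the test can never reject, while alpha = 0.10 gives k = 29 and it can.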
To facilitate adoption of ARTs, we have developed a companion Stata package (see the ARTs Bitbucket Repository or visit the software page) and have also provided R and Stata files that replicate the two empirical exercises in the paper (replication files).