2022 Zellner Award

My co-author Federico Bugni and I have been awarded the 2022 Arnold Zellner Award for the paper “Testing Continuity of a Density via g-order statistics in the Regression Discontinuity Design”, published in the Journal of Econometrics in 2021. The Zellner Award recognizes the best theoretical paper published by the Journal of Econometrics in a given year, following a selection process in which associate editors and co-editors submit a list of nominations and an award committee composed of fellows of the Journal of Econometrics selects the winner. The award was announced at the journal’s 50th Anniversary reception at the 2023 ASSA Annual Meeting in New Orleans, Louisiana.

The winning paper proposes a novel test for the continuity of a density at a point based on the so-called g-order statistics. This testing problem is particularly relevant in the context of the regression discontinuity design (RDD), where it is common practice to assess the credibility of the design by testing the continuity of the density of the so-called running variable at the cut-off point. The paper proposes a novel approximate sign test for this purpose, based on the simple intuition that, when the density of the running variable is continuous at the cut-off, the fractions of units under treatment and control local to the cut-off should be roughly the same. This means that the number of treated units among the q observations closest to the cut-off is approximately distributed as a binomial random variable with sample size q and probability 0.5. The approximate sign test has a number of attractive properties relative to existing methods for testing our null hypothesis of interest. First, the test does not require consistent non-parametric estimators of densities. Second, the test controls the limiting null rejection probability under fairly mild conditions that, in particular, do not require existence of derivatives of the density of the running variable. Third, the test is asymptotically valid under two alternative asymptotic frameworks that capture the fact that the fraction of useful observations close to the cut-off is quite small relative to the total sample size. In fact, the new test exhibits finite sample validity under conditions stronger than those needed for its asymptotic validity. Fourth, the test is simple to implement as it only involves computing order statistics, a constant critical value, and a single tuning parameter. This contrasts with existing alternatives that require kernel smoothing, local polynomials, bias correction, and under-smoothed bandwidth choices.
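To make the binomial intuition concrete, here is a minimal sketch in R. It illustrates the idea rather than the implementation in the paper (which is based on g-order statistics and a constant critical value); the function name, the default choice of q, and the use of an exact binomial p-value are all assumptions of this sketch.

```r
# Hypothetical sketch of the idea, not the paper's implementation: count how
# many of the q observations closest to the cutoff fall on the treated side,
# and compare the count to a Binomial(q, 0.5).
approx_sign_test <- function(x, cutoff = 0, q = 40) {
  closest <- order(abs(x - cutoff))[1:q]   # indices of the q closest observations
  s <- sum(x[closest] >= cutoff)           # how many lie on the treated side
  # under continuity of the density at the cutoff, s is approximately
  # Binomial(q, 0.5); report an exact binomial p-value
  binom.test(s, q, p = 0.5)$p.value
}
```

With an election-margin running variable and a cutoff at zero, for instance, approx_sign_test(margin, cutoff = 0, q = 40) returns a p-value for the null of continuity; rejection suggests a discontinuity of the kind that manipulation checks are designed to detect.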


More on Outcome Tests

Today we have updated the paper On the Use of Outcome Tests for Detecting Bias in Decision Making, joint with Magne Mogstad and Jack Mountjoy. Relative to our first version, the paper now has a new framing, a much broader scope, more extensive connections to multiple strands of the literature on discrimination and outcome tests, and more constructive guidance for researchers interested in deriving and conducting outcome tests across a range of institutional settings and data environments. Our results call into question recent conclusions about racial bias among bail judges, and, more broadly, yield four lessons for researchers considering the use of outcome tests of bias. First, the so-called generalized Roy model, which is a workhorse of applied economics, does not deliver a logically valid outcome test without further restrictions, since it does not require an unbiased decision maker to equalize marginal outcomes across groups. Second, the more restrictive “extended” Roy model, which isolates potential outcomes as the sole admissible source of analyst-unobserved variation driving decisions, delivers both a logically valid and econometrically viable outcome test. Third, this extended Roy model places strong restrictions on behavior and the data generating process, so detailed institutional knowledge is essential for justifying such restrictions. Finally, because the extended Roy model imposes restrictions beyond those required to identify marginal outcomes across groups, it has testable implications that may help assess its suitability across empirical settings.

A Guide to ARTs

In Canay, Romano, and Shaikh (2017) we extended the scope of applicability of randomization tests to cases where such tests could not be justified in finite samples, but where it was possible to argue their asymptotic validity under fairly mild conditions. This led to what we call Approximate Randomization Tests (ARTs). An important setting in which such tests prove particularly useful is one where the data can be grouped into a small number of clusters.

While ARTs are not particularly difficult to implement from a computational standpoint, the principal goal of Canay, Romano, and Shaikh (2017) was to develop the general theory of ARTs, not the details of their implementation. In Cai, Canay, Kim, and Shaikh (2021) we now provide a user’s guide to the general theory of ARTs when specialized to linear regressions with clustered data. Such regressions include settings in which the data is naturally grouped into clusters, such as villages or repeated observations over time on individual units, as well as settings with weak temporal dependence, in which pseudo-clusters may be formed using blocks of consecutive observations. An important feature of the methodology is that it applies to settings in which the number of clusters is small – even as small as five.

Cai, Canay, Kim, and Shaikh (2021) provides a step-by-step algorithmic description of how to implement ARTs and construct confidence intervals for the parameters of interest. We additionally articulate the main requirements underlying the test, emphasizing in particular common pitfalls that researchers may encounter. Finally, we illustrate the use of the methodology with two applications that further elucidate these points: one to a linear regression with clustered data based on Meng et al. (2015) and a second to a linear regression with temporally dependent data based on Munyo and Rossi (2015).
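As a rough illustration of the mechanics, the R sketch below applies sign changes to cluster-level coefficient estimates and recomputes a t-statistic over all sign assignments; consult the paper for the precise construction and the requirements behind it. The function name and interface are hypothetical.

```r
# Hypothetical sketch of an ART for a single regression coefficient with few
# clusters. b is a vector of cluster-level estimates of the coefficient of
# interest, e.g., from separate within-cluster regressions.
art_test <- function(b, beta0 = 0) {
  q <- length(b)                                  # number of clusters (can be as few as 5)
  tstat <- function(z) sqrt(q) * mean(z) / sd(z)  # t-statistic on transformed estimates
  signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), q)))  # all 2^q sign changes
  tdist <- apply(signs, 1, function(s) abs(tstat(s * (b - beta0))))
  mean(tdist >= abs(tstat(b - beta0)))            # randomization p-value
}
```

For example, art_test(b) returns a two-sided randomization p-value for the hypothesis that the coefficient equals zero; with a larger number of clusters one would sample sign vectors rather than enumerate all 2^q of them.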

To facilitate adoption of ARTs, we have developed a companion Stata package (see the ARTs Bitbucket Repository or visit the software page) and also provided R and Stata files to replicate the two empirical exercises in the paper (replication files).


Revised paper on testing continuity in RDD

We have revised a paper, joint with Federico Bugni, on testing the continuity of the density of the running variable in the RDD (see paper here).

In the regression discontinuity design (RDD), it is common practice to assess the credibility of the design by testing the continuity of the density of the running variable at the cut-off, e.g., McCrary (2008). In this paper we propose an approximate sign test for continuity of a density at a point based on the so-called g-order statistics, and study its properties under two complementary asymptotic frameworks. In the first asymptotic framework, the number q of observations local to the cut-off is fixed as the sample size n diverges to infinity, while in the second framework q diverges to infinity slowly as n diverges to infinity. Under both of these frameworks, we show that the test we propose is asymptotically valid in the sense that it has limiting rejection probability under the null hypothesis not exceeding the nominal level. More importantly, the test is easy to implement, asymptotically valid under weaker conditions than those used by competing methods, and exhibits finite sample validity under stronger conditions than those needed for its asymptotic validity. In a simulation study, we find that the approximate sign test provides good control of the rejection probability under the null hypothesis while remaining competitive under the alternative hypothesis. We finally apply our test to the design in Lee (2008), a well-known application of the RDD to study incumbency advantage.

We have also updated the Stata package that implements the new test we propose. You can download the package from the Bitbucket repository (Rdcont), which includes the ado file with an example of how to use it. Visit the software page here for additional Stata and R packages.

Revised Paper on the Wild Bootstrap with Few Clusters

We have revised a paper, joint with Azeem Shaikh and Andres Santos, on the formal properties of the wild cluster bootstrap when the data contains few, but large, clusters (see paper here).

Cameron et al. (2008) provide simulations that suggest the wild bootstrap test works well even in settings with as few as five clusters, but existing theoretical analyses of its properties all rely on an asymptotic framework in which the number of clusters is “large.”

In contrast to these analyses, we employ an asymptotic framework in which the number of clusters is “small,” but the number of observations per cluster is “large.” In this framework, we provide conditions under which the limiting rejection probability of an un-Studentized version of the test does not exceed the nominal level. Importantly, these conditions require, among other things, certain homogeneity restrictions on the distribution of covariates. We also establish that a Studentized version of the test may only over-reject the null hypothesis by a “small” amount, in the sense that it has limiting rejection probability under the null hypothesis that does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We obtain results qualitatively similar to those for the Studentized version of the test for closely related “score” bootstrap-based tests, which permit testing hypotheses about parameters in nonlinear models. We illustrate the relevance of our theoretical results for applied work via a simulation study and an empirical application. An important lesson from our results is that when these “homogeneity” conditions are implausible and there are few clusters, researchers may wish to consider methods that do not impose such conditions, such as Ibragimov and Muller (2010) and Canay, Romano, and Shaikh (2017).
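For readers unfamiliar with the procedure being analyzed, the following R sketch shows the basic mechanics of a wild cluster bootstrap with Rademacher weights drawn at the cluster level. It is a simplified, un-Studentized variant that perturbs unrestricted residuals rather than imposing the null when resampling; the names and defaults are assumptions of this sketch, not the paper’s.

```r
# Hypothetical sketch: wild cluster bootstrap for the j-th coefficient in a
# linear regression, with one Rademacher weight per cluster.
wild_cluster_boot <- function(y, X, cluster, j, beta0 = 0, B = 999) {
  fit <- lm(y ~ X - 1)
  bhat <- coef(fit)[j]
  ids <- unique(cluster)
  bstar <- replicate(B, {
    e <- sample(c(-1, 1), length(ids), replace = TRUE)  # one sign per cluster
    ystar <- fitted(fit) + resid(fit) * e[match(cluster, ids)]
    coef(lm(ystar ~ X - 1))[j]                          # re-estimate on bootstrap sample
  })
  # two-sided p-value: compare the deviation of the estimate from the null to
  # the bootstrap deviations around the original estimate
  mean(abs(bstar - bhat) >= abs(bhat - beta0))
}
```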

Revised RDD paper

We have a new version of the paper using approximate permutation tests in the regression discontinuity design, which is joint work with Vishal Kamat. In the regression discontinuity design (RDD), it is common practice to assess the credibility of the design by testing the null hypothesis that the means of baseline covariates do not change at the cutoff (or threshold) of the running variable. This practice is partly motivated by the stronger implication derived by Lee (2008), who showed that under certain conditions the distribution of baseline covariates in the RDD must be continuous at the cutoff. We propose a permutation test based on the so-called induced ordered statistics for the null hypothesis of continuity of the distribution of baseline covariates at the cutoff, and introduce a novel asymptotic framework to analyze its properties. The asymptotic framework is intended to approximate a small sample phenomenon: even though the total number n of observations may be large, the number of effective observations local to the cutoff is often small. Thus, while traditional asymptotics in RDD require a growing number of observations local to the cutoff as n → ∞, our framework keeps the number q of observations local to the cutoff fixed as n → ∞. The new test is easy to implement, asymptotically valid under weak conditions, exhibits finite sample validity under stronger conditions than those needed for its asymptotic validity, and has favorable power properties relative to tests based on means. In a simulation study, we find that the new test controls size remarkably well across designs. We then use our test to evaluate the plausibility of the design in Lee (2008), a well-known application of the RDD to study incumbency advantage.
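The R sketch below conveys the flavor of such a test, though it is a stylized version for intuition only and not the paper’s exact construction: it keeps the values of a baseline covariate induced by the q closest observations on each side of the cutoff, permutes which side each value is assigned to, and compares empirical CDFs with a Cramér-von Mises-type statistic. All names and defaults are hypothetical.

```r
# Hypothetical sketch: permutation test for continuity of the distribution of
# a baseline covariate w at the cutoff of the running variable x.
rdd_cov_perm_test <- function(x, w, cutoff = 0, q = 25, B = 999) {
  wl <- w[x <  cutoff][order(cutoff - x[x <  cutoff])[1:q]]  # q closest from the left
  wr <- w[x >= cutoff][order(x[x >= cutoff] - cutoff)[1:q]]  # q closest from the right
  pooled <- c(wl, wr)
  cvm <- function(a, b) mean((ecdf(a)(pooled) - ecdf(b)(pooled))^2)
  obs <- cvm(wl, wr)
  perm <- replicate(B, {
    s <- sample(2 * q)                     # reshuffle which side each value came from
    cvm(pooled[s[1:q]], pooled[s[-(1:q)]])
  })
  mean(c(perm, obs) >= obs)                # permutation p-value
}
```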

Revised Inference under CAR

We have a new version of the paper “Inference under covariate adaptive randomization“, joint work with Federico Bugni and Azeem Shaikh. This paper studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron’s biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, we first show that the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard errors that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 that additionally achieve what we refer to as “strong balance.” For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve “strong balance,” including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration.
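As a bare-bones illustration of the covariate-adaptive permutation test described above, the R sketch below permutes treatment status only within strata and recomputes the two-sample t-statistic on each permuted assignment. It omits the adjustments discussed in the paper, and the names are hypothetical.

```r
# Hypothetical sketch: covariate-adaptive permutation test that shuffles
# treatment status d within each stratum and recomputes the t-statistic.
car_perm_test <- function(y, d, stratum, B = 999) {
  tstat <- function(a) abs(t.test(y[a == 1], y[a == 0])$statistic)
  obs <- tstat(d)
  perm <- replicate(B, {
    dp <- d
    for (s in unique(stratum)) {
      idx <- which(stratum == s)
      dp[idx] <- sample(dp[idx])           # shuffle treatment within this stratum
    }
    tstat(dp)
  })
  mean(c(perm, obs) >= obs)                # permutation p-value
}
```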

Revised paper on inference under CAR

We have a new version of the paper Inference under Covariate-Adaptive Randomization, joint with Federico Bugni and Azeem Shaikh. This new version requires fewer assumptions for our results to hold and simplifies several of our arguments in the proofs. The abstract of the paper now reads as follows:
This paper studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Such schemes include, for example, Efron’s biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, we first show that the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. In a simulation study, we find that the rejection probability may in fact be dramatically less than the nominal level. We show further that these same conclusions remain true for a naïve permutation test, but that a modified version of the permutation test yields a test that is non-conservative in the sense that its limiting rejection probability under the null hypothesis equals the nominal level for a wide variety of randomization schemes. The modified version of the permutation test has the additional advantage that it has rejection probability exactly equal to the nominal level for some distributions satisfying the null hypothesis and some randomization schemes. Finally, we show that the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata yields a non-conservative test as well under even weaker assumptions on the randomization scheme. In a simulation study, we find that the non-conservative tests have substantially greater power than the usual two-sample t-test.

Revised Paper on Inference for Subvectors

We have revised the paper Inference for Functions of Partially Identified Parameters Defined by Moment Inequalities. The paper has been significantly edited and the new exposition is friendlier to practitioners. Section 2 includes a step-by-step guide to implementing our test, without getting into technical details. Section 3 includes a new and simple example that illustrates why a naive GMS approach does not deliver a valid test and why the new test we propose does not suffer from similar problems. The technical aspects, including the formal results, are now in Section 4.