Revised paper on inference under CAR

We have a new version of the paper "Inference under Covariate-Adaptive Randomization," joint with Federico Bugni and Azeem Shaikh. This new version requires fewer assumptions for our results to hold and simplifies several of the arguments in the proofs. The abstract of the paper now reads as follows:
This paper studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Such schemes include, for example, Efron’s biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, we first show that the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. In a simulation study, we find that the rejection probability may in fact be dramatically less than the nominal level. We show further that these same conclusions remain true for a naïve permutation test, but that a modified version of the permutation test yields a test that is non-conservative in the sense that its limiting rejection probability under the null hypothesis equals the nominal level for a wide variety of randomization schemes. The modified version of the permutation test has the additional advantage that it has rejection probability exactly equal to the nominal level for some distributions satisfying the null hypothesis and some randomization schemes. Finally, we show that the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata yields a non-conservative test as well under even weaker assumptions on the randomization scheme. In a simulation study, we find that the non-conservative tests have substantially greater power than the usual two-sample t-test.
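
To get a feel for the first and last claims in the abstract, here is a minimal Monte Carlo sketch in Python. It is not the simulation design from the paper. Everything in it is an assumption made for illustration: the function names, the four equally likely strata, the outcome model (a stratum effect plus standard normal noise with a true average treatment effect of zero), a target treatment proportion of one half, homoskedasticity-only standard errors in the regression, and the normal critical value 1.96 for a nominal 5% level. Under the null, it compares how often the two-sample t-test and the t-test from a regression on treatment and strata indicators reject when treatment is assigned by stratified block randomization.

```python
import numpy as np

def stratified_block_assign(strata, rng):
    """Within each stratum, treat exactly half (rounded down) of the units."""
    d = np.zeros(strata.size, dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        treated = rng.choice(idx, size=idx.size // 2, replace=False)
        d[treated] = 1
    return d

def two_sample_t(y, d):
    """Unequal-variance two-sample t-statistic for the difference in means."""
    y1, y0 = y[d == 1], y[d == 0]
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return (y1.mean() - y0.mean()) / se

def strata_regression_t(y, d, strata):
    """t-statistic on treatment in OLS of y on treatment and strata indicators."""
    labels = np.unique(strata)
    dummies = (strata[:, None] == labels[None, :]).astype(float)
    x = np.column_stack([d.astype(float), dummies])  # dummies absorb the intercept
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    sigma2 = resid @ resid / (y.size - x.shape[1])   # homoskedastic variance (assumption)
    var_beta = sigma2 * np.linalg.inv(x.T @ x)
    return beta[0] / np.sqrt(var_beta[0, 0])

def simulate(n=200, n_strata=4, reps=2000, seed=0):
    """Return null rejection rates (two-sample t-test, strata regression t-test)."""
    rng = np.random.default_rng(seed)
    crit = 1.96  # normal critical value for a two-sided 5% test
    reject_t = 0
    reject_reg = 0
    for _ in range(reps):
        strata = rng.integers(0, n_strata, size=n)
        d = stratified_block_assign(strata, rng)
        # Hypothetical outcome model: the stratum matters, treatment does not.
        y = 2.0 * strata + rng.standard_normal(n)
        reject_t += int(abs(two_sample_t(y, d)) > crit)
        reject_reg += int(abs(strata_regression_t(y, d, strata)) > crit)
    return reject_t / reps, reject_reg / reps

if __name__ == "__main__":
    rate_t, rate_reg = simulate()
    print(f"two-sample t-test rejection rate:   {rate_t:.3f}")
    print(f"strata regression rejection rate:   {rate_reg:.3f}")
```

In this toy design the strata explain most of the variation in outcomes, so the two-sample t-test should reject far less often than 5%, while the regression with strata indicators should reject at a rate close to 5%. Weakening the stratum effect (the factor 2.0) should narrow the gap between the two tests.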