On Outcome Tests for Detecting Bias

The paper "On the Use of Outcome Tests for Detecting Bias in Decision Making," joint with Magne Mogstad and Jack Mountjoy, is now available. The paper starts from the observation that the decisions of judges, lenders, journal editors, and other gatekeepers often lead to disparities in outcomes across affected groups. An important question is whether, and to what extent, these group-level disparities are driven by relevant differences in underlying individual characteristics or by biased decision makers. Becker (1957) proposed an outcome test for bias, leading to a large body of related empirical work, with recent innovations in settings where decision makers are exogenously assigned to cases and vary systematically in their decision tendencies. We carefully examine what can be learned about bias in decision making in such settings.

Our results call into question recent conclusions about racial bias among bail judges and, more broadly, yield four lessons for researchers considering the use of outcome tests of bias. First, the so-called generalized Roy model, a workhorse of applied economics, does not deliver a logically valid outcome test without further restrictions, since it does not require an unbiased decision maker to equalize marginal outcomes across groups. Second, the more restrictive "extended" Roy model, which isolates potential outcomes as the sole admissible source of analyst-unobserved variation driving decisions, delivers an outcome test that is both logically valid and econometrically viable. Third, this extended Roy model places strong restrictions on behavior and on the data generating process, so detailed institutional knowledge is essential for justifying them. Finally, because the extended Roy model imposes restrictions beyond those required to identify marginal outcomes across groups, it has testable implications that may help assess its suitability across empirical settings.
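The first lesson can be illustrated with a small simulation. The sketch below uses a hypothetical, deliberately stylized setup chosen only for exposition; it is not the model, data, or estimator in the paper. A judge releases a defendant when the defendant's misconduct risk p, minus some other analyst-unobserved factor c, falls below a common threshold. The rule is identical for both groups, so the judge is unbiased by construction. When risk is the only unobserved driver of decisions (c = 0, in the spirit of the extended Roy model), the marginal released defendant has risk close to the threshold. When c also drives decisions and its distribution differs across groups, as the generalized Roy model allows, marginal outcomes diverge even though the judge treats everyone the same, so an outcome test would wrongly signal bias. The function name marginal_outcome, the parameters group_shift, threshold, and band, and all distributions are illustrative assumptions.

```python
import random
from statistics import mean


def marginal_outcome(group_shift, threshold=0.4, band=0.01, n=200_000, seed=0):
    """Average misconduct risk among defendants released at the margin.

    Hypothetical, stylized model (not the paper's): each defendant has a
    misconduct risk p ~ Uniform(0, 1) and an analyst-unobserved factor
    c ~ Uniform(0, group_shift) that also pushes the judge toward release.
    The judge releases whenever p - c < threshold and applies this identical
    rule to every group, so group membership plays no role in the decision.
    """
    rng = random.Random(seed)
    marginal_risks = []
    for _ in range(n):
        p = rng.random()                   # misconduct risk (the outcome-relevant trait)
        c = rng.uniform(0.0, group_shift)  # other unobserved driver; always 0 when group_shift == 0
        index = p - c                      # the judge's release index
        # Keep released defendants whose index sits just below the threshold.
        if threshold - band < index < threshold:
            marginal_risks.append(p)
    return mean(marginal_risks)


# Group A: risk is the only unobserved driver of decisions.
# Group B: the extra factor c also matters and is distributed differently.
risk_a = marginal_outcome(group_shift=0.0)
risk_b = marginal_outcome(group_shift=0.4)
print(f"marginal outcome, group A: {risk_a:.3f}")
print(f"marginal outcome, group B: {risk_b:.3f}")
```

Run as is, the sketch should report marginal outcomes of roughly 0.40 for group A and 0.60 for group B, although the judge's rule is the same for both. Setting group_shift to 0.0 for both groups would instead give equal marginal outcomes, which is the kind of restriction under which equal marginal outcomes are what an unbiased judge delivers.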

A few days after our paper became public, the authors of "Racial Bias in Bail Decisions," The Quarterly Journal of Economics 133.4 (November 2018): 1885-1932, posted a correction appendix to their paper and a note commenting on ours. You can find both files on the authors' websites or appended to the end of the reply discussed below. We found these comments unclear, so we wrote the reply linked below to help interested readers understand both sides of the argument:

Reply to “Comment on Canay, Mogstad, and Mountjoy (2020)” by Arnold, Dobbie, and Yang (ADY).

Our reply is organized around three points. First, we do not mischaracterize the definition of racial bias in the published version of ADY. If the authors wrote the published definition but actually meant a substantially different one (such as the definition that now appears in the new "Correction Appendix," also appended to this reply), then that is clearly the relevant mischaracterization. Second, focusing on clear-cut cases of (un)biased behavior is a feature of our argument, not a bug. The point is that even in the starkest, most unambiguous cases of unbiased and biased behavior, the outcome test can deliver the wrong conclusion. This logical invalidity of the outcome test also extends to intermediate cases where judges are biased against some defendants but not others. Third, to restore the logical validity of the outcome test, ADY choose to redefine racial bias instead of invoking a decision model that justifies the test. Problematically, this substantial post-publication change in the definition of (un)biased judge behavior matters greatly for the interpretation and implications of their findings. The new definition is reverse-engineered, difficult to justify, and at odds not only with the work by Becker that ADY frequently cite, but also with more recent work by some of the authors of ADY.