Authors: Jake Atlas (McCormick ’19), Walker Reinfeld (Weinberg ’20), Joey Reinsdorf (Weinberg ’19)
In the age of technology, the role of human umpires in baseball is increasingly disputed. Major League Baseball (MLB) already uses Statcast, a tracking system that can determine balls and strikes far more accurately than even the most highly trained human eye. As a result, we can now quantify the effect umpire mistakes have on MLB games.
While some umpire errors have an obvious effect on games, such as umpire Jim Joyce’s incorrect call that spoiled Armando Galarraga’s would-be perfect game in 2010, their subtler effects are less clear. Proof that umpire mistakes meaningfully affect the game would be a blow to the World Umpires Association.
We know that umpires make mistakes on a daily basis, but how biased are these mistakes?
Using pitcher salary as a proxy for pitcher quality, we examined whether the rates of different types of umpire mistakes for each MLB pitcher in 2017 were related to that pitcher’s salary. We defined four markers that are used throughout the study: incorrect_favors_pitcher_standardized (IFPS), incorrect_favors_batter_standardized (IFBS), unfair_strikout_standardized (USS), and unfair_walk_standardized (UWS).
IFPS measures the number of pitches that should have been called balls but were instead called strikes, divided by the total number of pitches the pitcher threw during the season. IFBS measures the number of pitches that should have been called strikes but were instead called balls, standardized the same way as IFPS.
For each pitcher, USS measures the percentage of batters faced by the given pitcher who struck out on a called strike that should have been a ball. UWS measures the percentage of batters faced by the given pitcher who walked on a ball that should have been called a strike.
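The four markers can be computed from pitch-level tracking data. The sketch below is illustrative only: the `Pitch` record, its fields, and `error_markers` are hypothetical names, not the authors’ code or Statcast’s actual schema. It also assumes that the pitch ending each plate appearance is tagged with its result, so batters faced can be counted from those pitches. Markers are returned as fractions; multiply by 100 for percentages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pitch:
    # Hypothetical pitch-level record; field names are illustrative, not Statcast's schema.
    pitcher: str
    call: Optional[str]       # "strike" or "ball" if the umpire called it; None if the batter swung
    in_zone: bool             # tracking data's verdict: True if the pitch was in the strike zone
    pa_result: Optional[str]  # "strikeout", "walk", etc. if this pitch ended the plate appearance


def error_markers(pitches):
    """Return {pitcher: (IFPS, IFBS, USS, UWS)}, each expressed as a fraction."""
    counts = {}
    for p in pitches:
        c = counts.setdefault(
            p.pitcher, {"pitches": 0, "batters": 0, "ifp": 0, "ifb": 0, "uss": 0, "uws": 0}
        )
        c["pitches"] += 1
        if p.pa_result is not None:  # assumption: one PA-ending pitch per batter faced
            c["batters"] += 1
        if p.call == "strike" and not p.in_zone:  # ball called a strike: favors the pitcher
            c["ifp"] += 1
            if p.pa_result == "strikeout":
                c["uss"] += 1
        elif p.call == "ball" and p.in_zone:  # strike called a ball: favors the batter
            c["ifb"] += 1
            if p.pa_result == "walk":
                c["uws"] += 1

    markers = {}
    for name, c in counts.items():
        bf = c["batters"]
        markers[name] = (
            c["ifp"] / c["pitches"],       # IFPS: per pitch thrown
            c["ifb"] / c["pitches"],       # IFBS: per pitch thrown
            c["uss"] / bf if bf else 0.0,  # USS: per batter faced
            c["uws"] / bf if bf else 0.0,  # UWS: per batter faced
        )
    return markers


sample = [
    Pitch("A", "strike", False, None),         # missed call favoring the pitcher
    Pitch("A", "ball", True, None),            # missed call favoring the batter
    Pitch("A", None, True, None),              # swung at: no umpire call
    Pitch("A", "strike", False, "strikeout"),  # unfair called strike three
    Pitch("A", "ball", False, "walk"),         # correctly called ball four
]
print(error_markers(sample))  # {'A': (0.4, 0.2, 0.5, 0.0)}
```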
The graphs of these four markers show no strong relationship between any of them and salary. However, a relationship could still emerge after controlling for other factors, so we examined the data more closely for signs of umpire bias.
Umpires clearly make mistakes: on average, a little over 5% of each 2017 pitcher’s pitches were miscalled. Of these mistakes, roughly 55% favored the pitcher. The difference from an even split is statistically significant even at the most stringent conventional thresholds (p = 8.121e-10). Therefore, at least in 2017, umpires’ mistakes favored pitchers more often than batters. But is there a pattern suggesting that better pitchers may be more or less favored?
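The article does not state which test produced its p-value. One simple option, sketched here as an assumption, is a two-sided one-sample z-test (normal approximation) of whether the share of missed calls favoring the pitcher differs from 50%. The function name and the toy counts are hypothetical and are not the study’s data.

```python
import math


def proportion_z_test(successes: int, trials: int, p0: float = 0.5):
    """Two-sided one-sample z-test (normal approximation) of H0: true proportion == p0.

    Here `successes` would be missed calls that favored the pitcher and `trials`
    all missed calls. Returns (z statistic, two-sided p-value).
    """
    p_hat = successes / trials
    se = math.sqrt(p0 * (1 - p0) / trials)  # standard error under H0
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Toy counts (not the study's data): the same 55% share at two sample sizes.
print(proportion_z_test(55, 100))
print(proportion_z_test(5_500, 10_000))
```

With 55 of 100 mistakes favoring the pitcher, the p-value is about 0.32; the same 55% share out of 10,000 mistakes gives z = 10 and a p-value far below 1e-20. Whether a 55% share is significant therefore depends heavily on how many missed calls are counted.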
Initial statistical tests suggest that, in most cases, no such pattern exists, which agrees with the graphs above showing no apparent relationship between salary and the error markers. Without controlling for any other pitcher characteristics, the data are consistent with the markers being independent of salary. The table below shows the p-values for tests of the null hypothesis that each of the four error markers is uncorrelated with salary and with the natural logarithm of salary. Each p-value (a probability between 0 and 1) is the chance of observing a sample correlation at least as strong as the one in the 2017 pitcher data if the given marker were in fact unrelated to salary (or to ln(salary)).
| Error Marker | P-Value: Compared to Salary | P-Value: Compared to ln(Salary) | Direction of the Sample Correlation |
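The article does not say which correlation test it used. As an illustrative substitute, the sketch below uses a permutation test, which matches the interpretation of the p-values given above: shuffling salaries simulates a world in which a marker is unrelated to salary, and the p-value is the share of shuffles whose correlation is at least as strong as the observed one. The function names, the number of permutations, and the seed are assumptions; the sign of `r` corresponds to the direction column.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(marker, salary, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for H0: marker and salary are unrelated.

    Shuffling salary breaks any link to the marker; the p-value is the share of
    shuffles whose |r| is at least the observed |r| (with a +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(marker, salary))
    shuffled = list(salary)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(marker, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def salary_tests(marker, salary):
    """Correlation and permutation p-value against both salary and ln(salary)."""
    log_salary = [math.log(s) for s in salary]
    return {
        "r_salary": pearson_r(marker, salary),
        "p_salary": permutation_p_value(marker, salary),
        "r_ln_salary": pearson_r(marker, log_salary),
        "p_ln_salary": permutation_p_value(marker, log_salary),
    }
```

Running `salary_tests` once per marker yields the two p-values and the correlation sign for one row of the table. A t-test on the Pearson correlation is the more common parametric alternative.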
In most cases the p-values do not reach standard significance thresholds, and after any reasonable correction for multiple comparisons, none of the results is statistically significant. However, the directions of the sample correlations, combined with intuition, still suggest that umpires could be biased in favor of pitchers with higher salaries.
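The text does not specify which multiple-comparison correction was considered. One standard choice is the Holm-Bonferroni procedure, sketched below for a list of p-values such as the eight in the table (four markers, each tested against salary and ln(salary)); plain Bonferroni, which is more conservative, is included for comparison. Both helper names are hypothetical. A result is significant at level α only if its adjusted p-value is at most α.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1, capped at 1,
        candidate = min(1.0, (m - rank) * p_values[i])
        # and adjusted values are kept non-decreasing in the sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def bonferroni_adjust(p_values):
    """Plain Bonferroni: multiply every p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]
```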
To explore the possible relationship between umpires’ missed calls and pitcher salaries further, we used regression analysis. To control for differences between pitchers other than salary that might bias umpires, the models included several additional variables: the pitcher’s team, age, method of acquisition by the team, whether the pitcher is a starter, and some pitcher statistics.
To check the robustness of the chosen models, we also fit other plausible regression models. In general, the results were not robust to seemingly reasonable changes in model specification. This means that no single model can support firm conclusions on its own, but the models were useful for pinpointing factors that might sway umpires. Further examination of each of these potential factors, however, revealed only inconsequential relationships: even where a relationship was statistically significant, it was practically meaningless.
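The authors’ regression specification is not given in detail. The sketch below shows one plausible setup under stated assumptions: ordinary least squares of a single error marker on ln(salary), age, a starter indicator, and dummy variables for team and acquisition method, with the first category of each as the baseline. All helper names are hypothetical, and the unspecified "pitcher statistics" controls are omitted. Solving the normal equations in pure Python keeps the example self-contained; a real analysis would more likely use a numerically stabler library routine such as statsmodels’ `OLS`, which also reports standard errors and p-values.

```python
import math


def one_hot(values, prefix):
    """Dummy-encode a categorical column, dropping the first (sorted) level as baseline."""
    levels = sorted(set(values))[1:]
    return {f"{prefix}={lvl}": [1.0 if v == lvl else 0.0 for v in values] for lvl in levels}


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, columns):
    """OLS with an intercept via the normal equations (X'X) b = X'y."""
    names = ["intercept"] + list(columns)
    rows = [[1.0] + [columns[k][i] for k in columns] for i in range(len(y))]
    p = len(names)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return dict(zip(names, solve(xtx, xty)))


def marker_regression(marker, salary, age, team, acquired, starter):
    """Regress one error marker on ln(salary) plus hypothetical controls."""
    columns = {
        "ln_salary": [math.log(s) for s in salary],
        "age": [float(a) for a in age],
        "starter": [1.0 if s else 0.0 for s in starter],
    }
    columns.update(one_hot(team, "team"))
    columns.update(one_hot(acquired, "acquired"))
    return ols(marker, columns)
```

The coefficient on `ln_salary` is the quantity of interest: it estimates how the marker changes with salary after holding the other variables fixed. Refitting with different sets of controls is one way to probe the robustness concerns raised in the next paragraph.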
It appears that umpires, despite their propensity for making errors, do not direct those errors toward particular types of pitchers. None of the factors we explored had a relationship with umpire errors that was both statistically significant and practically meaningful, so we found no evidence that umpires are biased toward certain types of pitchers.
While this is not the result that can bring down the World Umpires Association, one thing is clear: umpires are human, and therefore unable to call a game without making mistakes. No particular type of player appears to benefit from this, on average, but there is no denying the impact that umpires have on the game.