Author: Ben Gutstein (bengustein2024@u.northwestern.edu)
A critical part of any deep dive into sports data is using current data to simulate thousands of games accurately at once. A simulator lets you change individual inputs from your statistical analysis and see which changes improve outcomes most efficiently. As my introduction to sports data and code, I decided to learn how to build a game simulator, and in this article I walk through my development process.
I started with the most basic simulator possible. Using MATLAB R2021a, a numerical computing environment, I wrote a simulator that takes the odds of hitting a home run versus getting an out and simulates X games. The commented code is shown in Figure 1.
Figure 1
If the odds of hitting a home run or getting an out are 50/50, the team is expected to score 27 runs per game: a game has 27 outs, and at even odds each out is matched, on average, by one home run. Figure 2 shows the average runs per game over 10000 simulated games at those odds, along with a bar graph of the scores from all 10000 games. This 10000-game trial produced an average score of 26.9988 runs.
Figure 2
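For readers without access to the figures, here is a minimal Python sketch of the same idea. It is not the Matlab code from Figure 1; the function names, the fixed seed, and the nine-inning, three-out structure are my own assumptions about how such a simulator could be laid out.

```python
import random


def simulate_game(p_home_run, rng, innings=9, outs_per_inning=3):
    """Play one game in which every at-bat is a home run or an out."""
    runs = 0
    for _ in range(innings):
        outs = 0
        while outs < outs_per_inning:
            if rng.random() < p_home_run:
                runs += 1  # solo home run: one run, no baserunners to track
            else:
                outs += 1
    return runs


def average_runs(p_home_run, n_games=10000, seed=0):
    """Average runs per game over n_games simulated games."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    total = sum(simulate_game(p_home_run, rng) for _ in range(n_games))
    return total / n_games
```

Calling `average_runs(0.5)` should give a value close to the expected 27 runs per game, with the exact figure depending on the seed.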
The second step in developing a realistic and effective game simulator is to add complexity to the function the system runs. My second concept was a simulator in which a hitter either hits a single or makes an out. The simulator counts how many singles are hit in each inning. Once four singles are hit in an inning, one run is added to the total. After a run scores, the simulator removes that baserunner, which leaves three men on base, so each additional single in the inning scores another run. Figure 3 shows the code for this simulator, with comments explaining what each section does.
Figure 3
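The rules described above can be sketched in Python as follows. This is my own illustration rather than the Matlab code in Figure 3, and the names `simulate_singles_game` and `average_runs_singles` are hypothetical.

```python
import random


def simulate_singles_game(batting_average, rng, innings=9):
    """Play one game in which every at-bat is a single or an out."""
    runs = 0
    for _ in range(innings):
        outs = 0
        runners = 0  # baserunners in the current inning, at most 3
        while outs < 3:
            if rng.random() < batting_average:
                if runners == 3:
                    runs += 1     # bases loaded: a run scores, bases stay loaded
                else:
                    runners += 1  # the single puts one more runner on base
            else:
                outs += 1
    return runs


def average_runs_singles(batting_average, n_games=10000, seed=0):
    """Average runs per game over n_games simulated games."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    total = sum(simulate_singles_game(batting_average, rng) for _ in range(n_games))
    return total / n_games
```

Comparing `average_runs_singles(0.3)` with `average_runs_singles(0.5)` shows the same pattern as the article: the lower batting average scores far fewer runs.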
I used this second simulator to run 10000 games at once with batters that hit .300 and .500. Figure 4 shows the average runs and graph for a .300 batting average, and Figure 5 shows the same for a .500 batting average. The average is 1.0303 runs per game at .300 and 8.5191 at .500. The gap is substantial because of the way the simulator works: as the odds of getting a single drop, it becomes increasingly difficult to string together four singles in an inning before making three outs.
Figure 4
Figure 5
The second simulator is more complex than the first because it has to count innings and baserunners. The third simulator is a full game simulator. It walks through an entire game using the odds of each batting outcome from the 2006 season. This gives a baseline over 10000 games, from which you can see how an increase in OBP affects runs per game or look for the ideal strikeout-to-home-run ratio.
Figure 6
Figure 6 shows the odds of each at-bat outcome, taken from the 2006 season data. The odds are listed cumulatively: each entry adds onto the previous one, because the odds of some outcome occurring during an at-bat total 100%. For example, the odds of a single are 15.8% and the odds of a single or a double are 20.7%, so the odds of a double are 20.7% - 15.8% = 4.9%. Figure 7 gives an example from the simulator, which runs through every possible situation and outcome when a single occurs.
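One way to turn cumulative odds into an at-bat outcome is to draw a random number between 0 and 1 and find the first threshold above it. The Python sketch below illustrates that lookup. Only the single and double values come from the text; the `"other"` entry is a placeholder that lumps together every remaining outcome from Figure 6, and the function names are hypothetical.

```python
import bisect
import itertools
import random

# Individual odds. Single and double come from the article; "other" is a
# placeholder standing in for every remaining outcome in Figure 6.
OUTCOMES = ["single", "double", "other"]
PROBABILITIES = [0.158, 0.049, 0.793]

# Running totals, approximately [0.158, 0.207, 1.0]
THRESHOLDS = list(itertools.accumulate(PROBABILITIES))


def outcome_for(draw):
    """Map a number in [0, 1) to the first outcome whose threshold exceeds it."""
    index = bisect.bisect_right(THRESHOLDS, draw)
    # Guard against floating-point rounding in the final threshold.
    return OUTCOMES[min(index, len(OUTCOMES) - 1)]


def random_outcome(rng):
    """Draw one at-bat outcome using the given random.Random instance."""
    return outcome_for(rng.random())
```

A draw of 0.10 falls below 0.158 and maps to a single, while a draw of 0.18 falls between 0.158 and 0.207 and maps to a double.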
Figure 7
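The text does not spell out the baserunning rules used in Figure 7, so the Python sketch below rests on a simplifying assumption of my own: on a single, every runner advances exactly one base. The function `apply_single` and the tuple representation of the bases are hypothetical, meant only to show what handling every situation can look like.

```python
import itertools


def apply_single(bases):
    """Advance runners on a single.

    bases is a (first, second, third) tuple of booleans.
    Assumption: every runner moves up exactly one base.
    Returns the new bases and the number of runs scored.
    """
    first, second, third = bases
    runs = 1 if third else 0          # a runner on third scores
    new_bases = (True, first, second)  # batter takes first, others move up
    return new_bases, runs


# Walk through all eight possible base situations.
ALL_SITUATIONS = list(itertools.product([False, True], repeat=3))
RESULTS = {bases: apply_single(bases) for bases in ALL_SITUATIONS}
```

A real simulator might let a runner score from second on some singles; that refinement would replace the one-base rule inside `apply_single`.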
Figure 8
Figure 8 shows the outcome of a simulation over 10000 games using the 2006 data. The simulator produced an average of 4.87 runs per game, and the true average from that season was 4.86 runs per game. The simulated average differs from the true value by only 0.01 runs per game, which is well within the 95% confidence interval for a simulator of this kind. This game simulator can be used to answer many questions about the most efficient ways to play baseball, and I am excited to tackle those in the future.
Future articles will use this game simulator to explore ideas that can only be tested by simulating thousands of games at once. In an upcoming article, a peer and I dive into the ideal ratio between strikeouts, on-base percentage, and home runs.
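The article does not say how the confidence interval was computed. One common approach, sketched here in Python as an assumption, is a normal approximation on the mean of the simulated scores. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import statistics


def mean_confidence_interval(scores, z=1.96):
    """Return (mean, lower, upper) for an approximate 95% interval on the mean.

    Assumes the sample mean is roughly normal, which is reasonable
    for thousands of simulated games.
    """
    mean = statistics.fmean(scores)
    std_error = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - z * std_error, mean + z * std_error
```

Passing the list of per-game scores from a simulation run gives the interval, and the true season average can then be checked against the lower and upper bounds.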