To pass the time this summer until the NFL rolls around, I thought it would be fun to take a swing at building a simple model for MLB that could serve as a public example for talking through some of the tricks and strategies behind developing a model to assist in sports handicapping. Part 1 touches on the basics and provides general tips; in Part 2 we get into some of the fundamental theory and functions that we are using.
Note: this comes from the perspective of a recreational bettor and reflects lessons learned over the years in carrying out a modeling-based approach to handicapping. Chances are good that if you are currently using some form of a model on a day-to-day basis, you have already learned this stuff the hard way.
Sorry, this was the only picture I liked on the Google Image search for “Baseball Model”…
How complex should we make this thing? Based on past experience, the best approach has been to start simple, get some results and then add levels of complexity that incorporate the angles and situations you otherwise use to handicap the given sport. A couple of key notes on models and modeling:
- A model (or power ratings for that matter) is best used as a starting point to figure out how the game/match would be expected to play out in a neutral situation. From there it is up to the handicapper to use information and know-how to adjust the neutral result into a situation-specific projection, and then compare that expectation to the market to identify an edge on a side or total.
- The goal of a model should be to project a win percentage or a score in a way that effectively challenges or reinforces the “gut feelings” of a handicapper (you are essentially flying blind if you are wagering without some tool in hand to support your side).
- The uncertainty and randomness in sports is high from game to game, and there is no magic formula/algorithm to predict scores exactly right. A correctly projected score reflects a well-centered model that has appropriately captured the deviation from average performance.
- From the start, make it easy on yourself and set up the model in a way that it doesn’t require lots of manual data entry on a day-to-day basis; a labor intensive model is more likely to be abandoned and is taking your time away from evaluating other aspects of the games you are trying to handicap.
- While it is popular to extol the virtues of the Kelly Criterion and clearly it is incredibly important to understand the relationship between price and edge, for most people (including myself) a flat betting system that is informed by (but not dependent on) a model will work the most effectively in the long term; blind betting a model with variable units tied to perceived edge is a great way to get into bankroll trouble extremely quickly.
- Find free, reliable input data that captures the overall aspects of the sport you are trying to model, and update the database regularly in a systematic way that is easy for you to repeat over the course of the season; the best data is adjusted for the strength of the opponent in some way and/or normalized to the league average such that the 50th percentile represents an average team on a neutral field in a normal situation.
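For readers who have not seen the Kelly Criterion mentioned in the notes above written out, here is a minimal Python sketch of the standard formula. The function name and the example price are my own illustration, not part of the Hardball Model, and the post itself recommends flat betting instead.

```python
def kelly_fraction(win_prob: float, decimal_odds: float) -> float:
    """Fraction of bankroll the Kelly Criterion suggests for a single bet.

    win_prob: estimated probability of winning (0 to 1).
    decimal_odds: total payout per unit staked, e.g. 2.10 for +110.
    """
    b = decimal_odds - 1.0       # net profit per unit staked
    q = 1.0 - win_prob           # probability of losing
    edge = b * win_prob - q      # expected profit per unit staked
    return max(edge / b, 0.0)    # stake nothing when the edge is negative


# Hypothetical example: the model says 55% and the market price is +110.
print(f"Model at 55%: stake {kelly_fraction(0.55, 2.10):.1%} of bankroll")
# Same price, but suppose the true win probability is only 50%.
print(f"Truth at 50%: stake {kelly_fraction(0.50, 2.10):.1%} of bankroll")
```

In this made-up case a five-point overestimate of the win probability roughly triples the suggested stake, which is the bankroll risk of sizing bets off a perceived edge.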
So as noted above, this post and the following posts are diving into the world of Major League Baseball. This is not a sport I have reliably handicapped in the past and I have relatively low expectations that we will find quick success with this. The Hardball Model experiment is mostly for entertainment and instructional purposes but that said I’m taking this seriously.
Step 1: choose a platform to build your model. For this example I’m using Google Sheets but Excel is an equally good choice. Google Sheets is easier to share publicly, so it is the preferred way to do things if you are collaborating, which is strongly encouraged. There is really no wrong way to do this so choose whatever you are comfortable with.
Step 2: put together your schedule databases. The first database I always work on is the schedule. I prefer to have every game for the whole season in a single place and call up a given day’s games automatically to save the time of entering each team. For the MLB model I’ve imported the baseball-reference.com schedule to my Google Sheet. ESPN, MLB.com or other shops are fine too, but I strongly recommend you get your schedule from the same place you will get your input data (batting/pitching stats) from; that way there is consistency in how the team names are spelled and abbreviated (this will save you major headaches later on). A totally optional step is to organize the games in the same order that Vegas organizes them. I do this by typing out the rotation number for every game as the first step in my daily handicapping routine and then sorting the games by that rotation number; this optional step makes entering odds and results easier if you use a site like sportsinteraction.com or sbrodds.com to get your numbers/scores (a really sharp capper would use some sort of script to pull the odds automatically but that’s above my pay grade).
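The model itself lives in a spreadsheet, but the "call up a given day’s games" idea can be sketched in a few lines of Python. The rows, field names and rotation numbers below are made up for illustration and are not the real schedule.

```python
from datetime import date

# Made-up rows for illustration; a real schedule tab holds the whole season.
schedule = [
    {"date": date(2017, 6, 12), "away": "ARI", "home": "DET", "rotation": 955},
    {"date": date(2017, 6, 12), "away": "PHI", "home": "BOS", "rotation": 951},
    {"date": date(2017, 6, 13), "away": "NYY", "home": "LAA", "rotation": 953},
]


def games_on(rows, day):
    """Return the games played on `day`, sorted by rotation number."""
    todays = [game for game in rows if game["date"] == day]
    return sorted(todays, key=lambda game: game["rotation"])


for game in games_on(schedule, date(2017, 6, 12)):
    print(game["rotation"], game["away"], "at", game["home"])
```

Sorting by rotation number mirrors the optional step above, so odds and results can be entered in the same order the book lists them.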
Turns out there is an extra wrinkle with baseball: the projection is almost entirely dependent on the starting pitcher, which means we need an easy, repeatable way to get the probable pitchers into the model. Again for this step I went to baseball-reference.com and copied and pasted the probable pitchers into a blank notepad and then into a Google Sheet tab where I can automatically look them up and import them to the model.
Do what makes sense to you, but for this example, once the text was pasted into the “Today’s Pitchers” tab, the rows were sorted by column A so that a simple lookup function can be used to pull them into the main model tab later. The key for this example is that “ARI” in column A is in the same row as “Zack Greinke” in column B. We’ll revisit this in Part 2 when we dive into the usefulness of lookup functions.
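As a rough Python analogue of that spreadsheet lookup (not the sheet formula itself), a dictionary keyed on the team abbreviation does the same job. Apart from the ARI row mentioned above, the rows and the "TBD" fallback are placeholders I made up.

```python
# Column A (team) and column B (pitcher) from a "Today's Pitchers" style tab.
# Only the ARI row comes from the post; the others are placeholders.
pitcher_rows = [
    ("ARI", "Zack Greinke"),
    ("AAA", "Placeholder Starter One"),
    ("BBB", "Placeholder Starter Two"),
]

# A dict lookup does not need the rows to be sorted first.
probable_pitchers = dict(pitcher_rows)


def probable_pitcher(team, default="TBD"):
    """Return the listed starter for `team`, or `default` if none is posted."""
    return probable_pitchers.get(team, default)


print(probable_pitcher("ARI"))
print(probable_pitcher("ZZZ"))
```

The fallback value matters in practice: a team with no posted starter should show up as an obvious gap rather than break the rest of the day’s projections.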
Step 3: find your model input data. For this example I decided I wanted three classes of input data: team batting, starting pitching by pitcher, and team relief pitching. I’m telegraphing the structure of the algorithm we’ll eventually develop, but my basic strategy was “let’s pull in a parameter that generally captures how good a team is at generating runs, then combine that with an opposing pitching score that is a weighted combination of the strength of the starting pitcher and the team’s bullpen”. Many people elect to handle batting on a player-by-player basis but that’s too complicated for my taste; I prefer to tweak the batting score up or down subjectively if a key player is out for rest or injury.
Again, for this experiment I’m pulling the data from the baseball-reference.com website, and this was a no-brainer because you can automatically export the data into a columnar Excel format that is easy to copy, paste, manipulate and look up with the model. At this point in the season, the input data is stable enough that it doesn’t need to be updated daily but rather weekly, so every Monday we’ll download and paste the updated results into the model support tabs “Pitching Stats” and “Batting Stats”.
The other key input data set that was developed for this experiment was the scoring distribution for a Major League Baseball team. This is key for answering the question “well how many runs are we expecting?” which is fundamental if you are betting MLB totals. People have implemented various approaches to this I’m sure but my favorite approach is to put together a distribution based on past results and then when you figure out the percentile you expect from your given team then you look up how many runs correspond to that percentile. Here is the scoring distribution for every team result in baseball from 2016:
What is shown here is that if a team has an offensive performance at the 50th percentile, then based on last year’s games they would be expected to score 4 runs. Similarly, an 84th percentile performance (one standard deviation over the median) would result in 7 runs scored and a 16th percentile performance (one standard deviation below the median) would result in 2 runs scored. In the next step we’ll talk about projecting the offensive performance, but this is the backbone for our score projection, so we will always return a real, whole-number result when we predict a score (not like 4.3287 runs or something silly).
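Here is a minimal Python sketch of that percentile-to-runs lookup, using the nearest-rank method on a list of past team scores. The 20-game sample is contrived so that it reproduces the lookups quoted in this post (2, 4 and 7 runs, plus the 6 runs used in the worked example later); it is not the 2016 data, and the function name is my own.

```python
import math


def runs_at_percentile(past_runs, percentile):
    """Run total sitting at `percentile` (0-100] of past team scores.

    Uses the nearest-rank method, so the answer is always a whole number
    of runs that actually occurred in the sample.
    """
    ordered = sorted(past_runs)
    rank = max(math.ceil(percentile / 100 * len(ordered)), 1)
    return ordered[rank - 1]


# Contrived sample of 20 team scores, NOT the real 2016 distribution.
sample_runs = [0, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 11]

for pct in (16, 50, 84):
    print(pct, "->", runs_at_percentile(sample_runs, pct), "runs")
```

With a full season of results pasted in, the same function would return the whole-number run total for any percentile the model produces.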
Generally stated, an average hitting team against average pitching would get you to the 50th percentile, a good hitting team against sorry pitching would get you to the 84th percentile and a lousy hitting team against great pitching would get you to the 16th percentile. Let’s get more specific next…
Step 4: construct the algorithm. For the hardball model we are going to try to calculate two things: the expected number of runs for Team A and the probability of Team A winning. We’ll start with runs scored.
So Team A needs a batting score and we want this score to represent how much better Team A is than an average team at generating runs. For this experiment we’ll start with an advanced batting stat developed by Bill James called “Runs Created”. Ideally we have some parameter that is normalized by the strength of the opponent and is appropriately centered so that an average team against an average pitcher and pen at an average ballpark is projected to perform at the 50th percentile. To center our team batting score, we’ll convert the RC parameter to the number of standard deviations better than the average:
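The conversion to standard deviations is an ordinary z-score. A minimal Python sketch follows; the team labels and Runs Created totals are invented, and using the population standard deviation rather than the sample one is my assumption.

```python
from statistics import mean, pstdev

# Invented Runs Created totals for five placeholder teams.
runs_created = {"AAA": 420, "BBB": 380, "CCC": 350, "DDD": 330, "EEE": 300}


def team_batting_scores(rc_by_team):
    """Convert each team's RC to standard deviations from the league mean."""
    values = list(rc_by_team.values())
    league_avg = mean(values)
    league_sd = pstdev(values)  # assumption: population standard deviation
    return {team: (rc - league_avg) / league_sd for team, rc in rc_by_team.items()}


for team, score in team_batting_scores(runs_created).items():
    print(f"{team}: {score:+.3f}")
```

A score of zero is a league-average offense, positive is better than average and negative is worse.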
So TBS (Team Batting Score) for the Yankees is among the best in the league at 1.788 standard deviations above the league average, while Philly is among the league’s worst at 1.592 standard deviations below the league average (a TBS of -1.592). Next we want a composite score for Team B’s pitchers. For this score we’ll combine the starting pitcher’s individual score (SPS) with the team’s relief pitching score (BPS), using 2/3 weight on the starter (assuming 6 innings pitched) and 1/3 weight on the pen (assuming 3 innings pitched). I’m still figuring out what advanced stat makes the most sense for starting pitchers, but for now I’m using normalized Game Score, again converting it to the number of standard deviations away from the average and finally weighting it by the number of games started, such that the greater the body of work from a pitcher, the stronger the parameter (i.e. the farther from average, either good or bad). Lastly we use normalized runs conceded by the bullpen to develop our BPS.
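A minimal Python sketch of the pitching composite described above. The 2/3 and 1/3 weights come from the post; the games-started weighting is only a guess at one possible form (a linear ramp up to a hypothetical ten starts), since the exact weighting is not spelled out here.

```python
def weight_by_starts(raw_score, games_started, full_weight_starts=10):
    """Shrink a starter's score toward 0 (average) when he has few starts.

    ASSUMPTION: a linear ramp that reaches full weight at
    `full_weight_starts`; the post does not specify the actual weighting.
    """
    return raw_score * min(games_started / full_weight_starts, 1.0)


def pitching_score(sps, bps, starter_weight=2 / 3):
    """Blend starter (SPS) and bullpen (BPS) scores: 6 innings vs 3 innings."""
    return starter_weight * sps + (1 - starter_weight) * bps


# A starter at -1.0 with a bullpen at -0.5.
print(round(pitching_score(-1.0, -0.5), 3))
# The same starter's raw score if he had only 5 starts under the assumed ramp.
print(weight_by_starts(-1.0, 5))
```

Keeping the starter weight as a parameter makes it easy to lean more on the bullpen for a starter who rarely goes six innings.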
So if Team A has a TBS of 0.5, and Team B’s starting pitcher is -1.0 and their bullpen is -0.5, then the total Team B pitching score is (2/3)(-1.0) + (1/3)(-0.5) = -0.833. We’ll combine these two parameters as A_Batting minus B_Pitching over two. With Team A batting 0.5 standard deviations above average and facing a pitching combo almost 1 standard deviation below average, we would expect a good amount of runs. We convert (0.5-(-0.833))/2 = 0.667 standard deviations to the 75th percentile using a normal distribution function with a mean of zero and a standard deviation of 1. Then, looking up our backbone curve, we see a 75th percentile run score is 6 runs. Pretty simple. Repeat for Team B facing Team A and let’s say we get 4 runs; then our unadjusted projection is Team A wins 6–4.
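The worked example above can be checked with Python’s standard library. This is a sketch of the arithmetic only; the function name is mine, and the spreadsheet presumably uses its own normal distribution function.

```python
from statistics import NormalDist


def offensive_percentile(batting_score, opp_pitching_score):
    """Percentile of expected offensive performance for one team.

    Both inputs are in standard deviations from league average; a negative
    pitching score means worse-than-average pitching.
    """
    z = (batting_score - opp_pitching_score) / 2
    return NormalDist(mu=0, sigma=1).cdf(z) * 100


pct = offensive_percentile(0.5, -0.833)
print(f"z = {(0.5 - (-0.833)) / 2:.3f}, percentile = {pct:.1f}")
```

The result lands just under the 75th percentile, which is the value then looked up on the scoring distribution.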
For the win probability we’ll use the tried and true log5 method developed by Bill James to estimate the probability that Team A will win a game given their performance parameter combined with their opponent’s performance parameter.
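For reference, here is the standard log5 formula as a short Python sketch. It takes each team’s win probability against an average opponent; how the model’s performance parameters are turned into those inputs is not covered in this part, so the inputs below are just example numbers.

```python
def log5(p_a, p_b):
    """Probability that Team A beats Team B under Bill James's log5 method.

    p_a, p_b: each team's win probability against an average opponent.
    Undefined when both inputs are 0 or both are 1 (division by zero).
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Example numbers only: a .600 team against a .400 team.
print(round(log5(0.600, 0.400), 3))
```

Two useful sanity checks are that evenly matched teams come out at 50% and that the two teams’ probabilities always sum to 1.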
Step 5: make adjustments, as necessary. Now we should note a couple of things. When implementing a model this simple, adjustments are everything. Right away we need to account for an obvious one that applies in every sport: home field advantage. There are many sophisticated approaches to deal with this, but I’m not an expert in complex strategies for baseball, so we’ll handle this by giving the home pitcher and home batters a half standard deviation bump to account for the fact that they are playing at home and get to bat last. Ballpark adjustments are another obvious angle that needs to be addressed, so we’ll add a ballpark factor to adjust the expected runs for places that are particularly hitter friendly like COL or pitcher’s parks like LAD. Finally, I’m not interested in projecting tie scores so I need to implement a tie-break. To do this we’ll look at the win probability and see which team is more likely to win based on our log5 calc, then adjust that team up a run or their opponent down a run. In a way that is sort of cheating, we’ll use the lined total to figure out which way we should go (i.e. if the projected tie score is lower than the lined total we’ll add a run to the team with the higher win probability, and vice versa).
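The tie-break rule can be written down as a short Python sketch. Two details are my assumptions because the text does not cover them: a win probability of exactly 50% goes to Team A, and a projected total exactly equal to the lined total is handled by subtracting a run.

```python
def break_tie(runs_a, runs_b, win_prob_a, lined_total):
    """Resolve a projected tie using win probability and the lined total."""
    if runs_a != runs_b:
        return runs_a, runs_b  # nothing to fix

    a_favored = win_prob_a >= 0.5  # assumption: an exact 50% goes to Team A
    if runs_a + runs_b < lined_total:
        # Projection is under the market total: add a run to the favorite.
        return (runs_a + 1, runs_b) if a_favored else (runs_a, runs_b + 1)
    # Otherwise (assumption: includes an exact match) take a run off the dog.
    return (runs_a, runs_b - 1) if a_favored else (runs_a - 1, runs_b)


print(break_tie(4, 4, win_prob_a=0.58, lined_total=9.0))
print(break_tie(5, 5, win_prob_a=0.58, lined_total=8.5))
```

Either way the more likely winner ends up one run ahead, and the projected total moves toward the market number rather than away from it.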
Step 6: start tracking results. Now as we start to accumulate results it should become obvious that there are other key angles that need to be implemented. You may say “how can you ignore left-handed pitchers throwing against left-handed hitter heavy lineups?” or “what about if a pitcher is making his first start of the season and we don’t have any data on him?” Those are valid questions that we need to eventually account for. But for now we’ll build in manual adjustment “knobs” where we can apply +/- standard deviations to the average to account for situational angles that are apparent. The more important thing is to explore the projected scores to see if there is a systematic bias in the basic algorithm that is favoring home/away teams, underdogs/favs or over/unders. Once we have some confidence that the fundamental algorithm is providing a firm starting point, then we can get fancy.
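One simple way to look for that kind of systematic bias is to average the projection error within each group. The Python sketch below uses made-up tracking rows and field names purely for illustration.

```python
from statistics import mean

# Made-up tracking rows: one per team per game.
results = [
    {"home": True, "projected": 5, "actual": 4},
    {"home": True, "projected": 4, "actual": 6},
    {"home": True, "projected": 6, "actual": 5},
    {"home": False, "projected": 4, "actual": 3},
    {"home": False, "projected": 3, "actual": 2},
    {"home": False, "projected": 5, "actual": 3},
]


def mean_bias(rows, key):
    """Average (projected - actual) runs for each value of `key`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row["projected"] - row["actual"])
    return {group: mean(errors) for group, errors in groups.items()}


for is_home, bias in mean_bias(results, "home").items():
    label = "home" if is_home else "away"
    print(f"{label}: {bias:+.2f} runs per game")
```

A group whose average error stays well away from zero over a decent sample is a sign the base algorithm, rather than bad luck, is tilting the projections. The same function works for favorite/underdog or over/under flags if those columns are tracked.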
In Part 2 we’ll zero in on some of the specific equations and functions that are key in organizing and automating the process so that you can focus your efforts on finding information that will sharpen your handicap overall and determine the appropriate game-specific adjustments. Until then you can check out the model and its results so far here: https://docs.google.com/spreadsheets/d/1lx5CWhM1k4JirYE3khavGelRqoDlHlrmraTeQqOvSbY/edit?usp=sharing
As far as I’m concerned, some model is better than no model and there is no wrong way to use math and statistics to help become a sharper handicapper. Best of luck!