Building a Hockey Model

Awesome guest submission from @holmes168, who used the recent Deep Dive Pods as inspiration to take on hockey modeling and decided to share his experience… great tips for those of you just diving into handicapping using analytics!

Hockey_1.jpg

ONE PERSON’S JOURNEY BUILDING A SPORTS BETTING MODEL

What is my definition of a sports betting model? My characterization of a model is a formal breakdown of statistics resulting in a prediction on the outcome of a sporting event. I want to clarify my description of modeling because the gambling universe has multiple interpretations of most everything. Furthermore, I want to ensure the audience understands that I am not a computer programmer and do not utilize R or Python, but I can utilize Microsoft Excel and Google. In other words, I am your average guy with a family and job who takes sports and sports betting seriously. Recently, I have undergone a months-long process building an NHL model that shows promise but is still in the development phase. I would like to share with you my journey building a sports betting model in the hopes that it can shed light on the process and show you that a model (hopefully a winning model) can be built by anyone in the sports betting community willing to work hard.

My sports wagering started last year after a discussion between Texas and TCU alumni about their upcoming game in the 2018 college football season. Even though I currently live in Fort Worth, I didn’t care who won; my goal was to add fuel to their argument. Everyone knows how heated college football debates can get, and by the end of the day I had opened an offshore account and placed a $50 wager on the Horned Frogs at -4. After all, EVERYONE knew TCU was better and Texas was not back.

I watched the line move down from TCU at -4 to -1, and a quick search of game results shows my afternoon did not go well. My guaranteed $45 payday never came in, and I was done sports betting after the Frogs were dominated on the gridiron. Except I knew that the next day the Eagles would beat the Colts handily and the Jags would not lose to Tennessee. The easiest way to make my money back and earn a profit was a two-game NFL Money Line parlay. I sweated out the Eagles’ 20-16 win, but was shocked when the Titans beat Jacksonville. Now I was $100 in the hole after two days. No worries: I had another easy win coming up, so I tossed all my remaining monthly allowance on the Cowboys/Seahawks Under 40, which thankfully came through. I immediately realized that continuing to put my hard-earned cash into the sports wagering world without a plan was foolhardy.

I started cobbling together basic college football models, which generated a score, leading to a slight edge over betting my gut. The model was done in Excel and averaged pass yards, rush yards, and scoring. I compared my expected score to the spread, and if I felt there was an edge, I’d bet the game. The model did well enough that I survived the college football season and did well in Bowl games. However, I knew there was more data to exploit and a more systematic way to gain an edge. I know more about hockey than hoops, so I decided to develop a model to quantify my edge in the NHL.

My initial attempt at NHL modeling was figuring out Totals. I took the goals for and goals against for the two teams, taken from the hockey standings at that time, and came up with a data point to place a wager on. To figure out the score of the Washington at Pittsburgh match, my initial attempt took ((Capitals GF + Penguins GA)/69)/2, which shows Washington scoring 3.23 goals. The Penguins’ score utilized the same formula, just swapping out the team names, to get 3.25 goals for Pittsburgh. A quick check shows the O/U at 6.5 and the two scores added together equal 6.48, so throw a unit on the under.
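The arithmetic above can be sketched in a few lines of Python (the author worked in Excel, so this is only an illustration). It assumes the 69 in the formula is the number of games played at the time, which the text does not state, and the function names are made up for this example.

```python
def expected_goals(goals_for, opp_goals_against, games_played):
    """Average a team's scoring rate with its opponent's goals-allowed rate.

    Assumes both season totals cover the same number of games played.
    """
    return ((goals_for + opp_goals_against) / games_played) / 2


def totals_lean(score_a, score_b, posted_total):
    """Compare the projected combined score to the posted Over/Under."""
    projected = score_a + score_b
    if projected > posted_total:
        return "over"
    if projected < posted_total:
        return "under"
    return "pass"


# The two projected scores quoted in the article against the 6.5 total.
print(totals_lean(3.23, 3.25, 6.5))  # under
```

As the next section shows, a lean this thin (6.48 against 6.5) says very little on its own.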

Hockey_3.jpg

Rough Start…

I tested my “model” by placing small wagers, won those wagers, and was sure I had an edge. I then placed multiple bets on totals, with the knowledge I was going to crush the book… and lost every single one of them.

Looking back on my start in hockey betting, it is easy to spot multiple problems with the system I was using. There was no analysis of the netminders, no home ice figured in, and no accounting for fatigue, injuries, or the many other aspects of hockey (or any sport) that need to be considered prior to placing a wager. I would argue the attempt I made was still better than jumping on the internet and placing a bet, but not by much. The losing was rough, but in the long run, the failure to accurately predict hockey totals was a turning point in my gambling life.

I learned the hard way that you need more than averaging goals together to beat oddsmakers. The team at Pinnacle is using multiple strategies, algorithms and data points to set an opening line. However, I believe 100% that a person willing to do the hard work to become a sharper gambler can win on a consistent basis. The biggest problem that many of us face is where to start. The data needed is posted in multiple spots on the internet, but gathering the information can seem overwhelming.

I enjoy sports, numbers, spreadsheets and betting. I am not one to back away from challenges and vowed to redouble my efforts to be successful. I plugged “sports betting model” into the Google machine and, after multiple refinements, found the book Trading Bases by Joe Peta. I do not believe that anyone serious about building a sports betting model can get started without reading this book. I will not do a review of the book, but what’s important is that Joe was a Wall Street trader who developed a baseball model that did well over a season. Joe explains the development of three different tools (for the most part), which I adapted for my first hockey model.

I plugged in each team’s winning percentage, added 6% to the home team’s chances of winning (hockey home ice teams have roughly a 53-47% winning percentage), expected goals scored for/against, and the goaltender’s winning percentage. It was a step in the right direction, but after monitoring for a few weeks, no consistent pattern emerged, and my implied probability had huge swings compared to the line.

Thankfully, I was not betting on games during this time, but I continued to research hockey betting models. Moving on from basic statistics, I began looking at more advanced analytics, which led me to Corsi, Fenwick and xGoals. I went to a website that keeps track of more progressive hockey metrics, www.naturalstatrick.com, and educated myself on what each statistic meant to a team winning. I filtered on Corsi For/Against and saw that teams with a higher percentage in their favor typically had a winning record over the season. I took the next step in building my model by incorporating the stats I felt were important. This evolution in my process did get me much closer to the line, but there was still too much variance to blindly trust model results. The picture below is an example of where I was in my model, knowing there was still more to do.
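For readers new to these metrics, Corsi For percentage is a team’s share of all shot attempts (shots on goal, missed shots, and blocked shots) in its games. A minimal sketch in Python, using made-up counts rather than real team data:

```python
def corsi_for_pct(corsi_for, corsi_against):
    """Share of total shot attempts taken by the team, as a percentage."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * corsi_for / total


# Hypothetical season totals, for illustration only.
print(round(corsi_for_pct(3300, 3000), 1))  # 52.4
```

A value above 50 means the team attempts more shots than it allows, which is the filter described above.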

I continued to test my model and began to bet the model with decent results, but I knew I was missing more data needed to build a reliable product. Averaging Corsi, home ice, and goalie save percentage was good, but I knew there was more specific data being used by professionals. A beginning modeler must have the continued desire to research and build their own model. Over time, the work should be a better strategy than blindly tailing someone on Twitter (except the Whale and Andy, of course). The craving to be a successful model builder and sports bettor forced me to take one last look at my model; I knew it could be better.

Hockey_6_2.jpg

I knew that to build a model that would give me a chance, a deeper dive into more hockey analytics was necessary to calculate edges. I was able to find more in-depth data on goaltenders: GSAA (Goals Saved Above Average) and High Danger Save Percentage. Paying attention to how coaches rotate their goalies to start games is important in figuring out your implied probability. A wrong guess that Andrei Vasilevskiy is taking a day of rest and Louis Domingue is the starting netminder can go a long way in making your night miserable.

Discovering this useful information on goaltenders began my delving into more team-specific data than Corsi percentages or PDO. I continued to search for more data points to add to what was becoming an average model. Without going into the finer details of how my model works, I made the decision to incorporate multiple different shot types, goaltender ability, home ice advantage, and winning percentage, which led to a final score for each team. The picture below is a “heatmap” showing how teams are performing against the mean in different statistics. The data collection, even from just one website, was intense and led to one last roadblock: TIME.
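The article does not say how the heatmap values were computed. One common way to express “performance against the mean” is a z-score, so the sketch below is an assumption rather than the author’s method, and the team labels and numbers are placeholders.

```python
from statistics import mean, pstdev


def z_scores(values_by_team):
    """Map each team to its distance from the league mean, in standard deviations."""
    values = list(values_by_team.values())
    avg = mean(values)
    spread = pstdev(values)
    if spread == 0:
        # Every team has the same value, so nobody is above or below the mean.
        return {team: 0.0 for team in values_by_team}
    return {team: (value - avg) / spread for team, value in values_by_team.items()}


# Hypothetical Corsi For % values for three placeholder teams.
sample = {"Team A": 54.0, "Team B": 50.0, "Team C": 46.0}
for team, z in z_scores(sample).items():
    print(f"{team}: {z:+.2f}")
```

In a spreadsheet, the same numbers could drive conditional formatting so that strong and weak areas stand out at a glance.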

I spent an hour or more gathering my data every morning before breakfast. The time spent on data entry was taking away time from grinding an edge. I visited the websites I used for metrics, copied and pasted the numbers into spreadsheets, and generated my implied probabilities for each match. This left me with no time to find favorable Money Line bets or to improve upon what I had already built. I was really struggling when, on Episode 134 of the Deep Dive podcast, Andy saved my model.

He stated that data collection was set up to AUTO IMPORT from the internet. I stated earlier that I have basic Excel spreadsheet skills, but after listening to this podcast I had to figure out how to get the data refreshed from the web to my spreadsheet. It took about an hour and a few nervous mornings to get the “Refresh” tab on my MacBook working like I wanted, but the work was worth it. Now it takes me 30 seconds to refresh the data and another 30 seconds to format the new information and get my matches ready to go. This step has freed me up to develop my model more and find early morning lines to start betting against.

So, all of this has led me to where I am today: finalizing a hockey betting model to be prepared for the 2019-20 season. I am backtesting my current model from Day One of the season through the full NHL schedule to determine if the model can be a success or where I need to improve it. The results after the first three weeks of the season have been pleasing: blindly betting matches where the model shows a 2% edge has me up in both units and CLV (closing line value). Continued analysis of the results has shown me that my biggest edge is on Home Underdogs. The testing has also shown me I am either not putting enough value on Home Favorites or putting too much on Road dogs, which, if it continues, can help me weight different aspects of the model. This testing phase is the final step in model building. You have to take the time, and I mean time, to evaluate your model against the line. I built my model in season, so I have gone back to the start for my backtesting.
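The “edge” in a backtest like this is usually the gap between the model’s win probability and the probability implied by the price. The author does not spell out his exact calculation, so the sketch below is an assumption: it converts American money line odds to an implied probability (vig included) and flags a bet when the model’s number is at least 2 percentage points higher. The function names and sample prices are hypothetical.

```python
def implied_probability(american_odds):
    """Convert American money line odds to the break-even win probability."""
    if american_odds < 0:
        # Favorite: risk |odds| to win 100.
        return -american_odds / (-american_odds + 100)
    # Underdog: risk 100 to win the posted odds.
    return 100 / (american_odds + 100)


def edge(model_prob, american_odds):
    """Model win probability minus the probability implied by the line."""
    return model_prob - implied_probability(american_odds)


def should_bet(model_prob, american_odds, threshold=0.02):
    """Flag a bet when the edge meets the chosen threshold (assumed 2% here)."""
    return edge(model_prob, american_odds) >= threshold


# Hypothetical example: the model gives a +150 home underdog a 45% chance.
print(implied_probability(150))  # 0.4
print(should_bet(0.45, 150))     # True
```

Logging the edge for every game, not just the bets placed, makes it easier to see whether categories such as Home Underdogs are consistently over or under valued.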

I am now more certain that my model can work, but also that it can be improved. I continue to read and ask questions (ask Andy and the Whale) about different aspects of gambling. I record different aspects to investigate as I test to improve my betting strategy. An example: I recently bet on the Canadiens as they finished up a road trip in Anaheim. After I lost my unit, I read that Montreal has a terrible road record on the West Coast.

Hockey_9.jpg

That’s a problem, but with any problem, there is a solution. I downloaded the historical data on every game result in the NHL since the 2008 season. A quick pivot table would have shown me that betting on Montreal playing in California was a bad idea no matter the edge. The bonus: at the cost of the unit I lost and a couple of hours this weekend, I added to my historical database.
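The pivot table described above amounts to filtering past results by team and venue and counting wins and losses. A rough Python equivalent, assuming a game log with one row per game; the row layout and the four rows are invented for illustration and are not real results:

```python
# Made-up game log rows: (road team, home team, did the road team win?)
games = [
    ("MTL", "ANA", False),
    ("MTL", "LAK", False),
    ("MTL", "SJS", True),
    ("BOS", "ANA", True),
]


def road_record(games, team, home_teams):
    """Count a team's road wins and losses against a chosen set of home teams."""
    wins = losses = 0
    for road, home, road_won in games:
        if road == team and home in home_teams:
            if road_won:
                wins += 1
            else:
                losses += 1
    return wins, losses


california = {"ANA", "LAK", "SJS"}
print(road_record(games, "MTL", california))  # (1, 2)
```

With a real historical file, the same filter could be extended to back-to-backs or long road trips.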

One last word: model building and refining is never done. I am working on new sources to gain an edge, even while conducting a backtest of my current model. I know that teams playing back-to-backs or going on extended road trips are affected as the voyage continues. I am aware that I do not have a good way to evaluate an injury and how it will affect a team’s chances of winning on a particular night. I spent a few hours researching player data and stumbled on a site that tracks each skater’s Goals Above Replacement. I can start to factor this into the development of my implied edge.

Hopefully I have been able to help by describing my journey in model building.  In no way do I want to portray myself as a sharp or a model building expert.  However, I am doing things I never thought possible in the sports betting universe.  I am not just looking at an NHL schedule and picking the Maple Leafs to win every night.  I am working to be a prepared sports bettor because you have to work to beat the book.  Most importantly, I am enjoying the interaction with others on social media, learning more about a sport I enjoy, and already trying to figure out how to improve my edge for next season.