For the past couple of days I've been working on a model that gives the odds of each driver winning the IndyCar championship. In this article I'll do my best to explain how it works and what it means for the title race this season.
The basis of the model is the probability of each driver finishing in each position. Using data from the current season, I found the odds that Power would finish in first, second, third, etc. I did that for all drivers with a chance of winning the championship with two races remaining. I chose to only use data from this season as a way of 1) accounting for momentum and 2) not diluting the model with data from a prior season when 14 races have already been run this year, which is a fairly good sample size for IndyCar. In the future I plan to include past seasons' data on a weighted scale, valuing the current season more than previous ones while still including them in the model.
As you might imagine, not every driver finished in every position this season (or even over the past couple of seasons). But just because Power hasn't finished in 13th yet this season, that doesn't mean he should have a zero percent chance of finishing there in the upcoming races. The way I deal with this is by dividing race finishing places into different groups. Finishing probabilities for first, second, and third are all calculated by themselves right now, meaning a driver's probability of coming in second place in a race truly is his second place finishing probability. Positions four and five are grouped together, as are positions 6-10 and 11-22. Within a group, the probabilities are averaged: if a driver has a 10 percent chance of finishing fourth in a race and a five percent chance of finishing fifth, the model will say he has a 7.5 percent chance of finishing in each of these positions instead. The same process is done for the other groups. Since the positions within each group pay similar points, the solution is workable.
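The grouping step can be sketched in a few lines of Python. This is only an illustration of the averaging described above, not the model's actual code: the function name grouped_probabilities and the sample finishes are made up for the example, and results outside positions 1-22 are assumed not to occur.

```python
from collections import Counter

# Position groups described above: 1, 2 and 3 on their own, then 4-5, 6-10, 11-22.
GROUPS = [(1, 1), (2, 2), (3, 3), (4, 5), (6, 10), (11, 22)]

def grouped_probabilities(finishes):
    """Return {position: probability} for positions 1-22, averaged within groups.

    `finishes` is a list of one driver's finishing positions this season.
    Assumes every finish falls between 1 and 22.
    """
    counts = Counter(finishes)
    n_races = len(finishes)
    probs = {}
    for low, high in GROUPS:
        group_size = high - low + 1
        # Share of races in which the driver finished somewhere in this group.
        group_share = sum(counts[p] for p in range(low, high + 1)) / n_races
        # Spread that share evenly over every position in the group.
        for p in range(low, high + 1):
            probs[p] = group_share / group_size
    return probs

# Hypothetical finishes for one driver over 14 races (not real results).
example_finishes = [1, 2, 4, 4, 5, 1, 8, 3, 10, 2, 19, 6, 4, 1]
probs = grouped_probabilities(example_finishes)
```

In this made-up season the driver never finished 13th, but because he finished 19th once, every position from 11th to 22nd gets a small, equal, nonzero probability.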
Another reason the grouping system is important is that finishing in the top three positions requires more skill than finishing in other positions. The difference between fourth and fifth comes down to a good amount of luck rather than skill. That factor of luck is compounded even more in the 11-22 position range. There are so many factors that go into finishing a race that discerning any skill difference between finishing 13th and 17th is almost impossible. Average drivers can finish in the top ten, but it's very hard for average drivers to finish in the top three.
Now that the model knows the probability of the drivers finishing in each position, it can simulate the remaining races on the calendar 10,000 times and see how the championship plays out. The points from each simulated race are added to the points drivers already have. There's also an adjustment for the bonus points awarded for getting pole position, leading a lap, and leading the most laps. After that it's simple math. The model counts up the number of times each driver wins the championship, divides it by the number of simulations run, and spits out a probability of winning the championship.
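Here is a minimal sketch of that simulation loop, under several assumptions that are mine rather than the model's: the points table is illustrative, bonus points are left out, each driver's finish is drawn independently (so two drivers can land on the same position in a simulated race), and ties go to whichever driver is listed first. The function name and the two placeholder drivers are hypothetical.

```python
import random

# Illustrative points per finishing position (assumed values for this sketch).
POINTS = {1: 50, 2: 40, 3: 35, 4: 32, 5: 30, 6: 28, 7: 26, 8: 24, 9: 22, 10: 20}
POINTS.update({p: 30 - p for p in range(11, 23)})  # 11th -> 19 ... 22nd -> 8

def simulate_title_odds(standings, finish_probs, races_left, n_sims=10_000, seed=0):
    """Estimate each driver's championship probability by simulation.

    standings:    {driver: current points}
    finish_probs: {driver: {position: probability}}
    """
    rng = random.Random(seed)
    drivers = list(standings)
    positions = {d: list(finish_probs[d]) for d in drivers}
    weights = {d: list(finish_probs[d].values()) for d in drivers}
    titles = {d: 0 for d in drivers}
    for _ in range(n_sims):
        totals = dict(standings)  # start each simulation from the real standings
        for _ in range(races_left):
            for d in drivers:
                # Simplification: drivers are drawn independently of each other.
                pos = rng.choices(positions[d], weights=weights[d])[0]
                totals[d] += POINTS[pos]
        champion = max(drivers, key=totals.get)  # ties go to the first driver listed
        titles[champion] += 1
    # Share of simulations in which each driver finished on top.
    return {d: titles[d] / n_sims for d in drivers}

# Placeholder drivers and numbers, purely for illustration.
uniform = {p: 1 / 22 for p in range(1, 23)}
odds = simulate_title_odds(
    {"Driver A": 600, "Driver B": 560},
    {"Driver A": uniform, "Driver B": uniform},
    races_left=2,
)
```

Fixing the seed makes a run repeatable, which helps when comparing the odds before and after a race.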
After each race is completed, the model is updated with the actual points standings and the latest finishing place probabilities. It then runs a new set of simulations to produce updated championship probabilities.
There are a few things I plan on adding to the model in the future. First, as I mentioned before, I will be adding in past seasons' data, weighted according to how recently each race took place. This will help fine-tune the model with regard to the finishing group percentages. I can't see ever doing away with the finishing groups completely, because by the time you had enough races for an accurate sample size at each position on the track (and so no longer needed the finishing groups), the oldest results would be so out of date that they wouldn't tell you anything predictive of the future.
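I haven't settled on a weighting scheme yet, but one possible approach is an exponential decay, where a result counts for half as much every so many races. The sketch below assumes that scheme; the function name and the half_life parameter are hypothetical, and the output would still be passed through the finishing groups afterward.

```python
def recency_weighted_probs(history, half_life=14.0):
    """Return {position: probability} with recent races weighted more heavily.

    history:   list of (races_ago, position) pairs; races_ago is 0 for the latest race
    half_life: assumed number of races after which a result counts half as much
    """
    weighted_counts = {}
    total_weight = 0.0
    for races_ago, position in history:
        weight = 0.5 ** (races_ago / half_life)
        weighted_counts[position] = weighted_counts.get(position, 0.0) + weight
        total_weight += weight
    # Normalize so the probabilities sum to one.
    return {pos: w / total_weight for pos, w in sorted(weighted_counts.items())}

# Hypothetical history: a win in the latest race, a second place 14 races ago.
weighted = recency_weighted_probs([(0, 1), (14, 2)])
```

With a half-life of 14 races, the older second place counts half as much as the latest win, so the win ends up with two thirds of the probability.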
I also plan to add a track adjustment in the future to account for drivers who are better on certain tracks. If not a new adjustment for each track, there will definitely be a road/street/oval adjustment.
The current championship probabilities can be found here. I'll do article updates before and after races tracking how the championship hopes have changed for different drivers.