I was invited to join Squiggle, an online AFL ladder where mathematical models are compared on three metrics: correct tips (accuracy of win/loss prediction), margin mean absolute error (MAE, accuracy of margin prediction), and bits (accuracy of the predicted probability of winning). Essentially, we want to maximise correct tips, minimise margin MAE, and maximise bits. I’m super excited to join the Squiggle competition, and can hopefully avoid coming last in everything ;). The mathematics of bits can be found here, but essentially they measure how accurate your model is at predicting the probability of a team winning. For example, if Richmond and Carlton were to play each other 100 times, you are trying to predict what percentage of those games Richmond would win. Tipping doesn’t account for this: it only measures whether you picked the winning team, not the “confidence” behind the tip.
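As a concrete illustration, here is a minimal sketch of the bits scoring rule as it is commonly described for Squiggle-style competitions (a correct tip earns 1 + log2(p), an incorrect one 1 + log2(1 − p), and a draw 1 + 0.5·log2(p·(1 − p))); the function name and interface are my own:

```python
import math

def bits(p_home: float, home_won: bool, draw: bool = False) -> float:
    """Bits scored for one game, given the predicted probability
    p_home that the home team wins."""
    if draw:
        return 1 + 0.5 * math.log2(p_home * (1 - p_home))
    if home_won:
        return 1 + math.log2(p_home)      # correct side: up to +1 bit
    return 1 + math.log2(1 - p_home)      # wrong side: penalised heavily

bits(0.5, True)   # 0.0 — a 50/50 tip always scores zero
bits(0.9, True)   # ≈ 0.848 — confident and correct
bits(0.9, False)  # ≈ -2.32 — confident and wrong
```

Note the asymmetry: a confident correct tip gains at most 1 bit, but a confident wrong tip can lose far more, which is why well-calibrated probabilities matter more than bold ones.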
I’ve recently discovered a few things: my data contained duplicate games, and it didn’t contain every game from every season. After correcting this, my model predictions improved quite significantly. I also changed my feature scaling: instead of scaling over the entire sample, I now scale each player’s stats within a game and then take the average for each team, in an attempt to better capture which of the two teams performed better in that particular game.
Creating and Comparing Models
We left off with a linear regression model to predict margin, and a logistic regression model to predict the probability of winning. I decided to compare the predictions of both models, as well as a simple support vector machine for classification, using a radial basis function (Gaussian) kernel.
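The three models being compared can be sketched in scikit-learn as follows (a sketch only — the actual feature set and hyperparameter values are not specified here):

```python
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC

models = {
    # predicts the margin directly; win probability derived afterwards
    "linear": LinearRegression(),
    # predicts P(win) directly
    "logistic": LogisticRegression(),
    # RBF (Gaussian) kernel SVM; probability=True enables the
    # Platt-scaled probability estimates needed to score bits
    "svc_rbf": SVC(kernel="rbf", probability=True),
}
```

The `probability=True` flag matters: without it, `SVC` only produces hard class labels, which is enough for tips but gives nothing to score bits against.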
Our entire data set contains the previous 3000 games played (approximately 2004 to 2019). I split it into two subsets: a test set (the most recent 20% of games, i.e. all games from roughly the beginning of 2017 to the present) and a training set containing the remaining games (80% of the original set). I then used a shuffle-split method to optimise hyperparameters for each model, where for each hyperparameter value the accuracy was measured by averaging over 100 shuffles. This produced consistent hyperparameter values for each model. Note: technically, while optimising hyperparameters, I split the training set into two disjoint sets on each iteration: one to fit the model (75% of the training set) and one to cross-validate (25% of the training set).
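The procedure above can be sketched with scikit-learn’s `ShuffleSplit`; the synthetic data, feature count, and candidate `C` values below are placeholders standing in for the real game data and hyperparameter grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

# Hypothetical stand-in for the real data (rows ordered oldest -> newest)
X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

# Chronological split: the most recent 20% of games form the test set
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# For each candidate hyperparameter value, average accuracy over
# 100 random 75/25 fit/validation splits of the training set
cv = ShuffleSplit(n_splits=100, train_size=0.75, test_size=0.25,
                  random_state=0)
scores = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    accs = []
    for fit_idx, val_idx in cv.split(X_train):
        model = LogisticRegression(C=C, max_iter=1000)
        model.fit(X_train[fit_idx], y_train[fit_idx])
        accs.append(model.score(X_train[val_idx], y_train[val_idx]))
    scores[C] = np.mean(accs)

best_C = max(scores, key=scores.get)
```

Averaging over many shuffles is what makes the chosen hyperparameter values stable from run to run, since any single random split can favour one value by chance.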
From the linear model, the probability of winning was calculated by assuming the residuals are normally distributed, setting the mean of that distribution equal to the predicted margin, and then using the corresponding CDF to compute the probability that the margin exceeds zero.
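In code, this amounts to one normal-CDF evaluation per game. A minimal sketch using only the standard library (the residual standard deviation below is a hypothetical placeholder, not a value from the post):

```python
import math

# Hypothetical residual standard deviation, estimated from training residuals
SIGMA = 36.0

def win_probability(predicted_margin: float, sigma: float = SIGMA) -> float:
    """P(actual margin > 0), assuming the actual margin is distributed
    N(predicted_margin, sigma^2). Uses Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    z = predicted_margin / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

win_probability(0.0)   # 0.5 — a predicted margin of zero is a coin flip
```

By symmetry, `win_probability(m)` and `win_probability(-m)` always sum to 1, so the home and away probabilities are automatically consistent.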
See Table 1 below for comparisons between models. Note: The 2019/18 seasons lie in the test set, and are disjoint from the training and CV sets.
We see that the logistic model is slightly better than the linear model for both tips and bits. The SVC model appears to be only slightly better at tips (66.7%) on the test set than the linear and logistic models (66.2% and 66.5%, respectively), but significantly worse when it comes to bits.
| Model | Bits Test/Train | MAE Test/Train (Margin) | Tips Test/Train (%) | Bits 2019/18 | MAE 2019/18 (Margin) | Tips 2019/18 (%) |
| --- | --- | --- | --- | --- | --- | --- |
Interestingly, the margin MAE on the test set is slightly smaller than on the training set. I wouldn’t expect this, but I’ve looked over the code thoroughly and I’m confident the training and test sets are disjoint. It could be that MAE isn’t a good measure of the model’s accuracy, or an artefact of the test set size, or simply (as pointed out by MoS) that margins in some years are more predictable overall than in others. Here’s an updated figure of the 2019 AFL Season predictions.
From these comparisons, I think it’s optimal to use the linear model for margin prediction and the logistic model for win probability prediction, in order to maximise bits. I will update my predictions using these two models now. The SVM’s performance is also interesting, and I’ll look into improving it alongside exploring other models.
Until next time. Have a peaceful day 🙂