AFL GO (version 1)

Introduction

Hey there! I’m Charles and I’m currently completing my Honours specialising in Astrophysics & Astronomy at the Australian National University, where I completed my BSc majoring in Mathematics and Theoretical Physics. Over the summer of 2019/20 I’ve been studying Machine Learning in my spare time, and I thought it would be nice for one of my first projects to be modelling the outcomes of AFL games. For the past few years I’ve loosely followed the AFL modelling league Squiggle, and I’ve been consistently impressed by how clever everyone on there is. The name of my model(s) is AFL GO (AFL Gadget-type Operator), taking inspiration from the big BT.

Anyhow, let me explain my goals and what I have done thus far.

The goal of this project is to begin “small” and work up. I plan to begin by using a linear regression model to predict the margin (the difference between the home team’s and away team’s scores), then logistic regression to predict outcomes (win/lose), and eventually neural networks. At the moment I have only created a linear regression model to predict the margin, which I also use to produce a somewhat rough estimate of the probability of winning/losing.

Linear Regression

In order to create an “accurate” linear regression model for margin prediction, one requires “enough” historical data points that include all the features/inputs, i.e. all the measurements we will use to predict a margin (team rating, recent performance, etc.). We denote a single input sample as x^{(i)} = (x_0^{(i)}, x_1^{(i)}, \ldots, x_n^{(i)}) (where x_0 is the bias variable), with corresponding margin y^{(i)} = (Home Team Points) - (Away Team Points).

Historical Data

All of the historical data used in creating this model originated from AFL Tables and was obtained using fitzRoy (this person is a legend). It contains the statistics of every player for every game played from the year 2000 to the present (\approx 3000 games). There’s nothing particularly special about this time period; it was simply chosen because I wanted to maximise the amount of data I had, and because some of the features I chose aren’t available for years prior to 2000.

Features

Choosing the features/inputs to use is somewhat hand-wavy, and it would be ideal to have an AFL professional explain to me what they believe are the important features in determining the margin of a game. Since I don’t have that privilege, I’ll make some hand-wavy guesses at important features:

  • Home ground advantage.
  • Long-term rating of a team.
  • Short-term win/loss performance.
  • Short-term team performance in particular statistics (attacking, defending, etc).
  • Distance traveled between games.
  • Number of days between games (recovery).
  • Bye–round (breaking momentum?).
  • Time spent playing together (chemistry).
  • Short-term individual performance.

The n=9 (relatively simple) features x=(x_0, x_1, \ldots, x_n) that I have chosen to begin with are the following:

  • (1) Difference between a rolling Elo rating over the past 20 games (long-term rating).
  • (2) Difference between the number of wins over the past 5 games.
  • (3-8) Difference between the average team statistic over the past 5 games in the six distinct statistics (scoring, contested, uncontested, defense, possession, attack).
  • (9) Home ground advantage (absorbed in the bias parameter \theta_0).

where the “difference” is the home team statistic minus the away team statistic.

Features 3 through 8 are the differences in the average team-based statistics over each team’s previous 5 games. They are functions of the following:

  • Scoring = Goals, Behinds, Goal Assists
  • Contested = Hit outs, Tackles, Contested Possessions, Contested Marks
  • Uncontested = Uncontested Possessions
  • Defense = Rebounds, One Percenters
  • Possession = Handballs, Marks, Kicks
  • Attack = Inside 50s, Clearances, Marks inside 50

Some of these are scaled appropriately according to the Supercoach Rating formula used for players.
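To make the feature construction concrete, here is a minimal Python sketch (not my actual pipeline, which pulls the data via fitzRoy) of how features 2–8 could be built. It assumes the per-team, per-game statistics sit in a pandas DataFrame `team_games` sorted by date, with a `team` column, a `win` column (1 / 0.5 / 0), and one column per grouped statistic; the column names are purely illustrative.

```python
import pandas as pd

# Illustrative column names for the six grouped statistics described above.
STAT_GROUPS = ["scoring", "contested", "uncontested", "defense", "possession", "attack"]

def rolling_form(team_games: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Each team's form over its previous `window` games (current game excluded)."""
    gb = team_games.groupby("team")
    form = team_games[["team"]].copy()
    # Feature 2: number of wins over the past 5 games.
    form["wins_last5"] = gb["win"].transform(lambda s: s.shift(1).rolling(window).sum())
    # Features 3-8: average of each grouped statistic over the past 5 games.
    for col in STAT_GROUPS:
        form["avg_" + col] = gb[col].transform(lambda s: s.shift(1).rolling(window).mean())
    return form

# The match-level feature is then the home team's form minus the away team's form,
# with the home-ground advantage absorbed into the bias term x_0 = 1.
```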

Elo Rating

An Elo rating of a team is simply a measurement of how good that team is based on the outcomes of their previous games (win, draw, or loss) and the Elo rating of the corresponding opponent.

For example, say Team A and Team B have Elo ratings E_A and E_B respectively. Before the game is played, the predicted outcome for team A (denoted by P_A, also known as the probability of team A winning) is calculated using the logistic function, i.e.

P_A = (1 + 10^{(\frac{-(E_A - E_B)}{400})})^{-1} \in (0,1)

and the probability of Team B winning is given by P_B = 1 - P_A. So, if the rating of Team A (E_A) is much larger than the rating of Team B (E_B), then the probability of Team A winning (P_A) approaches 1. Similarly, if the rating of Team A is much smaller than the rating of Team B, then the probability of Team A winning approaches 0.

In this example, the two teams play one another, and by the end of the game some result occurs for Team A and Team B (a win = 1, draw = 0.5, or loss = 0), denoted by R_A and R_B = 1 - R_A respectively. The ratings of Team A and Team B are then updated as

E_{A, new} = E_A + k(R_A - P_A)
E_{B, new} = E_B + k(R_B - P_B)

where k is a parameter, which will be optimised in the future, but for now is set to k=20.

Initially, all teams have the same rating of 1500, and for each game played between two teams, their respective Elo ratings are adjusted. As more games are played, the Elo ratings approach their “true” values. If we’re trying to use Elo as an accurate measure of a team’s long-term rating, then with too few games the ratings will be inaccurate because a team hasn’t yet played enough opponents, and with too many games they will be inaccurate because team lineups and performance change over several seasons. For this reason, for any given game in our historical data, the Elo ratings of both teams are calculated over their previous 20 games (varying this parameter will be investigated in the future).
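For readers who prefer code, here is a small sketch of the Elo update described above, with k = 20, the standard 400-point scale, and teams starting at 1500 (handled by the caller).

```python
def expected_score(elo_a: float, elo_b: float) -> float:
    """P_A: probability of Team A winning, from the logistic formula."""
    return 1.0 / (1.0 + 10.0 ** (-(elo_a - elo_b) / 400.0))

def elo_update(elo_a: float, elo_b: float, result_a: float, k: float = 20.0):
    """Return (new E_A, new E_B). result_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    p_a = expected_score(elo_a, elo_b)
    new_a = elo_a + k * (result_a - p_a)
    new_b = elo_b + k * ((1.0 - result_a) - (1.0 - p_a))
    return new_a, new_b

# Example: a 1550-rated team beats a 1500-rated team.
print(elo_update(1550, 1500, result_a=1.0))  # roughly (1558.6, 1491.4)
```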

Cost Function and Optimisation

The cost function J(\theta, \lambda) chosen for optimising our parameters is given by

J(\theta, \lambda) = \frac{1}{2m} \sum_{i = 1}^{m} (h_{\theta}(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_{j}^2

where m is the number of sample points, x^{(i)} \in \mathbb{R}^{n+1} is the i-th sample input, y^{(i)} \in \mathbb{R} is the i-th sample output, \lambda \in \mathbb{R} is the regularisation parameter, and \theta \in \mathbb{R}^{n+1} is the parameter vector we want to optimise. The linear hypothesis h_{\theta}(x^{(i)}) is given by

h_{\theta}(x^{(i)}) = \theta_{0}x_0^{(i)} + \theta_{1}x_1^{(i)} + \ldots + \theta_{n}x_n^{(i)} = \theta^{T}x^{(i)}

Now, we have approximately 3000 games worth of data. I’m going to use 2500 in total (so that I can still calculate an appropriate Elo rating for each game). We will randomly assign all 2500 games to three sets: a training set (containing 1500 games), a cross-validation (CV) set (containing 500 games), and a test set (containing 500 games), such that any two sets are disjoint.

The training set’s purpose is to optimise the parameter vector \theta for a given regularisation parameter \lambda (whose purpose is to prevent overfitting). The CV set is used to measure which combination of (\theta, \lambda) produces the smallest error. The test set is used to measure how accurate our model (with optimal parameters) is on data it has never seen before. Keep in mind that since the data in each set is randomly chosen, re-randomising the sets means the model will settle on slightly different parameters, but the change is basically negligible (I might discuss this in a future post).
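For concreteness, one way the random, disjoint 1500 / 500 / 500 split could be done in Python is the sketch below (the seed is arbitrary, purely for reproducibility); the resulting index arrays would then be used to slice the feature matrix and margins.

```python
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(2500)  # shuffle the 2500 game indices
train_idx, cv_idx, test_idx = idx[:1500], idx[1500:2000], idx[2000:]
```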

So, given m data samples (x^{(i)}, y^{(i)})_{i=1}^{m} (i.e. each game has its features stored in x^{(i)} and its margin in y^{(i)}), we would like to choose the parameter vector \theta that minimises the cost function J(\theta, \lambda). Essentially, a classic calculus minimisation problem. With so few parameters it could be solved analytically (via the normal equation) almost instantly on a computer, but we solve it numerically using gradient descent (I won’t explain it, but it’s described here).
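As an illustration, here is a minimal NumPy sketch of the regularised cost J(\theta, \lambda) and batch gradient descent above. X is an (m, n+1) matrix whose first column is the bias x_0 = 1 and y is the vector of margins; the learning rate and iteration count are illustrative choices, not values from my actual model.

```python
import numpy as np

def cost(theta, X, y, lam):
    m = len(y)
    err = X @ theta - y
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # the bias theta_0 is not regularised
    return (err @ err) / (2 * m) + reg

def gradient_descent(X, y, lam, alpha=0.01, iters=5000):
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(iters):
        err = X @ theta - y
        grad = (X.T @ err) / m
        grad[1:] += (lam / m) * theta[1:]  # regularisation gradient, excluding theta_0
        theta -= alpha * grad
    return theta
```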

We plot the learning curves for the training, cross-validation, and test sets in the figure below. Without going into too much detail, the fact that these learning curves converge implies that our current model is not over-fitting.

[Figure: learning curves for the training, cross-validation, and test sets]

I’m actually not sure whether it matters that these learning curves cross (I believe it’s simply by chance).

From the calculated parameters, I tested the model by tipping the home team whenever the predicted margin is greater than zero (and the away team otherwise); it achieves 71.3% accuracy on the training set (biased), 68.4% on the CV set, and 68.8% on the test set.

Margin Prediction and Probability of Winning

We now have the optimised parameters \theta and can use this model to predict a game’s margin simply by measuring the 9 features x^{(j)} mentioned earlier and plugging them into our linear hypothesis h_{\theta}(x^{(j)}) = \theta^{T}x^{(j)}.

In order to calculate the probability of winning, I can think of two possible methods: 1) simply use the Elo ratings we calculate for both teams and the logistic function to produce a prediction probability, or 2) assume our data is normally distributed about our model, i.e. has mean \mu = h_{\theta}(x) and a standard deviation \sigma that can be calculated from our cost function (it turns out to be \sigma \approx 39 points), and calculate Pr(M > 0) (where M is the margin). I’m honestly not sure which one I should choose in the meantime, so the probabilities for Round 1 on my Twitter come from the Elo rating prediction.
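A small sketch of method 2, assuming the true margin M is normally distributed about the predicted margin with \sigma \approx 39 points, so the home team’s win probability is Pr(M > 0) from the normal CDF:

```python
from math import erf, sqrt

def win_probability(predicted_margin: float, sigma: float = 39.0) -> float:
    """P(M > 0) for M ~ Normal(predicted_margin, sigma^2), via the normal CDF."""
    return 0.5 * (1.0 + erf(predicted_margin / (sigma * sqrt(2.0))))

print(win_probability(12.0))  # a predicted 12-point win gives roughly a 62% win probability
```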

In the future, I plan on using logistic regression with similar features in order to create a separate model to predict probability of winning.

Conclusion

Well, that’s about it. There’s so much I don’t know about statistics and Machine Learning, but I guess that’s why I’m doing this project. If you have any features that you think would be influential for my model please let me know! Very soon I should have my ladder prediction out (hopefully it looks somewhat beautiful) along with my weekly predictions. Thank you for reading and I hope you have a peaceful day 🙂