
Poisson Regression Analysis

Our analysis uses generalized linear models (GLMs) to calculate match odds from existing football data sets. Specifically, we use Poisson regression to model the expected home and away goals for a given match. This approach is not new and has been discussed extensively in the statistical literature. Because it models goals directly, the system can price most of the common betting markets (Correct Score, Half Time / Full Time, Asian Handicap, etc.). Our approach improves on some of the early models by changing the regression equations and adding further data to the analysis, such as shots on and off target, crowd attendance and motivational factors.
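To illustrate how modelling goals directly leads to market prices, here is a minimal Python sketch. It assumes home and away goals are independent Poisson variables, which is a simplification and not necessarily what the production model does. The function names, the truncation at 10 goals and the example means of 1.5 and 1.1 are all illustrative assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def score_matrix(home_mean, away_mean, max_goals=10):
    """Correct-score probabilities, indexed as matrix[home_goals][away_goals].

    Assumes home and away goals are independent (an assumption of this sketch).
    """
    home = [poisson_pmf(k, home_mean) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, away_mean) for k in range(max_goals + 1)]
    return [[h * a for a in away] for h in home]


def match_odds(home_mean, away_mean, max_goals=10):
    """Home win, draw and away win probabilities from the score matrix."""
    m = score_matrix(home_mean, away_mean, max_goals)
    n = max_goals + 1
    home_win = sum(m[h][a] for h in range(n) for a in range(n) if h > a)
    draw = sum(m[k][k] for k in range(n))
    away_win = sum(m[h][a] for h in range(n) for a in range(n) if h < a)
    # Renormalise to absorb the small probability mass lost by truncation.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Illustrative expected goals, not fitted values.
home_p, draw_p, away_p = match_odds(1.5, 1.1)
```

Other markets follow the same pattern: each one is a sum over a different subset of cells in the score matrix.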

Basic Regression Equations

Below is a basic example of the equations we use; it does not include all the factors that the production model uses. In these equations, Xi,j and Yi,j are the number of goals scored by the home team (i) and away team (j) respectively, α is a parameter describing the ‘strength’ of the teams and γ is a parameter describing the home advantage of team i.

  

This model is augmented with total shot data. To achieve this, we simply extend the above equation with the assumption in the equation below, where κ is a scaling factor.

And therefore

where Ai,j and Bi,j are the total shots (on and off target) in a game by the home and away teams respectively. In this model, a match is treated as a combination of the goals actually scored and the goals that should have been scored, as inferred from the shot data.

The regression includes a weighting function (equation 2.2) that allows the model to place greater importance on more recent matches. We have included the same weighting function. In our model, t = (fd - md) is the difference, in days, between the date the match was played (md) and the date chosen as the “fit date” (fd). ξ is a constant representing the strength of the decay.
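As a rough sketch of how such a time weighting might be computed, the Python below assumes an exponential decay, exp(-ξt), which is a common choice for this kind of model. The text above only says that ξ controls the strength of the decay, so the exact functional form, the function name and the example value of ξ are assumptions.

```python
import math
from datetime import date


def time_weight(fit_date, match_date, xi):
    """Weight for a match played on match_date when fitting as of fit_date.

    Assumes an exponential decay exp(-xi * t), with t = fd - md in days.
    """
    t = (fit_date - match_date).days
    if t < 0:
        raise ValueError("match_date must not be after fit_date")
    return math.exp(-xi * t)


# Illustrative values only: xi = 0.005 per day is not a fitted constant.
recent = time_weight(date(2020, 6, 1), date(2020, 5, 25), xi=0.005)
older = time_weight(date(2020, 6, 1), date(2019, 6, 1), xi=0.005)
```

With this form, a match played on the fit date has weight 1, and the weight shrinks towards 0 as matches get older. A larger ξ discounts old matches more quickly.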

To allow us to vary the importance of the “goals” inferred from the shot data relative to the actual goals scored, we have included a further weighting constant that is applied to the relevant elements of the fit. This constant, τ, is fixed at 1 for elements representing goals and takes a lesser value for elements representing shots.
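One way the two weights could enter a fit is sketched below in Python. It assumes that τ and the time weight are multiplied together and applied to each element's Poisson log-likelihood term, and that the time weight is an exponential decay. Both are assumptions of this sketch, as are the function names and the tuple layout used for each element.

```python
import math


def poisson_log_pmf(count, mean):
    """Log-probability of an observed count under a Poisson with the given mean."""
    return count * math.log(mean) - mean - math.lgamma(count + 1)


def element_weight(kind, days_before_fit, xi, tau_shots):
    """Combined weight: tau (1 for goals, tau_shots for shots) times time decay."""
    tau = 1.0 if kind == "goals" else tau_shots
    return tau * math.exp(-xi * days_before_fit)


def weighted_log_likelihood(elements, xi, tau_shots):
    """Sum of weighted Poisson log-likelihood terms.

    Each element is a tuple (kind, observed_count, model_mean, days_before_fit),
    where kind is "goals" or "shots". This layout is hypothetical.
    """
    total = 0.0
    for kind, count, mean, days in elements:
        weight = element_weight(kind, days, xi, tau_shots)
        total += weight * poisson_log_pmf(count, mean)
    return total


# Illustrative data: one goals element and one shots element from the same match.
elements = [("goals", 2, 1.5, 30), ("shots", 12, 11.0, 30)]
value = weighted_log_likelihood(elements, xi=0.005, tau_shots=0.3)
```

Setting tau_shots to 0 in this sketch removes the shot elements from the fit entirely, while setting it to 1 treats them on the same footing as actual goals.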