Analytics, machine learning predict World Cup scores

The 2018 FIFA World Cup kicks off tomorrow in Russia with the host nation taking on Saudi Arabia.

South Africa-based data scientists at Principa are at it again, this time using predictive analytics and machine learning to foretell the results of the 2018 FIFA World Cup.

The 2018 FIFA World Cup kicks off tomorrow in Russia with the host nation taking on Saudi Arabia in Group A. Principa has already predicted the results for all the first games in the first round of matches.

The company's data scientists use different algorithms to develop models that can predict the outcome of the matches.

Principa notes that as the objective of machine learning is to develop models that can retrain themselves to adapt when exposed to new data, the algorithms will be re-trained with the results of each match to improve the accuracy of the following round's generated prediction.

It points out that the purpose is to see whether different predictive analytics techniques, used successfully in other areas, can outperform the best human-made predictions. The exercise will take place on SuperBru.com.

In 2015, Principa predicted the results of the Rugby World Cup matches, out-predicting 99.68% of humans on the same sports predictor platform, SuperBru.com. The Principa data scientists also experienced success when predicting the outcomes of the 2016 Oscars.

"It will be interesting to see how accurate our models are in predicting the outcomes of the football matches," says Jaco Rossouw, CEO of Principa.

"We're cautiously optimistic after our previous success in predicting the Rugby World Cup, but this is a whole new ballgame. We've never used our skills as data scientists to predict the outcomes of a football game, and unlike with the Rugby World Cup, where we were predicting the point margins between the participating teams, this time we'll be predicting the exact final scores, a significantly more complex challenge. We're excited to see how the algorithms perform."

According to Rossouw, because there is no one analytical model that is 100% effective in predicting sports, Principa will run three models based on different techniques to test the effectiveness of each.

"For the first round, we'll be selecting the predictions with the highest historic accuracy based on our back-testing results and for each round, selecting the technique which had the best performance in the previous round."
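The round-by-round selection Rossouw describes can be sketched as follows. This is a minimal illustration, not Principa's code: the scoring rule (share of exact scores predicted correctly), the function names and the sample results are all assumptions made for the example.

```python
def exact_score_accuracy(predicted, actual):
    """Share of matches whose exact score was predicted correctly."""
    hits = sum(1 for match, score in actual.items()
               if predicted.get(match) == score)
    return hits / len(actual)


def best_technique(predictions_by_technique, actual):
    """Name of the technique with the highest accuracy in the last round."""
    return max(
        predictions_by_technique,
        key=lambda name: exact_score_accuracy(
            predictions_by_technique[name], actual),
    )


# Hypothetical previous-round data, invented purely for illustration.
actual_results = {"A vs B": (2, 0), "C vs D": (1, 1), "E vs F": (0, 1)}
previous_round = {
    "bayesian": {"A vs B": (2, 0), "C vs D": (1, 0), "E vs F": (0, 1)},
    "poisson": {"A vs B": (1, 0), "C vs D": (1, 1), "E vs F": (1, 1)},
    "multinomial": {"A vs B": (1, 0), "C vs D": (2, 1), "E vs F": (0, 2)},
}

# The winner's predictions would be the ones submitted for the next round.
next_round_technique = best_technique(previous_round, actual_results)
print(next_round_technique)
```

A real scoring rule would probably also give partial credit for calling the right winner, as prediction platforms typically do.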

Prediction techniques

The three techniques used are Bayesian inference, Poisson regression and multinomial logistic regression.

Francel Mitchell, head of decision analytics at Principa, explains that the Bayesian inference technique can be used to enhance predictions by combining what we already know (determined by looking at historic game results) with a recent sample of data to predict the likely outcome.

"In this way, recent performance and player statistics are used to enhance the predictions of models that are developed on historic data alone," she says.
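The article does not say which Bayesian model Principa uses. One common way to blend a historic prior with recent results is a Gamma-Poisson update of a team's goals-per-match rate, sketched below. The function name, the prior weight and the sample figures are assumptions for illustration only.

```python
def posterior_goal_rate(prior_mean, prior_weight, recent_goals):
    """Blend a historic scoring rate with recent results.

    The prior is Gamma(alpha, beta) with alpha = prior_mean * prior_weight
    and beta = prior_weight, so prior_weight acts like the number of
    historic matches the prior is "worth". Observing Poisson-distributed
    goal counts gives a Gamma posterior whose mean is returned.
    """
    alpha = prior_mean * prior_weight
    beta = prior_weight
    alpha_post = alpha + sum(recent_goals)   # add goals observed recently
    beta_post = beta + len(recent_goals)     # add matches observed recently
    return alpha_post / beta_post


# Hypothetical team: 1.5 goals per match historically (weighted as 10
# matches), then five recent matches with 3, 2, 3, 2 and 0 goals.
rate = posterior_goal_rate(1.5, 10, [3, 2, 3, 2, 0])
print(round(rate, 3))
```

The posterior rate lies between the historic average and the recent average, and moves further towards recent form as more recent matches are added.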

She adds that a multinomial logistic regression model is simply an extension of a binary logistic regression model, as it allows for more than two classes of the dependent variable.

"We will use a method of variable selection to choose which variables are significant in predicting the dependent variable, and those will be the independent variables for our model. The model will then give us the probabilities for each class (or number of goals scored).

"If we repeat this for the opponent team, we can logically arrive at the score of each team by choosing the class with the highest probability for each run of our multinomial logistic regression model."
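The prediction step Mitchell describes, running the model once per team and taking the most probable goal class, can be sketched as below. Only scoring is shown; fitting is omitted. The two features, the four goal classes (0, 1, 2, and 3 or more) and every coefficient are hypothetical values chosen for the example, not Principa's model.

```python
import math


def class_probabilities(features, weights, intercepts):
    """Softmax over one linear score per class (multinomial logit)."""
    scores = [b + sum(w * x for w, x in zip(ws, features))
              for ws, b in zip(weights, intercepts)]
    top = max(scores)                        # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_goals(features, weights, intercepts):
    """Index of the most probable class, i.e. the predicted goal count."""
    probs = class_probabilities(features, weights, intercepts)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical coefficients; classes are 0, 1, 2 and 3+ goals.
# Features: [own attack rating, opponent defence rating].
WEIGHTS = [[-1.0, 1.0], [0.0, 0.0], [0.8, -0.6], [1.2, -1.2]]
INTERCEPTS = [0.0, 0.5, 0.0, -1.0]

# One run per team, then combine the two predicted classes into a score.
home_goals = predicted_goals([1.5, -0.5], WEIGHTS, INTERCEPTS)
away_goals = predicted_goals([-0.5, 1.5], WEIGHTS, INTERCEPTS)
print(f"{home_goals}-{away_goals}")
```

Because each team is scored in a separate run, this approach treats the two goal counts as independent given the features.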

Mitchell notes the Poisson distribution is a probability distribution that can be used to model data that can be counted, like the number of goals scored in a football match.

"This means we have a method of assigning probabilities to the number of goals in a game and from this, we can find probabilities for different match results. To be able to find the probabilities for different numbers of goals, we would use the regression method, based on certain variables, such as the strength of the attack, ratings of the team, etc."
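To illustrate how goal probabilities turn into match-result probabilities, the sketch below takes an expected number of goals for each team, builds a grid of scoreline probabilities and sums it into win, draw and loss. It assumes the two teams' goal counts are independent, and the expected-goal values are made up; in Mitchell's description they would come from the regression on variables such as attack strength and team ratings.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_matrix(lam_a, lam_b, max_goals=10):
    """matrix[i][j] = P(team A scores i and team B scores j)."""
    return [[poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
             for j in range(max_goals + 1)]
            for i in range(max_goals + 1)]


def outcome_probabilities(matrix):
    """Sum scoreline probabilities into (A win, draw, B win)."""
    win = draw = loss = 0.0
    for i, row in enumerate(matrix):
        for j, p in enumerate(row):
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


def most_likely_score(matrix):
    """Scoreline (i, j) with the highest probability."""
    cells = ((i, j) for i in range(len(matrix))
             for j in range(len(matrix[i])))
    return max(cells, key=lambda ij: matrix[ij[0]][ij[1]])


# Hypothetical expected goals for two teams.
grid = score_matrix(1.8, 0.7)
print(most_likely_score(grid), outcome_probabilities(grid))
```

Scorelines above 10 goals per team are ignored here, so the three outcome probabilities sum to very slightly less than one.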

Principa's predictions:

Group stage

Russia vs Saudi Arabia: 2-0, Russia win
Egypt vs Uruguay: 1-2, Uruguay win
Morocco vs Iran: 1-0, Morocco win
Portugal vs Spain: 1-2, Spain win
France vs Australia: 2-0, France win
Argentina vs Iceland: 2-0, Argentina win
Peru vs Denmark: 1-2, Denmark win
Croatia vs Nigeria: 2-1, Croatia win
Costa Rica vs Serbia: 0-1, Serbia win
Germany vs Mexico: 2-1, Germany win
Brazil vs Switzerland: 2-0, Brazil win
Sweden vs South Korea: 2-1, Sweden win
Belgium vs Panama: 4-0, Belgium win
Tunisia vs England: 0-2, England win
Colombia vs Japan: 1-0, Colombia win
Poland vs Senegal: 1-0, Poland win

This round's predictions were made with Bayesian inference.
