How did Quintessa’s Sports Rating Algorithm Perform at Euro 2024?

14 Aug 2024

After an exciting Euro 2024, it’s time to assess how Quintessa’s N-Estimates sports rating algorithm has performed.

Last month, Quintessa’s "N-Estimates" algorithm was busy predicting the results of matches at UEFA Euro 2024. Now that the dust has settled, it’s time to analyse how the algorithm performed over four weeks of competition. Previously, each prediction was accompanied by a plot displaying the probabilities of all the possible scorelines, with the most likely outcome marked by a cross. These plots have been updated with a green ring marking the actual result. The final plots for all the matches can be found at the end of this news story.

When assessing the performance of the algorithm during the competition, we have considered three performance metrics:

the percentage of correct outcome (win/draw/loss) predictions;
the percentage of correct goal difference predictions; and
the percentage of correct exact scoreline predictions.
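As a rough illustration of how these three metrics can be computed, here is a minimal Python sketch. The function names and the data layout (a scoreline as a pair of goal counts) are assumptions made for this example and are not taken from the N-Estimates implementation. The first two matches use the predictions and results shown in the plots at the end of this story; the last two are made up.

```python
from typing import Dict, List, Tuple

Score = Tuple[int, int]  # (goals for first-named team, goals for second-named team)


def outcome(score: Score) -> int:
    """Return 1 for a win by the first-named team, 0 for a draw, -1 for a loss."""
    first, second = score
    return (first > second) - (first < second)


def performance_metrics(predicted: List[Score], actual: List[Score]) -> Dict[str, float]:
    """Percentage of matches with the correct outcome, goal difference and scoreline."""
    pairs = list(zip(predicted, actual))
    n = len(pairs)
    correct_outcome = sum(outcome(p) == outcome(a) for p, a in pairs)
    correct_gd = sum(p[0] - p[1] == a[0] - a[1] for p, a in pairs)
    correct_score = sum(p == a for p, a in pairs)
    return {
        "outcome": 100 * correct_outcome / n,
        "goal_difference": 100 * correct_gd / n,
        "scoreline": 100 * correct_score / n,
    }


# Germany vs. Scotland and Hungary vs. Switzerland, then two hypothetical matches.
predicted = [(2, 1), (1, 1), (0, 1), (1, 0)]
actual = [(5, 1), (1, 3), (0, 1), (2, 1)]
print(performance_metrics(predicted, actual))
```

Note that each metric is stricter than the one before it: a correct scoreline implies a correct goal difference, which in turn implies a correct outcome.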

Figure 1 compares the algorithm’s performance over the whole of Euro 2024 against a benchmark in each of these metrics. Throughout this analysis, the predictions are compared with the scores at full time (including extra time if applicable; matches that were decided by a penalty shoot-out are counted as draws). The benchmarks used are calculated based on random selection from the distribution of results that occurred at Euro 2024; this approach ensures the highest benchmarks that could be obtained for this tournament from random guessing.
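One possible reading of this benchmark is sketched below. It assumes that a random guesser draws each prediction independently from the empirical distribution of results seen at the tournament, so that the chance of a correct guess is the sum of the squared frequencies of each result. The article does not spell out the exact calculation, so this is an assumption, and the input data in the example are invented purely for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def random_guess_benchmark(results: Sequence[Hashable]) -> float:
    """Expected percentage of correct guesses when each guess is drawn at random
    from the same empirical distribution as the observed results (assumed method)."""
    counts = Counter(results)
    n = len(results)
    # A guess of category i (probability p_i) is correct with probability p_i.
    return 100 * sum((c / n) ** 2 for c in counts.values())


# Hypothetical outcomes for four matches: W = win, D = draw, L = loss.
print(random_guess_benchmark(["W", "W", "D", "L"]))  # 37.5
```

The same function could be applied to goal differences or exact scorelines by passing those values instead of outcomes.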

Metrics of algorithm performance across all the games played at Euro 2024. The data is displayed as a horizontal bar chart, with Percentage of Matches on the x axis, and Correct Outcome, Correct GD, and Correct Scoreline on the y axis. Across each category, percentage of matches is recorded for a Benchmark prediction, Quintessa’s N-Estimates, and Chris Sutton’s predictions.
For correct outcome, the Benchmark prediction performed the most poorly with 33% of matches correct. N-Estimates and Chris Sutton were close to each other: N-Estimates at 51% and Chris Sutton at 53%.
Again, for the Correct GD, the benchmark prediction was notably least accurate at 21% of matches correct, while N-Estimates and Chris Sutton both made better predictions at 33% and 29% respectively.
The plot shows that predictions for Correct Scoreline were generally worse than other categories. Benchmark predicted 11%, N-estimates predicted 16%, and Chris Sutton predicted 18% correctly. — **Figure 1:** Metrics of algorithm performance across all the games played at Euro 2024.

The algorithm performed significantly better than chance. The correct outcome was predicted in 51% of matches, approximately 18 percentage points higher than would be expected by chance. The correct goal difference was predicted in 33% of matches (compared with the 21% benchmark) and the exact scoreline was correctly predicted in 16% of matches (compared with the 11% benchmark).

Following the Qatar World Cup in 2022, we compared the algorithm’s predictions with the expert judgement of BBC pundit Chris Sutton. Chris and the algorithm tied for the number of correct outcomes and scorelines predicted, with Chris beating the algorithm by one correct goal difference prediction. At Euro 2024, the algorithm again gave a similar performance to Chris’ predictions. This time, Chris just edged the outcome and exact scoreline categories, with 27 correct outcomes (against the algorithm’s 26) and 9 correct scorelines (against 8 for the algorithm). However, N-Estimates won the goal difference category by two matches (17 compared with Chris’ 15). Perhaps we’ll call this one an honourable draw!
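The match counts above can be converted into the percentages quoted earlier with a short calculation. The sketch below assumes a total of 51 matches, which is the number played at Euro 2024 (36 group matches plus 15 knockout matches) but is not stated explicitly in this story; the dictionary layout is illustrative only.

```python
TOTAL_MATCHES = 51  # assumption: 36 group-stage + 15 knockout matches at Euro 2024

# Correct-prediction counts quoted in the text above.
correct_counts = {
    "N-Estimates": {"outcome": 26, "goal_difference": 17, "scoreline": 8},
    "Chris Sutton": {"outcome": 27, "goal_difference": 15, "scoreline": 9},
}


def to_percentage(count: int, total: int = TOTAL_MATCHES) -> int:
    """Convert a count of correct predictions into a whole-number percentage."""
    return round(100 * count / total)


for predictor, counts in correct_counts.items():
    summary = ", ".join(f"{metric}: {to_percentage(c)}%" for metric, c in counts.items())
    print(f"{predictor}: {summary}")
```

With these inputs the algorithm comes out at 51%, 33% and 16%, and Chris at 53%, 29% and 18%, for outcome, goal difference and scoreline respectively.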


There is a large degree of inherent variability in the outcomes of football matches. For each match, our predictions included calculated probabilities for various scorelines. We can use these to calculate the percentile within the distribution of predicted results that contains the actual match result, for a given match. Plotting the cumulative distribution of these percentiles allows an assessment of how well (or otherwise) the variability of the predictions matches that of the real-life results. Such a plot is shown in Figure 2.
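To make the percentile idea concrete, here is a minimal sketch of one way it could be calculated. Two things are assumed and are not taken from the article: the scoreline probabilities are generated from independent Poisson goal counts (the actual N-Estimates model is not described here), and the percentile of a result is defined as the total probability of all scorelines that were predicted to be at least as likely as the one observed. All function names and scoring rates are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Score = Tuple[int, int]


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals for a Poisson-distributed goal count."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def scoreline_grid(rate_a: float, rate_b: float, max_goals: int = 10) -> Dict[Score, float]:
    """Probability of every scoreline up to max_goals each (independence assumed)."""
    return {
        (a, b): poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
        for a in range(max_goals + 1)
        for b in range(max_goals + 1)
    }


def result_percentile(grid: Dict[Score, float], actual: Score) -> float:
    """Percentage of predicted probability on scorelines at least as likely as the actual one."""
    p_actual = grid.get(actual, 0.0)
    total = sum(grid.values())
    mass = sum(p for p in grid.values() if p >= p_actual)
    return 100 * mass / total


def cumulative_distribution(
    percentiles: Sequence[float], thresholds: Iterable[float] = range(0, 101, 10)
) -> List[Tuple[float, float]]:
    """Percentage of matches whose percentile is at or below each threshold."""
    n = len(percentiles)
    return [(t, 100 * sum(p <= t for p in percentiles) / n) for t in thresholds]


# Hypothetical scoring rates for a single match that finished 5 - 1.
grid = scoreline_grid(rate_a=1.8, rate_b=0.9)
print(result_percentile(grid, (5, 1)))
```

Under this definition, a low percentile means the observed result was among the most likely predicted scorelines, and a percentile near 100 means it sat in the tail of the predicted distribution. Collecting one percentile per match and passing the list to `cumulative_distribution` gives the kind of curve plotted against the diagonal.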

If the distribution of predictions exactly matched the distribution of observed match results, the cumulative distribution would follow the black dashed line in the figure. The blue line shows the cumulative distribution of all matches at Euro 2024. It is close to the black line, but lies slightly below it, indicating that the algorithm marginally overestimated the variability in the results. In particular, the algorithm predicted the exact scoreline more often than expected, and there were comparatively few matches in which the observed result had been given a low probability, which is exactly what we would want!

Cumulative distribution plot, with Percentile on the x axis, and Percentage of Predictions Below Percentile on the y axis. Both axes increase in increments of 10 from 0 to 100. Idealised expectation is a black dotted line on y=x. Predicted percentiles is a blue line, and always lies below idealised expectation. — **Figure 2:** Cumulative distribution of predicted percentiles (blue line) and idealised expectation (black line).

June 14 2024, Germany vs. Scotland. Central normal time prediction: 2 - 1. Confidence range for goal difference (Germany minus Scotland): 0 to 5. Actual normal time result: 5 - 1. — **Figure 3:** Predictions and actual scores for every match in the tournament. For each plot, the circles represent possible final scores, with the number of goals scored by each team plotted on the axes. Each circle has been colour coded to indicate the probability of that result occurring, with the most likely outcome marked with a black cross, and the actual outcome marked with a green ring. The dashed black line indicates a goal difference of zero.

June 15 2024, Hungary vs. Switzerland. Central normal time prediction: 1 - 1. Confidence range for goal difference (Hungary minus Switzerland): -2 to 4. Actual normal time result: 1 - 3.

Quintessa is not affiliated in any way with FIFA, UEFA or the BBC. Its application of the N-Estimates algorithm to UEFA Euro 2024 is an independent and non-commercial endeavour.