The World Cup is a 128-player knockout tournament taking place in September 2015. It is a component of the World Championship Cycle: the top two finishers will earn spots in the eight-player 2016 Candidates Tournament, whose winner will go on to face Magnus Carlsen in the 2016 World Championship match. Each round consists of two classical games; in the case of a tie, two rapid games follow, then two faster rapid games, then two blitz games, and finally a single Armageddon game to determine which player advances to the next round. Additional factual details on the event, its structure, and the field can be found in the excellent Wikipedia article on the event; at this time, my player/seed/rating list comes from that page.
The complete field of 128 qualifiers became official on August 14th, so we can begin to project the results and estimate various players’ chances of reaching the finals or of winning the entire event. These projections are based on the players’ ratings on the August rating list, and assume that all 128 players will compete and be seeded in this order. Of course that will shift before the event actually begins: our final projections will use live ratings, the final seed order will be determined by the as-yet-unpublished September rating list, and in a field of this size it’s relatively likely that a few players will end up unable to make it to the event and be replaced by alternates. Nevertheless, enough information is now available that we can run it through our model and give you some early predictions! For details on the methodology, scroll to the bottom, below all the listed odds.
We have reached the finals, and all our previously published odds on who would reach this last round are obsolete. The finalists are Sergey Karjakin and Peter Svidler. Based on his 23-point edge in the live ratings, we estimate that Karjakin should win the four-game classical match (a new format for the final round), and with it the tournament, 59.9% of the time.
These odds are entirely probabilistic. Unlike other tournaments, where too many possible results exist and we are forced to resort to Monte Carlo simulations to estimate the odds, the single-elimination structure of this event allows us to calculate odds directly, as precise as our underlying assumptions allow. We developed a table that looks at the rating differential between the two players and estimates each player’s odds of winning the mini-match and advancing to the next round. In later rounds, for any given player, we look at their odds of reaching that round in the first place, all the possible opponents they could face if they get there, the relative odds of facing each of those opponents, and the odds that they would defeat each of those opponents in a potential matchup. Ultimately, this means that if our underlying estimate of match odds were completely accurate (of course it isn’t, and can’t be), all the extrapolated percentages would be perfectly accurate probabilities as well. We don’t have the “not enough simulations” error source that most of our other predictions have to avoid.
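The round-by-round calculation described above can be sketched directly. In this minimal sketch, `win_prob` is a placeholder logistic curve on rating differential standing in for the actual mini-match odds table used in the article; the bracket walk itself is the exact, simulation-free computation the paragraph describes.

```python
def win_prob(elo_a, elo_b):
    # Placeholder: logistic mapping from rating difference to mini-match
    # win probability. The article's real odds table would replace this.
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def round_odds(players):
    """players: list of (name, rating) in bracket/seed order, length a
    power of two. Returns each player's probability of winning the
    whole bracket, computed exactly rather than by simulation."""
    n = len(players)
    reach = [1.0] * n                    # P(player i has survived so far)
    groups = [[i] for i in range(n)]     # bracket slots, merged round by round
    while len(groups) > 1:
        merged = []
        for g in range(0, len(groups), 2):
            left, right = groups[g], groups[g + 1]
            updated = {}
            for i in left:
                # chance of meeting each possible opponent, times the
                # chance of beating that opponent in the mini-match
                updated[i] = reach[i] * sum(
                    reach[j] * win_prob(players[i][1], players[j][1])
                    for j in right)
            for j in right:
                updated[j] = reach[j] * sum(
                    reach[i] * win_prob(players[j][1], players[i][1])
                    for i in left)
            reach = [updated.get(k, reach[k]) for k in range(n)]
            merged.append(left + right)
        groups = merged
    return reach
```

With four equally rated players, each wins the bracket exactly 25% of the time, and the probabilities always sum to one; that is the “perfectly accurate extrapolation” property the paragraph refers to.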
What match odds do we assume? And why aren’t they “completely accurate”? Our odds are based exclusively on players’ standard ratings. Even though FIDE publishes rapid and blitz ratings, and those time controls will come into play frequently throughout the tournament, we ignore them. This is by design: although some players have played many rated games at the faster time controls, many others have not. Across the entire field, we suspect that standard ratings (which almost always have a strong sample size for active players, and correlate well with “chess ability” at any time control) probably have more predictive value than the often higher-variance rapid and blitz ratings. Rapid ratings are particularly suspect. Rated blitz tournaments (both on their own and as preliminary events before classical tournaments) have become popular, so many blitz ratings do rest on a good sample size; most rated games at this time, however, are either blitz or classical, and rated rapid games remain few and far between. Since two pairs of rapid games are played before any blitz games in the World Cup tiebreak procedure, blitz ratings may be accurate but have minimal impact on predictions, while rapid ratings matter more for predictions but are less trustworthy due to the small samples.
That being said, choosing to ignore the rapid and blitz ratings does mean we’re intentionally discarding some meaningful data. We can say with confidence, for example, that Fabiano Caruana is a weaker blitz player than his standard rating would suggest. In a perfect world, we would perform a Bayesian analysis and estimate the rapid and blitz playing strength of every player in the field as a weighted average of their rapid/blitz ratings and their standard rating, with the weights determined by sample size (players with many rated blitz games under their belt would essentially just use their blitz rating, while players with few or none would have most or all of the weight applied to their standard rating). We don’t live in a perfect world, though, and building a model of that structure that actually had merit would be a major, time-consuming project. So instead we’re using standard ratings only, and basing everything on rating differential. It’s probably close enough.
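The weighted-average idea described above can be sketched as a simple shrinkage estimate. Everything here is an illustrative assumption, not the article’s method: the prior strength `k` (how many games a fast-time-control rating needs before it dominates the standard rating) is an invented tuning parameter.

```python
def blended_rating(standard, fast, fast_games, k=50):
    """Toy shrinkage estimate of a player's fast-time-control strength:
    weight the rapid/blitz rating by its sample size, falling back to
    the standard rating when few rated fast games exist. k (assumed
    here, not from the article) is the prior strength in games."""
    if fast is None or fast_games == 0:
        return standard
    w = fast_games / (fast_games + k)   # more games -> more weight on fast rating
    return w * fast + (1 - w) * standard
```

For example, with `k=50`, a player rated 2800 standard and 2700 blitz over 200 rated blitz games blends to 2720 (weight 0.8 on the blitz rating), while a player with no rated blitz games keeps their standard 2800.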
Other sources of error in the analysis include our estimation of draw rates. We assume the draw rate in any given game is always the same, provided the time control and rating differential are the same. While we do assume reduced draw rates as the rating differential increases, we don’t adjust for specific players’ tendencies or for match conditions. We also assume (for computational simplicity) that every game is fully independent. In reality, the odds in game two almost certainly vary based on the result of game one: if game one was decisive, so that the winner can “play for a draw” to advance while the loser must “play for a win” to stay alive, then the odds of the various results are very likely different than they would be if both players “just played normally”. Exactly how the odds shift (are draws more likely? less likely? will the player “pressing for a win” lose more often than normal?) is complex, and would again require a detailed study to model accurately. Instead, we’re choosing to go with the simplistic assumption and accepting that our probabilities aren’t “perfect”. Here is a graph of the odds we’re using:
And here they are in a data table:
| ELO Difference | Odds of Advancing |
| --- | --- |
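Under the independence assumption discussed above, per-game probabilities combine into two-game mini-match advance odds in closed form. This sketch is not the article’s actual odds table: the flat draw rate and the logistic rating curve are illustrative assumptions, and `tiebreak_p` stands in for the whole rapid/blitz/Armageddon cascade.

```python
def game_probs(elo_diff, draw_rate):
    """Per-game (win, draw, loss) odds for the higher-rated player,
    splitting the non-draw probability so the expected score matches
    the standard Elo expectation. Numbers are illustrative."""
    expected = 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))
    p_win = expected - 0.5 * draw_rate          # E = p_win + 0.5 * p_draw
    p_win = min(max(p_win, 0.0), 1.0 - draw_rate)
    return p_win, draw_rate, 1.0 - draw_rate - p_win

def advance_prob(p_win, p_draw, p_loss, tiebreak_p=0.5):
    """Odds of advancing from a two-game mini-match, treating the games
    as independent (the simplifying assumption discussed above). A tied
    match is won with probability tiebreak_p, a stand-in for the
    rapid/blitz/Armageddon cascade."""
    outright = p_win ** 2 + 2 * p_win * p_draw      # WW, WD, DW
    tied = p_draw ** 2 + 2 * p_win * p_loss         # DD, WL, LW
    return outright + tied * tiebreak_p
```

As a sanity check, two equally rated players with any draw rate and a 50/50 tiebreak each advance exactly half the time, and the higher-rated player always advances more often than that.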