Some ruminations on “form”

In our latest Sinquefield Cup predictions, we listed co-leader Levon Aronian as having a 17% chance of winning the event. A fellow poster on a chess forum I frequent remarked that it was “bizarre” that his odds were so low, particularly in light of his “current tactical brilliancies”, and estimated that Aronian’s chances are probably closer to 30%.

It’s worth reiterating that our model does very little to account for a player’s “form”. We do use live ratings updated round by round, so Aronian is getting 11 rating points’ worth of credit for being in better form than his pre-tournament rating suggested. However, 11 points is not a tremendously large adjustment. We have Aronian at only 17% despite sharing the lead because he shares it with Carlsen, who is rated almost 83 points higher. Over the final four rounds, that gap gives Carlsen a significantly higher expected score, particularly since Carlsen also faces an easier remaining schedule: both still have to play Nakamura and Anand, but Aronian’s other two games are against the two strongest players (Carlsen and Topalov), while Carlsen’s other two are against Aronian and Grischuk.
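
As a rough illustration of how much an 83-point gap matters in a single game, here is the standard Elo expected-score formula. This is a sketch only: it assumes the plain logistic Elo curve, and our model's exact per-game treatment (draw handling in particular) is not spelled out here.

```python
def expected_score(rating_diff: float) -> float:
    """Standard Elo expected score for a player rated `rating_diff` points above the opponent."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


if __name__ == "__main__":
    # The roughly 83-point Carlsen-Aronian gap, head to head.
    print(round(expected_score(83), 3))  # 0.617
    # Aronian's 11-point live-rating bump, measured against an equally rated opponent.
    print(round(expected_score(11), 3))  # 0.516
```

An 11-point bump moves a single game's expectation by only about 1.6 percentage points, which is why it does little to offset a player rated 83 points higher.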

On top of this, add Topalov, just half a point back and also with a relatively easy remaining schedule, plus some equity for the longshots, and Aronian’s 17% odds make sense, provided we stick with the assumption that 2776 is an accurate assessment of his playing strength.

We are talking about someone who has been rated as high as 2830, though, and who spent a long time as the world #2 and the presumed greatest threat to Magnus’ reign as champion. What if he is “back in form” and we can validly give him a higher estimated playing strength? If we keep everything else in the model the same and bump Aronian’s rating up to 2820, his odds increase to about 27%. To get his odds to 30%, we would need to make his rating 2832.
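
The kind of calculation behind figures like 17%, 27% and 30% can be sketched as a Monte Carlo simulation: play out the remaining games many times and count how often each player finishes first. This is not our actual model. The ratings, standings and pairings below are hypothetical, the fixed 50% draw rate is an assumption, and ties for first are simply split rather than decided by tiebreaks or a playoff.

```python
import random
from collections import defaultdict


def elo_expected(rating_a, rating_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def game_probs(rating_a, rating_b, draw_rate):
    """Win/draw/loss probabilities for A whose mean equals the Elo expected score.

    ASSUMPTION: a fixed draw rate, capped so neither win nor loss probability goes negative.
    """
    e = elo_expected(rating_a, rating_b)
    d = min(draw_rate, 2 * e, 2 * (1 - e))
    return e - d / 2, d, 1 - e - d / 2


def simulate_win_odds(ratings, scores, pairings, draw_rate=0.5, n_sims=20000, seed=1):
    """Monte Carlo estimate of each player's chance of finishing first.

    ASSUMPTION: a tie for first is split equally among the tied players.
    """
    rng = random.Random(seed)
    wins = defaultdict(float)
    for _ in range(n_sims):
        final = dict(scores)  # start each simulated event from the current standings
        for a, b in pairings:
            p_win, p_draw, _ = game_probs(ratings[a], ratings[b], draw_rate)
            r = rng.random()
            if r < p_win:
                final[a] += 1.0
            elif r < p_win + p_draw:
                final[a] += 0.5
                final[b] += 0.5
            else:
                final[b] += 1.0
        top = max(final.values())
        leaders = [p for p, s in final.items() if s == top]
        for p in leaders:
            wins[p] += 1.0 / len(leaders)
    return {p: wins[p] / n_sims for p in ratings}


if __name__ == "__main__":
    # Hypothetical four-player standings with two rounds left (not the real Sinquefield Cup data).
    ratings = {"A": 2776, "B": 2859, "C": 2800, "D": 2750}
    scores = {"A": 3.5, "B": 3.5, "C": 3.0, "D": 2.0}
    pairings = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")]
    base = simulate_win_odds(ratings, scores, pairings)
    bumped = simulate_win_odds({**ratings, "A": 2820}, scores, pairings)
    print(f"A at 2776: {base['A']:.3f}, A at 2820: {bumped['A']:.3f}")
```

Both runs reuse the same seed, so the difference between them reflects only the rating change. Asking "what rating gives 30%?" then amounts to searching over that one input.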

So if you say “I think Aronian’s odds are 30%, not 17%”, you’re not necessarily disagreeing with the structure of the model; you’re just saying that you think Aronian is in good form during this event, and expect him to play at an effective strength of 2832 over the final four rounds. This isn’t particularly absurd; he certainly could do so.

Statistically, we are reluctant to give too much credence to the idea of large variations for “form”. Most such phenomena can be explained by random variance alone, and throughout sports a “hot streak” or someone being “clutch” or “in the zone” are generally just false narratives we throw around because they sound more interesting than “he got lucky”. In chess specifically, we apply this idea by assuming that a player’s live rating is the most accurate estimate we have available of his playing strength (it accounts for the most available data, after all). That said, not every player will always be accurately rated at any given time.
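
To see how easily chance alone produces an apparent hot streak, consider a hypothetical player facing nine opponents of exactly his own strength, with an assumed 50% draw rate (so 25% win and 25% loss per game). The sketch below computes the exact distribution of his total score and converts a score into a rough performance-rating swing using the simple "algorithm of 400" approximation, not FIDE's official table. The function names are illustrative.

```python
def score_distribution(n_games, p_win, p_draw):
    """Exact distribution of total score over n independent games.

    dist[k] is the probability of finishing with k half-points.
    """
    p_loss = 1.0 - p_win - p_draw
    dist = [1.0]
    for _ in range(n_games):
        new = [0.0] * (len(dist) + 2)
        for k, p in enumerate(dist):
            new[k] += p * p_loss      # loss: +0 half-points
            new[k + 1] += p * p_draw  # draw: +1 half-point
            new[k + 2] += p * p_win   # win:  +2 half-points
        dist = new
    return dist


def prob_score_at_least(n_games, p_win, p_draw, points):
    """Probability of scoring at least `points` (a multiple of 0.5)."""
    dist = score_distribution(n_games, p_win, p_draw)
    return sum(dist[round(points * 2):])


def perf_rating_shift(n_games, points):
    """'Algorithm of 400' approximation: performance minus average opponent rating.

    Uses W - L = 2 * (score - n/2).
    """
    return 400.0 * 2 * (points - n_games / 2) / n_games


if __name__ == "__main__":
    print(round(prob_score_at_least(9, 0.25, 0.5, 6.0), 3))  # 0.119
    print(round(perf_rating_shift(9, 6.0)))                  # 133
```

Under these assumptions, a score of 6/9 or better, roughly a 133-point overperformance by this approximation, happens about 12% of the time with no change in underlying ability at all, and an equally bad 3/9 or worse is just as likely.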

Since we’ve just established that unexpectedly good or bad results can happen through normal statistical variance, we also have to grant that when those results happen to a player whose rating was previously accurate, that rating will get thrown out of kilter. Aronian’s rating plummeted through some very unexpectedly bad recent events. The proper Bayesian response is to factor those results in as additional data, on top of the results that earned him the high rating in the first place, and re-evaluate him less favorably. This is what we’re doing when we use his current rating in our simulations. It’s also possible, though, that his older results, which brought his rating above 2800, were accurate reflections of his true long-term playing strength, and that the recent bad results were purely random variance rather than a sign of declining ability. He might still be a 2800- or even 2820-strength player, and the uptick in his rating from good results so far in St. Louis might be regression to the mean, as his too-low current rating corrects itself through further random chance.
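
The Bayesian point can be made concrete with a simple conjugate normal-normal update. This is a swapped-in illustration, not how Elo ratings are actually computed, and every number in it is hypothetical: a prior strength estimate of 2800 ± 30 and one bad event at a 2700 performance, treated as a noisy measurement with a standard deviation of 80 points.

```python
def normal_update(prior_mean, prior_sd, observed, obs_sd):
    """Conjugate normal-normal update: combine a prior estimate with one noisy observation.

    Returns the posterior mean and standard deviation. Each source is weighted by
    its precision (1 / variance), so noisier evidence moves the estimate less.
    """
    prior_prec = 1.0 / prior_sd ** 2
    obs_prec = 1.0 / obs_sd ** 2
    post_prec = prior_prec + obs_prec
    post_mean = (prior_mean * prior_prec + observed * obs_prec) / post_prec
    return post_mean, post_prec ** -0.5


if __name__ == "__main__":
    # Hypothetical numbers: prior 2800 +/- 30, one event performed at 2700 +/- 80.
    mean, sd = normal_update(2800, 30, 2700, 80)
    print(round(mean, 1), round(sd, 1))  # 2787.7 28.1
```

In this toy setup the bad event pulls the estimate down to about 2788 rather than all the way to 2700, because a single event is weak evidence compared with a long prior record.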

Of course Magnus Carlsen also entered this tournament coming off a bad result, and if Norway Chess was pure variance then perhaps he too is truly underrated. Maybe his real odds of winning are higher! And every other player may or may not be accurately rated as well! We can’t really be sure, so for the model’s sake we will continue to just use the live ratings and let the variance sort itself out over time.

