Navara wins Czech Championship with 8.5/9 points

by ChessBase
5/11/2010 – After eight rounds the top Czech GM David Navara still had a clean 100% score and a performance rating in the 3200s. This did not seem realistic, so it was recalculated using a proposal by computer scientist Ken Thompson. In the final round David generously drew a game and thus obtained a proper Elo performance: 2963. But the debate continues: how should we evaluate 100% or 0% results?


Ostravský Koník 2010, the Czech men's and women's chess championship, was held from May 1st to 9th, 2010 in Ostrava, the third largest city in the Czech Republic. Top Czech GM David Navara started with a sensational 8.0/8, and in the final round drew with black against GM Tomáš Polák to finish on 8.5/9 with a 2963 performance. Here are the top standings at the end of the tournament:

 #   Name              Title  Rating  Points  TB1   TB2   Perf.
 1   Navara David      GM     2718    8.5     41.0  52.0  2963
 2   Polák Tomáš       GM     2525    6.5     40.5  52.0  2634
 3   Šimáček Pavel     IM     2518    6.5     38.0  50.5  2608
 4   Hráček Zbyněk     GM     2632    6.5     37.0  49.5  2624
 5   Bernášek Jan      IM     2495    6.5     37.0  47.0  2579
 6   Rašík Vítězslav   IM     2480    6.0     37.5  49.0  2570
 7   Krejčí Jan        IM     2455    6.0     37.5  48.0  2558
 8   Štoček Jiří       GM     2593    6.0     36.5  46.0  2494
 9   Votava Jan        GM     2587    6.0     35.5  45.5  2505
     Petr Martin       IM     2511    6.0     35.5  45.5  2504
11   Hába Petr         GM     2513    6.0     34.0  44.5  2495
12   Zpěvák Pavel      IM     2401    6.0     33.0  44.5  ?
13   Ponižil Cyril     FM     2392    6.0     32.5  42.0  ?


The winner and Czech Champion: GM David Navara with 8.5/9

A note on 100% scores and rating performance

In our previous report David Navara had scored 7.0/7, which was evaluated by most rating calculations – including the one built into ChessBase 10 – as a 3241 performance. This is obtained by taking the average Elo of the seven opponents (2441) and adding 800 points to it (= 3241). International Arbiter and retired professor of mathematics Albert Frank noted that if Navara's score had been 6.5/7 (92.86%) his performance, according to the original Elo table (which provides no values for 0% or 100%), would have been 2441 + 415 = 2856. "We see that a difference of 0.5 points translates into a difference in performance of 3241 – 2856 = 385 Elo points, which is enormous."
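For readers who want to experiment, here is a minimal sketch of this conversion in Python. It assumes the classical Elo normal model (a per-game standard deviation of 200 points, so rating differences scale by 200·√2 ≈ 283) plus the conventional ±800 cap for perfect and zero scores; the function name is ours.

```python
from statistics import NormalDist

SIGMA_DIFF = 200 * 2 ** 0.5   # ~282.8: two independent players, sigma 200 each

def performance(avg_opp: float, score: float, games: int) -> float:
    """Performance rating from average opponent rating and raw score.

    A 100% (or 0%) score falls outside the Elo table, so it is
    conventionally capped at +/-800 points -- which is exactly how
    the 2441 + 800 = 3241 figure above arises.
    """
    p = score / games
    if p >= 1.0:
        return avg_opp + 800
    if p <= 0.0:
        return avg_opp - 800
    return avg_opp + NormalDist().inv_cdf(p) * SIGMA_DIFF

print(round(performance(2441, 7.0, 7)))   # 3241
print(round(performance(2441, 6.5, 7)))   # 2855 (the printed Elo table says 2856)
```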

This seems impractical and unrealistic, and we discussed what to do about rating performance in the case of 100% (or 0%) scores. The well-known computer scientist Ken Thompson advised us to throw in a draw by such a player against himself. "It gives more realistic numbers and rewards lesser rated players less than higher rated players," Ken wrote. It is indeed a logical algorithm, and since one would expect an even score when playing oneself, it seems well founded in theory.

Albert Frank confirmed that Thompson's idea was "excellent and could be used everywhere." We have decided to implement the proposal in the next version of ChessBase.

Using Thompson's system on the round seven result we calculated as follows:

  • Navara's rating: 2718
  • His opponents: 2303, 2401, 2479, 2489, 2419, 2518, 2480, 2718 (himself)
  • The average rating of his opponents: 2475.875
  • His score: 7.5 out of 8 games (including the one against himself)
  • Performance: 2946.
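The arithmetic can be reproduced in a few lines of Python. One assumption worth flagging: the 2946 figure matches the logistic expectancy formula dp = −400·log10(1/p − 1), so that is what the sketch uses; Thompson's own, more precise tables come from the normal curve.

```python
from math import log10

def logistic_dp(p: float) -> float:
    """Rating-difference equivalent of a percentage score p (0 < p < 1)."""
    return -400 * log10(1 / p - 1)

def thompson_performance(own_rating: float, opponents: list, score: float) -> float:
    """Performance after throwing in a draw against oneself.

    The extra 'game' keeps a 100% (or 0%) result strictly inside (0, 1),
    so the percentage-to-rating conversion is always defined.
    """
    field = opponents + [own_rating]   # add oneself to the field
    p = (score + 0.5) / len(field)     # the self-game is scored as a draw
    avg = sum(field) / len(field)
    return avg + logistic_dp(p)

opponents = [2303, 2401, 2479, 2489, 2419, 2518, 2480]
print(round(thompson_performance(2718, opponents, 7.0)))   # 2946
```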

This is more realistic than the 3241 rating estimate obtained by the current system after seven rounds. Albert Frank did his own simulation, using the Thompson method and the original Elo tables:

This yields a rating performance of about 2476 + 435 = 2911. The difference between the two results arises because Thompson does not use the old tables but has calculated much more precise ones by integrating the normal curve. He used these in the generalised rating calculator he developed for the PCA and Intel back in the 90s. This rating calculator is already built into Fritz (and allows you to rate historical games very nicely).
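The gap between the two figures can be made visible by comparing the expectancy curves directly – a sketch, again assuming the normal model scales rating differences by 200·√2:

```python
from math import log10, sqrt
from statistics import NormalDist

def dp_normal(p: float) -> float:
    # Elo's original model: performances normally distributed, sigma 200 each
    return NormalDist().inv_cdf(p) * 200 * sqrt(2)

def dp_logistic(p: float) -> float:
    return -400 * log10(1 / p - 1)

p = 7.5 / 8                               # Navara's self-draw-adjusted percentage
print(round(2475.875 + dp_normal(p)))     # ~2910; Frank's table value was 2911
print(round(2475.875 + dp_logistic(p)))   # 2946
```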

Other opinions

Angelos Yannopoulos, Athens, Greece
In order to make this new algorithm fair, I think it would be better to include a draw with oneself in ALL players' evaluations, not just as an ad-hoc solution for players with 100% wins. This would make new ratings non-comparable to old ones, but the new ratings would be fairer compared to each other, and also in the presence of 100% win performances.

Anonymous, Germany
Ken is a genius – name this process after him! But first do new calculations of performance ratings for a lot of historic games. Also, this "game against myself" should always be added, and not only with 0% or 100% scores.

Ray Cornish, Derby, UK
The boundary of the expected result is most important. If you have scored 7 out of 7, then you have obviously scored above 6.5; but everyone above a certain rating level would have an expected performance that exceeds 6.75. The same applies to trying to rate a score of 2 out of 2: you have exceeded a score of 1.5, but there will be a (calculable) rating level that would predict a score at or above 1.75. To illustrate: an expected score of at least 1.75/2 corresponds to 87.5%, so the expected minimum performance is +325 Elo over the average of the opposition. For 6.75/7 the score is 96.43%, with a minimum performance of +510 over the average of the opposition. As you play more games and keep winning, the percentage for your [max. score − 0.25]/[max. score] gets slightly higher each time, and the rating needed to achieve such a high percentage becomes more demanding.
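[Editor's note: Cornish's cut-off is easy to compute. Here is a sketch under the normal-model assumption above, which reproduces his +325 and +510 figures; the function name is ours.]

```python
from statistics import NormalDist

def min_perfect_performance(avg_opp: float, games: int) -> float:
    """Lowest rating that already 'expects' [max. score - 0.25]/[max. score].

    A perfect N/N only shows the score exceeded N - 0.5; the cut-off treats
    it as at least (N - 0.25)/N and converts that percentage back into a
    rating edge over the field (normal model, per-game sigma of 200).
    """
    p = (games - 0.25) / games
    return avg_opp + NormalDist().inv_cdf(p) * 200 * 2 ** 0.5

print(round(min_perfect_performance(0, 2)))   # +325 over the opposition average
print(round(min_perfect_performance(0, 7)))   # +510 over the opposition average
```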

Julio Gonzalez-Diaz, Santiago de Compostela, Spain
I wanted to tell you briefly about a related issue I have worked on with some colleagues for quite some time now. First, some quick preliminaries about myself: I am a mathematician and I work as a researcher in game theory. I am also a chess fan (my FIDE rating is 2288).

My colleagues and I have developed a "refined" version of the usual performance that, from our point of view, is at least as suitable for measuring players' performance as the usual one. We have called it "recursive performance", and it can be used both to evaluate how players have performed in a tournament and as a tie-breaker. In fact, it has already been used as the tie-breaker in a good number of international opens here in Spain (the strongest of them, the San Sebastián Open, with over 30 titled players).

In a nutshell, the motivating idea is the following. Suppose that you have a tournament with a 2800+ player, call him A, who performs awfully during the tournament, with an effective performance of 2000. In this case, to compute the performances of A's opponents, he still counts as a 2800+ player. We think that this should not be the case, and that A's real performance in the tournament should be used instead to compute his opponents' performances. The recursive performance is defined by building upon this idea. You can find a brief explanation of the recursive performance on my web site.
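[Editor's note: as we read it, the idea lends itself to a simple fixed-point iteration. The sketch below is our own hypothetical rendering, not Gonzalez-Diaz's actual algorithm: it clamps extreme percentages and pins the field average so that the iteration has a well-defined place to settle.]

```python
from math import log10

def dp(p: float) -> float:
    """Rating edge for percentage p, clamped so 0% and 100% stay finite."""
    p = min(max(p, 0.01), 0.99)
    return -400 * log10(1 / p - 1)

def recursive_performance(scores, opponents, ratings, passes=100):
    """Hypothetical sketch of a 'recursive performance' fixed point.

    Each pass re-rates every player against the current performance
    estimates of their opponents instead of their nominal ratings; the
    field average is pinned to the average entry rating.
    """
    target = sum(ratings) / len(ratings)
    perf = list(ratings)
    for _ in range(passes):
        new = [sum(perf[j] for j in opps) / len(opps) + dp(scores[i] / len(opps))
               for i, opps in enumerate(opponents)]
        shift = target - sum(new) / len(new)
        perf = [x + shift for x in new]
    return perf

# Toy event: the 2800 collapses, so beating him becomes worth less.
ratings   = [2800, 2400, 2400]
opponents = [[1, 2], [0, 2], [0, 1]]
scores    = [0.0, 1.5, 1.5]
print([round(x) for x in recursive_performance(scores, opponents, ratings)])
# -> [2094, 2753, 2753] under these assumptions
```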

When I read in the article on Navara's performance that you are planning to include a modification in the computation of the performance, I thought that you might be interested in the recursive performance as well. We have developed a program to evaluate it and we would be more than happy to collaborate with you if needed.

Anonymous, USA
For a perfect performance, why not simply give as a performance rating the lowest rating for which a perfect performance would be expected, given the opponents' ratings?

Giuliano Ippoliti
I definitely don't like the new algorithm. It is surely better than the old one, but what if the player's rating is unknown? Performance should not depend on your own rating. I prefer simply saying that the performance with a score of N/N is greater than the performance calculated for a score of (N−0.5)/N.

Frank McFadden, Annadale, USA ("another math/stat guy")
Ways to deal with 100% results: 1) Report estimated probabilities, based on the usual formulas; 2) Use an improved Bayesian formula based on a model that is meaningful when results are 100%. Note: even with (2), reporting via (1) is more scientific than performance ratings; however, performance ratings have psychological appeal to chess players; therefore, I would recommend reporting PRs when results are not 0/100, but probabilities when results are 0/100.
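[Editor's note: option (1) is easy to illustrate for Navara's 7.0/7. A sketch that treats each Elo expectancy as a win probability – a rough upper bound, since the expectancy also includes draws:]

```python
from math import prod

def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectancy of player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

opponents = [2303, 2401, 2479, 2489, 2419, 2518, 2480]
p = prod(expected_score(2718, r) for r in opponents)
print(f"{p:.1%}")   # 25.5% -- a clean sweep was far from a foregone conclusion
```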


Links

The most important games were broadcast live on the official web site and on the chess server Playchess.com. If you are not a member you can download the free PGN reader ChessBase Light, which gives you immediate access. You can also use the program to read, replay and analyse PGN games. New and enhanced: CB Light 2009!

