A Gross Miscarriage of Justice in Computer Chess (part three)
By Dr. Søren Riis
Evaluation: a tale of two programs
The core of the ICGA’s case against Rajlich is that Rybka and Fruit
have very similar positional evaluations. (An evaluation is a chess program’s
mathematical assessment of which side is winning in a given position.) Dr. Hyatt
is so confident Rajlich has been completely and redundantly busted in this area
that he announced in the Rybka forum that no further analysis is needed:
I suppose one could have gone into every bit and piece of Rybka to see
what was original and what was copied, but after the evaluation study, there
seemed to be little justification for the effort required...
The problem is that the ICGA’s findings on the evaluation similarities between
Fruit and Rybka tend to fall into one of three categories:
- The evidence presented is untrue.
- The evidence is true but the conclusions are false and/or tendentious.
- The evidence is true and the conclusions are true, but put into proper context,
the conclusions are irrelevant or immaterial.
I’ll go over some of the ICGA’s specific charges in due course, but first it
is important to itemize the significant ways Rybka’s evaluation differs from
Fruit’s.
The first big evaluation difference between the programs is that they are grounded
on different valuation conventions. Fruit's evaluation is based on a programming
tradition going back decades which stipulates that a pawn is worth 1.00. From
its initial release Rybka’s evaluation was based not on piece values but projected
winning percentages. Per Rajlich, this mathematically subtle difference plays
a significant role in testing.
Ed Schröder reveals five additional major differences versus Fruit in his investigations:
- “Lazy” evaluation is not in Fruit but is present in Rybka.
- The programs have entirely different futility pruning approaches.
- Fruit has only one evaluation array related to King Safety. Rybka has many.
- The Fruit "quad" function calculates a value based on the rank
of a passed pawn in an unusual way, using up valuable processor time when
this can be done in just one instruction via a PST or rank-based table,
as is done in Rybka and many other programs.
- Fruit evaluates in two steps, while Rybka directly adds up an evaluation
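
The contrast between computing a rank-based passed pawn value on every call and reading it from a table can be sketched in a few lines of C++. This is a minimal illustration, not code from either engine: the bonus range, the quadratic weighting and the names bonus_computed, make_rank_table and bonus_table are all assumptions invented for the example.

```cpp
#include <array>
#include <cassert>

// Hypothetical bonus range in centipawns; not taken from Fruit or Rybka.
const int kMinBonus = 10;
const int kMaxBonus = 70;

// Approach A: recompute the bonus on every call, scaling between the minimum
// and maximum with a weight that grows quadratically with the rank (0..7).
int bonus_computed(int rank) {
    assert(rank >= 0 && rank < 8);
    return kMinBonus + (kMaxBonus - kMinBonus) * rank * rank / 49;
}

// Fill an eight-entry table once, using the same formula.
std::array<int, 8> make_rank_table() {
    std::array<int, 8> table{};
    for (int rank = 0; rank < 8; ++rank) {
        table[rank] = bonus_computed(rank);
    }
    return table;
}

// Approach B: after the one-off initialisation, each query is an indexed load.
int bonus_table(int rank) {
    static const std::array<int, 8> table = make_rank_table();
    assert(rank >= 0 && rank < 8);
    return table[rank];
}
```

Both functions return the same number for every rank. The only difference is when the arithmetic happens: on every evaluation call, or once when the table is built.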
That makes six differences altogether: differences that are actually substantial
and impact playing strength, unlike some of the tangential issues discussed
in the ICGA report. But next Rajlich points out four more big differences
in this illuminating passage:
Fabien was a big search guy who basically didn't care about eval, but
he nevertheless really hit on a major point with his eval. His eval was based
completely on mobility. This has two nice properties. The first is that it
interacts well with the search – having the right to make more moves
tends to coincide with the chances that more searching will improve the score.
The second is that it's symmetrical and continuous, because all pieces are
basically handled in a similar way. Every piece has the right to move to certain
squares. This symmetry is elegant, eliminates discontinuities, makes the eval
smooth, helps with tuning, etc.
Re. Rybka 1.0 Beta vs Fruit 2.1 eval – I don't know the exact differences.
There must be a lot of little things. Generally I would say that Rybka 1.0
Beta had the following big eval innovations:
- Material imbalances – Rybka was the first engine to understand
that major pieces are more valuable in endgames. See for example the game
Ikarus-Rybka from Paderborn 2005 – every engine except for Rybka
thought that white was much better in that NN vs RP endgame, while any
decent player would know that black is perfectly fine.
- Passed pawns – passers are the other major exception to mobility
based evaluation. Rybka 1.0 Beta had quite a few heuristics for scoring
passers, I am quite sure again that these are far ahead of Fruit or other
engines from 2005.
- Tuning – I had my own eval tuner which was kind of primitive
compared to what I have now, but nevertheless I think that it was better
than what Fabien and others had in 2005.
These are worth maybe 20 Elo each.
One other unusual feature of Rybka's eval from that time is that I tried
to have as few correlated eval terms as possible. I took this pretty far.
For example Rybka didn't score doubled pawns (I'm not sure about the exact
versions but I think this applies to Rybka 1.0 Beta). A doubled pawn penalty
is mostly redundant with penalties for a weakened king shelter or for the
inability to create a passed pawn, so Rybka would only score the underlying
issues (i.e. the king shelter and passed pawn creation). I later decided that
this was wrong, but anyway it's a unique feature of Rybka's eval from 2005.
The ten substantive evaluation differences outlined above, combined with Rybka’s
entirely different search and board representation, mean that any reasonable
person must consider Rybka and Fruit two different chess engines. These
differences go a long way toward explaining why, according to every independent
rating group, the 64-bit version of Rybka 1.0 Beta played 150 or more Elo points
stronger than Fruit 2.1 (which only came in a 32-bit version).
We can see that Rajlich’s evaluation was materially different from Letouzey’s
in at least ten ways, but how did he develop these ideas? We find our answer
in new information about Rajlich’s early programming R&D work. During 2004
and 2005 Rajlich wrote himself a series of notes, some of substantial length,
on evaluation. He kindly emailed me some of these notes, which make it amply
clear that he was intellectually engaged with evaluation and that copying was
the furthest thing from his mind.
These files were written in the same period in which his accusers claim he was
feverishly copying Fruit’s evaluation. It is exceedingly hard to see the point
of developing a slew of original ideas for Rybka only to then copy Fruit’s evaluation.
There is another difference between Rybka and Fruit that merits comment. A
common misperception following the ICGA’s report was that Rybka transcribed
Fruit’s evaluation practically verbatim into a different board representation,
and was principally different from Fruit in “the search” (i.e. algorithms
related to searching for the best move). As we have already seen, the idea that
Rybka’s evaluation is the same as Fruit’s is totally wrong both
in the specifics and the underlying premise.
To further clarify this point, Fruit used a “mail-box” representation
of the chessboard, while Rybka used a “bit-board” representation.
How a chessboard is represented in a program has nothing to do with evaluation;
it is purely a difference in program architecture. Rajlich dismisses the importance
of chessboard representation with this comment:
If you take Fruit’s evaluation and modify it from Fruit’s
board representation (called mail-box) to Rybka’s board representation
(called bit-board) no serious Elo difference is expected except possibly slightly
lower Elo on 32-bit processors and slightly higher Elo on 64-bit processors.
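
To illustrate why board representation is a matter of architecture rather than evaluation, here is a minimal C++ sketch of the two approaches. It is a toy under stated assumptions: the Piece enumeration covers only pawns, and the names Mailbox, Bitboard, count_pawns_mailbox, count_pawns_bitboard and white_pawn_bitboard are hypothetical, not identifiers from Fruit or Rybka. It shows the same question (how many white pawns are on the board?) answered from each representation.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

// Illustrative subset of piece codes; a real engine has many more.
enum Piece : std::uint8_t { Empty, WhitePawn, BlackPawn };

// Mailbox: one entry per square, so a query walks the squares.
using Mailbox = std::array<Piece, 64>;

int count_pawns_mailbox(const Mailbox& board) {
    int count = 0;
    for (Piece p : board) {
        if (p == WhitePawn) ++count;
    }
    return count;
}

// Bitboard: one 64-bit word per piece type; bit i set means "piece on square i".
using Bitboard = std::uint64_t;

int count_pawns_bitboard(Bitboard white_pawns) {
    return static_cast<int>(std::bitset<64>(white_pawns).count());
}

// Derive the white-pawn bitboard from a mailbox so both describe one position.
Bitboard white_pawn_bitboard(const Mailbox& board) {
    Bitboard bb = 0;
    for (int sq = 0; sq < 64; ++sq) {
        if (board[sq] == WhitePawn) bb |= Bitboard(1) << sq;
    }
    return bb;
}
```

The answer is identical either way. What changes is how the program gets there, which affects speed on a given processor but not what the evaluation term means.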
Given the points I’ve outlined above, what are we to make of the following categorical
statements made by Zach Wegner in his ICGA report findings?
- Simply put, Rybka's evaluation is virtually identical to Fruit's.
- Overall, the pawn evaluations of each program are essentially identical.
- Because of Fruit's unique PST initialization code, the origin of Rybka's
PSTs in Fruit is clear.
These are all demonstrably incorrect and tendentious conclusions which would
be extremely misleading to someone who lacked the requisite technical expertise
or was not prepared to invest the necessary time to study the full contents
of his paper.
Feature Overlap: garbage in, garbage out
Mark Watkins, in his analysis of Rybka-Fruit similarities, compares several
chess engines with respect to their evaluation “features” and shows that Rybka
1.0 Beta has an “eval feature overlap” with Fruit 2.1 of about 74% (Rybka 2.3.2a
is judged to be about 64%).
Watkins shows “feature overlap”, not a “code overlap”. The precise definition
of an evaluation term in a chess engine (e.g. “rook on the seventh rank”) is
a mathematical formula which is calculated by an algorithm. The
algorithm itself is an abstract concept. It is implemented in a programming
language based on explicit data structures defined by the surrounding program
– that is called “code”. But Watkins’ evaluation feature is in actuality
the formula expressed by an algorithm. This formula is on the conceptual
level and therefore, according to accepted practice, everyone is free to use
it. Thus his entire analysis lacks traction.
But there are a few points that ought to be made about his analysis. First,
the engines he chose to compare against Rybka and Fruit are relatively weak,
which calls into question the practical value of their evaluation feature sets
as a yardstick for world-class engines.
Next, the assignment of “feature overlap” values for each single evaluation
term, using a scale of 0.0 to 1.0, was based on inherently subjective judgments.
Given Watkins’ analysis template, if we were to ask a group of programming
experts to assign overlap values to the engine pairs in question, we could not
be sure how close Watkins’ values would be to the average of the values they
assigned, let alone how closely his values would correspond to practical
reality, which in any event is impossible to calculate with precision.
Finally, there is the matter of data interpretation. Even if we ignore the
points cited above, questions must be raised. Why is an overlap value in the
range of 40%-44% “allowed” but a value of 64% (Rybka 2.3.2a vs. Fruit) or 74%
(Rybka 1.0 Beta vs. Fruit) “not allowed”? Who sets these standards and what
are they based upon? Can the ICGA create and enforce new standards years after
a tournament is completed?
Dots amazing: the case of the errant ‘0.0’
A source of strife since the ICGA issued its report has been an analysis of
Rybka’s time management code. All chess engines have to ration how long they
can spend thinking about a move based in part on how much time they have left
on the chess clock. Time management, obviously, is as important in computer
chess as it is in human chess, particularly in situations when time remaining
is down to a few seconds.
Ironically, the basis of the ICGA’s argument boils down to an interpretation
of one line of source code in Rybka 1.0 Beta which they believe contains
‘0.0’. (No joke, ‘0.0’ appearing in a program written in 2005 has been a major
issue for the ICGA investigators.)
Fruit used a system of floating point numbers or “floats” (such as “0.0”) for
managing its time. Rybka 1.0 Beta had a faster and simpler approach using integers
(such as “0”) for checking time.
There is a time check within Rybka 1.0 Beta that the ICGA investigation team
says looks like this:
There is a time check in Fruit that looks like this:
So the ICGA investigators argued the following:
Rybka uses integer-based time management, so we would expect Rybka to look
like this:
if (movetime >= 0)
The fact that Rybka does not utilize an integer format, and instead uses
a floating point convention just like Fruit, is undeniable proof that code-copying
occurred.
I asked Rajlich how the ‘0.0’ might have happened in Rybka and this was his
reply:
I don’t know where the 0.0 came from. It’s definitely weird/wrong. Rybka
was UCI from the beginning, even back when everybody was using WinBoard. I
would say that every two to three years I do a big cleanup of this code. This
might take a few hours, and then I won’t touch it until the next time. My
first UCI parser actually used inheritance, I was extending UCI to do some
testing, but that was gone even before Rybka 1.
This entire line of argument started years ago on Talkchess with a post by
Rick Fadden, wherein he pointed out the floating-point versus integer format
mismatch. This observation, which I have no reason to think was not made in
good faith, was probably the public origin of the Rajlich controversy. I say
this because this piece of seemingly concrete evidence planted in the minds
of rival chess programmers the idea that Rajlich must have copied code from
Fruit. Once that was accomplished, all that was needed was someone like Fabien
Letouzey to return from computer chess retirement to light the fuse.
But here’s the thing: Fadden assumed that Rajlich really typed or copied ‘0.0’.
It is quite possible that his assumption was incorrect. Remember, Fadden didn’t
have Rajlich’s original source code either; his output only indicated that something
extraneous to integer format was on that line of code.
Rajlich could have typed this instead:
In other words, he could have just added a dot to the zero. If he did,
this would have compiled to exactly the same floating point compare instructions
as if ‘0.0’ had been coded.
The technical experts who helped me write this paper could scarcely believe
this point when it first dawned on them. They researched and double-checked,
and found that on the Microsoft compilers in use around 2005 this observation
is indisputably correct.
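
The language-level half of this observation can be checked with a minimal C++ sketch. The function names and the integer movetime parameter are assumptions made for illustration; this is not a reconstruction of either engine's source. The sketch shows only that ‘0.’ and ‘0.0’ are the same double constant, so a comparison against either one is the same floating point comparison. It does not show what any particular 2005 compiler emitted.

```cpp
#include <cassert>
#include <type_traits>

// In C++ the literals 0. and 0.0 denote the same constant of type double.
static_assert(std::is_same<decltype(0.), double>::value, "0. is a double");
static_assert(std::is_same<decltype(0.), decltype(0.0)>::value, "same type");
static_assert(0. == 0.0, "same value");

// Hypothetical integer-format check: an integer compared with an integer.
inline bool time_check_int(int movetime) { return movetime >= 0; }

// One extra dot: movetime is converted to double, then compared with 0.
inline bool time_check_dot(int movetime) { return movetime >= 0.; }

// Written with 0.0: the same conversion and the same comparison as above.
inline bool time_check_dot_zero(int movetime) { return movetime >= 0.0; }
```

All three functions return the same result for every int input. Whether a given compiler emits identical instructions for the last two can be confirmed by inspecting its assembly listing.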
The truth of the matter is that there is no definitive and provable answer.
I came to the conclusion that ‘0.0’ is a litmus test. If you believe that Rajlich
is guilty of code-copying then ‘0.0’ reinforces that belief and is your smoking
gun. If you believe that Rajlich is innocent then you are apt to conclude that
typing ‘0.’ (not ‘0.0’) was a simple coding oversight. Further mitigating circumstances
I can offer to those in the guilty camp are these:
- Time management is not “game-playing code” (per ICGA Rule 2). It is interface
code from the engine to the outside world, i.e. Rajlich’s reference to a
- Comparing the UCI parameters for the two engines reveals they are markedly
different just as we saw with the comparison of Rybka and Fruit evaluations.
Fruit 2.1 has twenty configurable UCI parameters (hash-size not shown in
the figure below). Rybka 1.0 Beta, in contrast, has only two such parameters.
But ultimately the “big picture” argument is the most compelling. This contentious
‘0.0’ issue comes down to a dispute about one extra keystroke, one single dot,
on one line of code that has zero impact on how the program actually plays.
On what reasonable basis can a person conclude from this one superfluous dot
that Rybka is non-original and Rajlich deserves to have all his titles stripped
and be banned for life? How could this literally nugatory piece of evidence
tip the scales in favor of the prosecution? How many devils can dance on a dot
– Part four (final) will follow soon –
Thanks to Ed Schröder for encouraging me to write this article as well as for his
insights on the computer chess scene going back decades. A special thanks to
Nelson Hernandez, Nick Carlin, Chris Whittington, Sven Schüle and Alan Sassler
for their first-class editing as well as their many valuable suggestions. Without
the lively collaboration of these individuals spanning several weeks this paper
could not have been written. Finally, let me thank Vasik Rajlich for his clarification
of various technical points and for his contemporaneous notes.
Thanks also to Dann Corbit, Miguel Ballicora, Rasmus Lerchedahl Petersen, Cock
de Gorter and Jiri Dufek for their excellent suggestions and eagle-eyed proofreading.
Søren Riis is a Computer Scientist at Queen Mary University of London.
He has a PhD in Maths from the University of Oxford. He used to play
competitive chess (Elo