One of the central arguments in the ICGA report has been that Rybka and Fruit had very similar “Piece-Square Tables” (PSTs). PSTs are eight-by-eight arrays of numerical values representing a chessboard, one array per type of chess piece; the values are usually integers arranged in simple symmetrical patterns. The purpose of PSTs is to modify a position’s evaluation based on the squares the various pieces occupy.
Rajlich comments on the practical importance of PSTs:
The effect of PSTs is minimal but probably positive. Any reasonable choice of PST values leads to +/- less than 1 Elo.
Rajlich’s comment is notable for both its brevity and its significance. The first realization one makes is that, whatever the merits of the ICGA’s assertions, the whole PST case is a trivial sideshow in actual playing terms. Per Rajlich, PSTs have no material impact on playing strength and possibly no impact measurable at standard statistical confidence levels. Judging by his remark that PSTs are “probably” a positive factor, Rajlich is evidently not even sure himself.
To be efficient, a PST lookup should cost at most two or three CPU cycles. This explains why the Fruit and Rybka evaluations were based on PSTs with simple integer patterns (e.g. +1, +0, -1, -3): integer arithmetic is quicker than floating-point arithmetic. None of the PSTs had been optimized in Rybka 1.0 Beta, hence their simplicity. Both Fruit and Rybka 1.0 Beta use PSTs that were generated with simple formulas reflecting then-common chess knowledge. Even though Rajlich recognized that PSTs had only a tiny impact, he still wanted the ability to fine-tune his evaluation. Contrary to what the ICGA report states, there was no demonstrated intent to obfuscate Fruit integers in Rybka’s code.
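As a rough illustration of why a PST term is so cheap, here is a minimal C sketch. The knight table and its values, the square numbering (a1 = 0, h8 = 63) and the function name `knight_pst_score` are all invented for this example; they are not Fruit's or Rybka's actual tables or code. Each piece costs one array load plus one integer addition or subtraction.

```c
#include <assert.h>

/* Hypothetical knight table, values invented for illustration only:
   small integers in a symmetric pattern that rewards centralization.
   Index 0 = a1, 7 = h1, 56 = a8, 63 = h8 (rank 1 is the first row). */
static const int knight_pst[64] = {
    -3, -2, -1, -1, -1, -1, -2, -3,
    -2, -1,  0,  0,  0,  0, -1, -2,
    -1,  0,  1,  1,  1,  1,  0, -1,
    -1,  0,  1,  2,  2,  1,  0, -1,
    -1,  0,  1,  2,  2,  1,  0, -1,
    -1,  0,  1,  1,  1,  1,  0, -1,
    -2, -1,  0,  0,  0,  0, -1, -2,
    -3, -2, -1, -1, -1, -1, -2, -3,
};

/* Mirror a square top-to-bottom (a1 <-> a8) so Black reuses White's table. */
static int flip_square(int sq) { return sq ^ 56; }

/* PST term from White's point of view: per piece, one table load and one
   integer add or subtract, with no floating-point arithmetic involved. */
int knight_pst_score(const int *white_sqs, int n_white,
                     const int *black_sqs, int n_black) {
    int score = 0;
    for (int i = 0; i < n_white; i++) {
        assert(white_sqs[i] >= 0 && white_sqs[i] < 64);
        score += knight_pst[white_sqs[i]];
    }
    for (int i = 0; i < n_black; i++) {
        assert(black_sqs[i] >= 0 && black_sqs[i] < 64);
        score -= knight_pst[flip_square(black_sqs[i])];
    }
    return score;
}
```

For instance, a White knight on d4 (square 27) against a Black knight on b8 (square 57, read from the table as b1) gives 2 - (-2) = 4 in the table's units. Engines commonly update such a sum incrementally as pieces move rather than recomputing it, which keeps the per-move cost in the same small range.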
In actual fact, every one of Rybka's PST values is different from those in Fruit. Dr. Miguel Ballicora persuasively shows, in two separate analyses, that the Rybka and Fruit PSTs are totally different. In addition, Ed Schröder has summarized Ballicora’s and Chris Whittington’s analyses on his website. I will not recapitulate the arguments of these gentlemen here, as they are quite technical, but let me suggest that they demolish the PST case.
Based on the findings of the three sources cited above, we could stop discussing PSTs right here and move on to the next topic, but Dr. Hyatt and his supporters have invested a great deal of effort defending the PSTs as definitive evidence of “code-copying”. In the process, they dug an ever-deeper hole for themselves and, I regret to say, have not stopped their excavation work.
We must press on with the topic of PSTs because it puts the whole problem with the ICGA report in the spotlight. When I, like many people, first read the ICGA report, I found their tables of virtually identical, side-by-side Fruit and Rybka code highly incriminating. This was surely the same reaction ICGA head Dr. David Levy had, as well as panel members not serving on the report-writing “secretariat”. Given the evidence that was presented, and assuming its veracity, the final outcome of the ICGA investigation was an absolute and indisputable certainty. On this point we need to be very clear.
It really goes without saying that the panel members voted based on the findings of the ICGA report, and it would have been extremely prejudicial to the whole process if the report presented data in a misleading way.
Let’s start unraveling the PST mystery by reading what Rajlich said to Dr. Levy in a terse email when the ICGA investigation was in progress:
I'm not really sure what to say. The Rybka source code is original. I used lots of ideas from Fruit, as I have mentioned many times. Both Fruit and Rybka also use all sorts of common computer chess ideas.
Aside from that, this document is horribly bogus. All that "Rybka code" isn't Rybka code, it's just someone's imagination.
Rajlich issues a flat denial. What is characterized as “Rybka code” in the report, he says, is “horribly bogus”. You can only come to one conclusion: either Rajlich is flat-out lying or the ICGA report is wrong, or possibly even fraudulent. There is no middle ground.
The ICGA report contends that Rybka used Fruit’s PSTs in its evaluation function. To support this the report provides page after page of near-identical source code side by side. However, it turns out on closer inspection that the most damning portion of the ICGA report is in fact a work of fiction, and as a consequence the parts of the ICGA report related to code-copying are profoundly misleading. Ed Schröder’s reports [1] [2] make the case powerfully.
One of the first insights that led to the revelation that the PST “Rybka code” might come from another source came from one of the chief ICGA accusers, chess programmer Mark Lefler. Lefler nonchalantly pointed out in an online post that "[Rajlich] could have used a spreadsheet" for PST generation, an admission that undermined the sourcing and importance of PSTs.
But this crack in the ICGA argument was a trifle compared to the subsequent train of events. Critics of the ICGA soon realized that no one has actual Rybka source code from before 2010, not even Rajlich himself, who sheepishly admitted to Nelson Hernandez off-camera in the course of their July 2011 video interview that he had never maintained any form of version control for Rybka source code until Rybka 4.
This jaw-dropping admission is entirely believable because Rajlich asked for copies of his own program long before the ICGA controversy started. In mid-2010 he made this rather embarrassing request in the Rybka forum:
Can someone please post here all of the Rybka 2.3.2a versions? (I don't seem to have a copy any more.)
This realization led to the next insight. One of the premises of the ICGA's report is that original Rybka source code can be reconstructed from the reverse-engineered binary of Rybka 1.0 Beta. This is simply incorrect. The most that can be gathered from this approach is the assembly code that was output from the optimizing compiler that Rajlich used when he compiled the Rybka source code.
To restate the problem, no one has the original source code to Rybka, and Rajlich claims he only took ideas and not game-playing code from Fruit. Hyatt et al. adamantly believe that Rajlich copied Fruit source code but cannot prove it, as they do not have the actual Rybka source code. It is absolutely not possible to reconstruct the original source code of Rybka from a reverse-engineered executable of Rybka because there is a one-to-many mapping during this process. There are, in fact, many ways that Rajlich could have written Rybka, and it is impossible to say exactly which path he took. This was confirmed by Dr. Hyatt himself:
We are not trying to take a binary executable and turn it into C. That is a one to MANY (MANY = INFINITE) mapping.
The ICGA report shows us five pages of near-identical code, showing actual Fruit code on the left column and "Rybka" code on the right. However, as already stated, the "Rybka" code cannot be the original Rybka source code.
Here’s an actual example of falsification in the ICGA's report:
As already mentioned, the first thing to note is that none of the code in the Rybka column is actually in Rybka. But please note the zero weight on the third line of the manufactured Rybka code. Are we supposed to believe that the weight is multiplied by zero in line three of the static declarations? If this were actually so, it would not appear in the executables, because any optimizing compiler would remove the unnecessary step. The only conclusion someone familiar with programming can make is that this code is fictitious. Since no one has a copy of the Rybka 1.0 Beta source code, no one can prove otherwise.
Had the ICGA titled the right-hand column of its PST analysis “functionally-equivalent code possibly used by Rybka”, that would still be misleading, as all that would be compared would be schematic PSTs with low information content. It would be misleading, but at least there would be truth in labeling.
However, writing "Rybka" as the column title is completely misleading, which brings me to a crucial train of logic which seems inescapable.
It is very reasonable to conclude that the ICGA members who drafted the report knew exactly the desired effect that labeling pages of speculative material "Rybka" would have. The writers of the ICGA report are not men who act capriciously; they have long experience in academia and are intimately familiar with the standards of scientific evidence.
They could not have failed to intuit that most people lack the technical expertise and the time to comprehensively audit and assess technical documents. They must have known that the public at large would trust the pedigree, reputation and integrity of the report-writers as well as the ICGA as an institution. They surely knew that very few people would have the resources or incentive to challenge a report with the ICGA's imprimatur. Being familiar with human nature, they must have realized that those with any doubts would likely conclude that this was Rajlich’s fight, not theirs, and that in any event Rajlich could provide the answers – or not.
And finally, on some level Dr. Hyatt in particular must have known that for Rajlich to fight the charges would degenerate into an unseemly quarrel where the mild-mannered Rajlich would be assailed by an unending hail of accusations, insults and sophistries.
I can say this because I and others have publicly defended Rajlich, and that is exactly what has happened over the course of thousands of Dr. Hyatt’s posts. In my capacity as Rybka forum moderator I have access to posting statistics. The chart below speaks for itself. Four months of relentless attacks on Rajlich’s own website!
These observations are not personal; they are simply factual evidence of the singular intensity and apparent motivation of Rajlich’s chief accuser. Imagine how long it would take you to write forty lucid forum posts in one day. Dr. Hyatt achieved this stupendous level of vitriol no fewer than 26 times in a four-month span, peaking at 71 posts. Yet Dr. Hyatt believes this is perfectly normal behavior for an associate professor of computer science and is not a relevant datum. I mention it because I think the reading public may have justifiable concern about Dr. Hyatt’s excessive devotion to the Rajlich-is-Guilty crusade.
Returning to the truth-in-labeling issue, the ICGA’s blanket excuse in their report is that they did note that the comparisons were not actual source code:
The code shown here is simply the functional equivalent; it calculates the Rybka PSTs.
There are two problems with this. First, there are many different ways to write functionally equivalent code that looks nothing like the Fruit code. Dr. Miguel Ballicora posted source code to the Rybka forum that generates Rybka and Fruit PSTs but looks completely different from Fruit code.
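The point about functional equivalence can be checked mechanically. The C sketch below uses invented formulas and hypothetical function names (`pst_from_pattern`, `pst_from_distance`); it is not code from Fruit, Rybka or Ballicora's post. It fills a 64-square table in two ways that look nothing alike: one scales a per-line pattern array by file and rank weights, the other computes distances from the centre with `abs`. The two tables agree on every square, so the numbers alone cannot reveal which of the two (or of countless other) source texts produced them.

```c
#include <stdlib.h>

/* Generator A (hypothetical): a symmetric per-line pattern scaled by
   separate file and rank weights, an "array plus weights" style. */
static const int line_pattern[8] = {-3, -1, 1, 3, 3, 1, -1, -3};

void pst_from_pattern(int table[64], int file_weight, int rank_weight) {
    for (int sq = 0; sq < 64; sq++) {
        int file = sq % 8, rank = sq / 8;
        table[sq] = file_weight * line_pattern[file]
                  + rank_weight * line_pattern[rank];
    }
}

/* Generator B (hypothetical): a closed-form distance-from-centre formula.
   |2f - 7| runs 7, 5, 3, 1, 1, 3, 5, 7 across the files, so 4 - |2f - 7|
   reproduces line_pattern, and the sum over file and rank gives
   8 - |2f - 7| - |2r - 7|. */
void pst_from_distance(int table[64]) {
    for (int sq = 0; sq < 64; sq++) {
        int file = sq % 8, rank = sq / 8;
        table[sq] = 8 - abs(2 * file - 7) - abs(2 * rank - 7);
    }
}
```

With both weights set to 1, `pst_from_pattern` and `pst_from_distance` produce identical tables (-6 in the corners, +6 on the four central squares). A hard-coded array of the same 64 numbers would be a third equivalent form.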
But more importantly, for a protracted period of time following the release of the report, Dr. Hyatt repeatedly stated that Rybka made a direct copy of Fruit, and referred readers to the report to prove his case, citing the side-by-side comparisons shown there as functionally equivalent to DNA evidence. He did not even concede the “functional equivalency” cited in the report until this point was brought to his attention. That is really problematic because it calls into question how the evidence was “sold” to the rest of the ICGA panel.
Over the course of the forum debates Dr. Hyatt made a series of three remarkable statements which tell us what actually happened.
The evidence is _not_ based on "conjecture". It is based on specific analysis of Rybka and Crafty or Rybka and Fruit. There is no "interpretation" required. Have you actually _read_ Zach's and Mark's report? People keep saying "show me side by side comparisons." First page of Zach's report has _exactly_ that. Two columns. The comparison goes on for pages and pages. Side by side. Piece by piece...
you realize that the code on the right is imaginary? It is the code on the left, with the weights modified, so that you get the same PST values that Rybka _actually_ uses.
The easiest way to show a layperson that the Fruit source matches the Rybka binary is to make our "pseudo-Rybka source" match Fruit as closely as possible.
This may be a good moment to take two aspirin pills. Let’s summarize these statements:
It is clear from the time-lines that position #1 above (the side by side, piece by piece, five pages of Fruitified Rybka code) must have been the unchallenged position presented by Hyatt and Lefler to the ICGA investigating panel of 34. They must have drawn and disseminated a damning but entirely false conclusion from their own report, for how else could Hyatt and Lefler still be erroneously and misleadingly claiming there is Fruit and real Rybka code side by side in the report several weeks after publication?
Caught in this web of his own making, at one point Dr. Hyatt even admitted that the PSTs in the Rybka column were not copied code, boldly asserting:
There was NO CODE COPYING for the PST issue. NONE. NADA. ZILCH. ZIPPO.
An emphatic statement! But two days later, apparently realizing that such a statement would mean that Rajlich was innocent, Hyatt changed his mind and wrote that Rajlich had copied PST code after all.
According to Rajlich, he wrote a utility program (separate from Rybka and not available to users) in the C# language to generate his PSTs. As Fruit is written in C (not C#), this means there is a 100% certainty that Rajlich did not copy the Fruit PST generation code. Even if Rajlich had used formulas similar to those used in Fruit, this would constitute idea re-use and not code copying. It is also quite possible that Rajlich used formulas completely different from the ones used in Fruit, as demonstrated by Dr. Miguel Ballicora.
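Rajlich's C# utility has not been published, so the following is only a hypothetical C analogue of the workflow he describes, with an invented function name (`format_pst`) and output layout: a standalone tool takes a computed table and prints it as a C array definition, which is then pasted into or included by the engine. In such a setup the engine source contains only the resulting constants; the generating formulas live in the separate tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a 64-entry table as a C array definition, eight values per row,
   ready to paste into (or #include from) engine source. Returns the number
   of characters written, or -1 if the buffer is too small. */
int format_pst(char *buf, size_t cap, const char *name, const int table[64]) {
    assert(name != NULL && strlen(name) > 0);
    size_t used;
    int n = snprintf(buf, cap, "static const int %s[64] = {\n", name);
    if (n < 0 || (size_t)n >= cap) return -1;
    used = (size_t)n;
    for (int sq = 0; sq < 64; sq++) {
        /* Indent at the start of each row; newline after every 8th value. */
        n = snprintf(buf + used, cap - used, "%s%d%s",
                     (sq % 8 == 0) ? "    " : "", table[sq],
                     (sq % 8 == 7) ? ",\n" : ", ");
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, cap - used, "};\n");
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;
    return (int)used;
}
```

The tool's own `main` would compute the table with whatever formulas the author likes and write the formatted text to a file; nothing of the generator itself needs to appear in the engine binary.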
Another point is that Rajlich used his own header files (a variation on C++ templates) in the evaluation function of Rybka. These generated characteristic code repetitions in the compiler output. In the early stages of their investigation, programmer Zach Wegner mused publicly about strange repetitions in the Rybka code, but nobody understood what they signified.
It turns out Rajlich wrote large parts of Rybka using custom code (in C #defines) which allowed him, for example, to create a representation for a “good white knight” and then use exactly the same #define code to create a representation for a “good black knight” – a chess-programming cookie-cutter, if you like. In so doing Rajlich effectively extended the C programming language with helpful new chess-related constructs. Rajlich commented to me on his template approach:
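Rajlich's headers are not public, so the following C sketch only illustrates the general cookie-cutter technique under invented assumptions: a hypothetical definition of a “good knight” (one standing on the fifth or sixth rank counted from its own side), a 64-bit bitboard with a1 as bit 0, and made-up macro and function names. One #define body is written once and stamped out for White and for Black, with the colour-dependent detail passed in as a macro argument.

```c
/* One bit per square: bit 0 = a1, bit 7 = h1, bit 63 = h8. */
typedef unsigned long long bitboard;

/* Rank of a square (0 to 7) as seen from each side's own back rank. */
#define WHITE_REL_RANK(sq) ((sq) / 8)
#define BLACK_REL_RANK(sq) (7 - (sq) / 8)

/* Cookie-cutter: the evaluation logic is written once. REL_RANK receives
   the name of one of the colour-specific macros above, which expands when
   the preprocessor rescans the function body. */
#define DEFINE_GOOD_KNIGHT_COUNT(NAME, REL_RANK)                 \
    int NAME(bitboard knights) {                                 \
        int count = 0;                                           \
        for (int sq = 0; sq < 64; sq++) {                        \
            int r = REL_RANK(sq);                                \
            if (((knights >> sq) & 1ULL) && r >= 4 && r <= 5)    \
                count++;                                         \
        }                                                        \
        return count;                                            \
    }

/* Two colour-specific functions instantiated from a single definition. */
DEFINE_GOOD_KNIGHT_COUNT(white_good_knights, WHITE_REL_RANK)
DEFINE_GOOD_KNIGHT_COUNT(black_good_knights, BLACK_REL_RANK)
```

A White knight on e5 (bit 36) and a Black knight on e4 (bit 28) each count as “good” for their own side. After preprocessing, the compiler sees two near-identical functions, which is consistent with the kind of repeated patterns in compiled output described above; a C++ template with a colour parameter would achieve the same effect with type checking.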
I took my "template" approach further and further over time. In Rybka 1 I was using this for things like evaluation and attack generation. Later I used it for move generation as well. Now I use it for all kinds of crazy things.
This language extension he talks about is conceptually similar to a function key on a keyboard or a mathematical function. He is taking something complex, that may involve many steps or words, and reducing them to something much simpler. In doing this he in effect creates his own language, a lot like someone who coins a precise new word to express an idea that formerly required twenty imprecise and muddled words to describe.
The point is that Rajlich’s implementation of #defines was conceptually and functionally different from Fruit’s. While this development ethos does not necessarily constitute a defense against code-copying (his #defines could have been copied and pasted from a different source), it does represent a clear conceptual and architectural difference between the ways Rybka and Fruit were developed.
The ICGA report cites the extraordinary increase in Rybka’s strength following Fruit’s release as circumstantial evidence of plagiarism, at first glance a provocative line of reasoning. However, as has already been argued, this is the wrong diagnostic. All modern chess engines come into existence as fast climbers, and this pattern of strength increase cannot credibly be used as prima facie evidence of plagiarism.
The ICGA report presented its case as if Rajlich was only given a small time window to learn from Fruit (six months from the mid-2005 Fruit-release to the December 2005 release of Rybka 1.0 Beta), implying that he must have copied from Fruit wholesale in order to achieve the Elo strength he attained by the end of 2005, as otherwise he could never have achieved so much in six months. The report fails to mention that Rajlich became a full-time chess programmer in 2003, and earlier versions of Fruit were released starting in mid-2004. Rajlich had more than a year to study the programming style, ideas and algorithms in Fruit. It is perfectly reasonable to think that he integrated what he learned into his own program. The evidence does not justify an inference that he must have copied code.
Apart from the substantive claims made by the ICGA a dispassionate observer ought to reflect on whether the structure and process of the investigation as well as its conclusions were reasonable and proportional to the alleged rule violation.
The ICGA decided to mount an investigation of Rybka after sixteen programmers submitted an open letter in which they claimed Rybka contained illegal Fruit code. As I have already quoted, Rajlich had himself already stated that he “went through the Fruit 2.1 source code forwards and backwards and took many things”. Rajlich’s statement was widely known and had been discussed ad nauseam by programmers and computer enthusiasts on Internet fora for a number of years. Yet the ICGA final report pointedly omitted any mention of Rajlich’s past published statements. Thus it is not incorrect to say that the ICGA was in effect investigating what Rajlich had already told the general public five years earlier, before Rybka participated in any WCCC tournaments.
A panel was formed. Dr. Hyatt served as panel gatekeeper and determined who was and was not allowed to participate. Rybka competitors, individuals with obvious conflicts of interest, and individuals who had publicly expressed their predetermined conclusion of guilt were allowed to join the investigation. The fact that such members could not only prejudice the investigation but also vote was not considered inappropriate. The ICGA defends this state of affairs by saying, in effect, ‘who else but the interested parties would serve on such a panel?’ This attitude, I think, is a classic example of losing the plot.
While this jury-stacking was going on the president of ICGA, Dr. David Levy, made a preemptive declaration of Rajlich’s guilt in a ChessVibes article before his own panel had had sufficient time to investigate and fully deliberate the facts.
Not even half of the original committee of 34 voted for a guilty verdict. Was it even clear in advance how many guilty votes were needed to convict?
Members on the panel were only asked to decide the issue of guilt or innocence. They had no influence on the kind of penalty that would be handed down were they to find Rajlich guilty. The matter of sentencing was in the hands of the ICGA's board, headed by Dr. Levy. Levy, given his position in the ICGA, his public statement of Rajlich's guilt, and the superficially persuasive nature of the ICGA report, could hardly have been contradicted by his own board. In the end, Levy duly exercised his punitive powers based on the consensus that had been reached.
While no one questions the fact that the ICGA gave Rajlich ample opportunity to respond to their charges and he did not, there is much more to the matter than “we queried him and he did not respond.” Rajlich was not merely queried. He was publicly accused by the head of the ICGA and publicly excoriated by a group of individuals who stirred themselves up into a crusading lynch mob. A pile of “evidence” was jubilantly thrown together based on a passionately-held predetermined conclusion of code-copying which happened to be wholly at variance with actual reality. And then Rajlich was offered the opportunity to formally respond.
The whole process was an unprofessional disgrace.
There are those who will say, “if Rajlich had only cooperated with the ICGA investigation it would never have come to this.” My response is that if you have confidence in the integrity and objectivity of the investigators this would be a compelling point. But in the absence of this confidence a perfectly reasonable attitude is “why should a four-time world champion and the world’s leading chess programmer dignify the ICGA’s allegations with a reply if he knows them with 100% certainty to be not only false, but ridiculous?” Let’s put ourselves in Rajlich’s shoes. Most of us, I would guess, would become belligerent and combative, and attempt to cleanse our besmirched reputation: we would strike back at our foes. That is the normal, red-blooded response of a common man. I submit that Rajlich is an uncommon man.
As for the nature of the punishment meted out by the ICGA, we might observe that justice can be defined as every man getting his due and letting the punishment fit the crime. There is no evidence that justice was done in this case in either sense, which is why I wrote this article: to publicly address an injustice and, perhaps, remedy it.
We all know that in competitive sports the players often push rules to their limits. We all know the difference between hard but clean play, yellow card offenses and red card offenses. We all know that cheating that merits a red card is deliberate, not trifling and often premeditated. Unintentional rule infractions, or even attempts to push a vague rule to its limits, do not warrant a red card, let alone something even beyond a red card: a lifetime ban. And finally, we know that rule violations, if they occur, do not merit the equivalent of capital punishment rulings five years after the fact!
We finally come to a realistic appraisal of the situation in computer chess just prior to the emergence of the Rybka allegations. The demonstrated lack of proportionality in Rybka’s banishment returns us again to the matter of Rybka’s near-monopoly over computer chess competitions and chess engine commerce for a number of years.
Not only did Rybka have a massive lead in tournament play, but it had access to massive hardware and its latest Rybka Cluster developments were locked up, beyond the reach of reverse-engineers. Rybka’s opening book was (and is) among the world’s best. Leading the team was Rajlich himself, a hypercompetitive genius with an insatiable desire to win and win again, and a business model that methodically froze out everyone else. He had no friends in his peer group to watch his back because he had no peers. Moreover, he was not good at concealing that he had no use for them. It is easy to see how some could perceive this as arrogance because maybe that’s exactly what it was.
It is reasonable to conclude that this dominance was so pronounced and seemed so insurmountable to Rajlich’s rivals that they seized the only available opportunity to banish Rajlich and Rybka forever, not merely from ICGA-sponsored tournaments, but all tournaments anywhere in the world. If it cannot rightly be said that they actively “seized” the opportunity, then it can more accurately be said that they passively did not regret seeing Rajlich excluded and did nothing to prevent the travesty that took place – and they voted against him.
It is also reasonable to conclude that other programmers found it unacceptable to attend week-long WCCC tournaments in far-off places like Beijing, China and Kanazawa, Japan out of their own funds, paying entry fees, air fare, hotel, food and incidentals, only to be repeatedly blown off the board by a program whose dominance seemed to increase year after year with no end in sight. The economics of this no-win proposition understandably did not work for them. This, in turn, undoubtedly threatened the near-term viability of the ICGA’s annual tournament. Rybka was just in a class by itself, everyone knew it, and this apparently intractable fact simply became intolerable.
The ICGA had, in fact, already tried to constrain Rybka’s superiority by limiting the amount of hardware a contestant could use in the 2009 WCCC in Pamplona, Spain. This limitation would have excluded the Rybka Cluster from competition and “leveled the playing field”. Ultimately it made no difference: Rybka won the “limited hardware” tournament rather easily. Stronger measures were needed to knock Rajlich off his perch.
Finally, it is reasonable to conclude that Rajlich’s long reign at the top of the rating lists, his monolithic dominance in public tournaments, his sequence of menacing strategic actions such as the development of the Rybka Cluster, his publicly-stated intention to sequester his best development work so that it could not be reverse-engineered, his business alliances with Convekta and ChessBase, and his publicity juggernaut – all of these things and more marked Rajlich as a convention-flouting rogue programmer and hence, in the eyes of some, a public enemy. Jonathan Swift put it cruelly:
When a true genius appears, you can know him by this sign: that all the dunces are in a confederacy against him.
I concede that all the above may not be exactly what happened. It is only my opinion. But to this observer it is a believable narrative because it is informed by knowledge of human nature and human history. Whenever institutions or persons come along who assume a position of overwhelming power, alliances of the downtrodden tend to form and start plotting the tyrant’s overthrow. We see several examples of this in recent world history. Computer chess may not be a field of great sociopolitical significance, but those who dwell in it have the same hard-wired human impulses. It doesn’t matter that the perceived tyrant is an innovative genius. Caesar was assassinated, after all.
In something of a surprise epilogue that took place as this article was in its final stages of being written, it emerged that the times they are a-changing for computer chess generally and the ICGA in particular. At the recent Rybka-less 2011 WCCC held in the Netherlands none of the top seven ranked programs on the then-existing CCRL 40/40 list attended, nor did the two winner-by-default co-champions from 2010’s WCCC (Rondo and Thinker), omissions which stimulate furtive doubts about the credibility of the “World Championship” title the contestants struggled so mightily to win. During this competition the programmers met and some expressed a desire to change WCCC Rule 2. This was posted to the Talkchess forum by an attendee:
This position [100% originality] is not an obvious majority opinion anymore from the tri-ennial ICGA meeting this week where this was a lengthy agenda point. A fair group of participating programmers present have expressed they want the rules to be updated. One line of thinking is that attribution plus added value should be sufficient to compete, instead of 100% originality.
The ICGA's policy appears to be that the programmers decide on the rules, because if there are no programmers, there can be no tournament. Without question, updating WCCC Rule 2 to reflect contemporary reality would be a years-overdue positive step. However, without justice for Rajlich as a first step any proposed rule amendment would mean Rajlich would continue in his ICGA-imposed pariah status while other programmers would be free to use Rajlich's ideas, algorithms and reverse-engineered source code (from existing and future editions of Rybka) with little fear of reprisal. First things must come first: the ICGA must retract a grave injustice inflicted upon a great chess programmer, world champion and innocent man.
Thanks to Ed Schröder for encouraging me to write this article, as well as for his insights on the computer chess scene going back decades. A special thanks to Nelson Hernandez, Nick Carlin, Chris Whittington, Sven Schüle and Alan Sassler for their first-class editing as well as their many valuable suggestions. Without the lively collaboration of these individuals spanning several weeks this paper could not have been written. Finally, let me thank Vasik Rajlich for his clarification of various technical points and contemporaneous notes.
Thanks also to Dann Corbit, Miguel Ballicora, Rasmus Lerchedahl Petersen, Cock de Gorter and Jiri Dufek for their excellent suggestions and eagle-eyed proofreading.
Søren Riis is a Computer Scientist at Queen Mary University of London. He has a PhD in Maths from the University of Oxford. He used to play competitive chess (Elo 2300).
A Gross Miscarriage of Justice in Computer Chess (part one) 02.01.2012 – "Biggest Sporting Scandal since Ben Johnson" and "Czech Mate, Mr. Cheat" – these were headlines in newspapers around the world six months ago. The International Computer Games Association had disqualified star programmer Vasik Rajlich for plagiarism, retroactively stripped him of all titles, and banned him for life. Søren Riis, a computer scientist from London, has investigated the scandal.
A Gross Miscarriage of Justice in Computer Chess (part two) 03.01.2012 – In this part Dr Søren Riis of Queen Mary University in London shows how most programs (legally) profited from Fruit, and subsequently much more so from the (illegally) reverse engineered Rybka. Yet it is Vasik Rajlich who was investigated, found guilty of plagiarism, banned for life, stripped of his titles, and vilified in the international press – for a five-year-old alleged tournament rule violation. Ironic.
A Gross Miscarriage of Justice in Computer Chess (part three) 04.01.2012 – A core accusation against Vas Rajlich is that Rybka and Fruit have very similar positional evaluations, and the use of floating point numbers in Rybka’s time management code had to be copied from Fruit. Søren Riis enumerates the ten substantive evaluation differences and shows how the second accusation boils down to a single misplaced keystroke with zero impact on Rybka's play.