The Milov vs. Rybka Handicap Match

9/24/2008 – The chess program Rybka has played a number of handicap matches against titled players, but never before one against a 2700+ player. Last week it got an opportunity against Vadim Milov, playing two regular games, two with pawn and move handicap and four with exchange odds. It was a well-matched battle, instructive for both the programmers and the Super-GM. Larry Kaufman reports.
By IM Larry Kaufman

Although Rybka has played no fewer than thirteen official matches at various handicaps with titled players (ten of the matches with GMs) in the past year and a half, the match from September 14-18 with Vadim Milov was our first match with an Elite (FIDE 2700+) GM, Milov being rated number 28 in the world at 2705. As usual, the match was played at my home in Potomac, Maryland, USA and broadcast on the Internet.

Rybka opponent Vadim Milov, rated 2705

The time limit was the FIDE standard 90 min + 30 sec. Rybka played on a 3 GHz Octal computer, using only 3-5 man endgame tablebases and a small handicap opening book I wrote. The version was a slightly modified Rybka 3 default, differing mainly in that the value of the exchange was lowered, which probably affected play only in the exchange handicap games. A contempt setting of 50 was used in all games. The eight-game match consisted of three parts:

  1. Two normal chess games where the only "handicap" was Milov's getting white in both games. He lost the first of these and drew the second by reaching a pawn-down endgame that was easy to hold.

  2. Two games at the classical "pawn and move" handicap (f7 removed). The first game ended in an early draw by perpetual check: Milov returned his pawn to drive Black's king out into the open, but he could not find any way to achieve more than a slightly preferable position. The second game was a disaster for Rybka; the position became very closed early on, and Milov was able to calmly build up an attack on the kingside (where he had an extra pawn), while Rybka just made delaying moves until it was too late to get counterplay.

  3. Four games at odds of the exchange (Rybka played without the a1 rook, Milov without the b8 knight). Here Rybka drew three times and lost once. In one of the draws Rybka had good winning chances, and in the one she lost she had good drawing chances.

So Milov won the match by 4.5-3.5, the first match victory by a human over Rybka, except for a knight odds match with an FM. With a little better luck or perhaps a little better play by Rybka the match could easily have ended in a tie score. Although Milov made some mistakes (who doesn't?), I don't think he made any errors that I would call "blunders". In general he played very well, used his time properly, and tried to avoid complicated tactics as much as possible, which is of course the proper strategy against a computer. His preparation for the match failed him once, in the drawn game at pawn and move, when Rybka's opening move (1...Nc6) already surprised him.

Before the match, my expectation was that we would win the games in which White was the only handicap by 1.5-0.5 or 2-0 but lose the pawn and move games by a similar score, leaving the match to be decided by the exchange handicap games. This is indeed what happened. It was my opinion (as well as the opinion of Milov, Rybka, and Howard Staunton in the days when these handicaps were common) that the pawn and move handicap is larger than the exchange handicap, although the difference now seems quite small. The exchange is nominally more than a pawn (two pawns according to textbook count), but it is worth somewhat less in the opening, while the f7 handicap is worse than just a pawn due to playing black and the exposed king. Both handicaps seem to be worth about 1.5 pawns overall, maybe a bit less in the case of the exchange handicap.

Here are some conclusions I've drawn from the match:

1. We need to do more to avoid blocked positions. Rybka 3 is much better than earlier versions in handling this, especially with a high contempt setting. But when down in material (as in handicap games), Rybka often tries to close the position to make a draw more likely. We need to remedy this before playing more matches (especially handicap matches) with humans.

2. We need a special opening book to play for a win with black in normal chess. When White plays very conservatively, it's not easy to win as Black with normal defenses. Our one draw in the non-handicap games was an Exchange Slav.

3. I should have set contempt higher in the handicap games. I don't know if it would have made a difference, but it might have avoided some blocking moves.

4. Although the pawn (f7) and move handicap is quite playable, there are just a few playable defenses to 1.e4, which means a lack of variety. If a GM is well-prepared in the opening, it would be quite difficult for Rybka to do well at this handicap in a serious match like this one. I think we were too generous to offer this handicap to an Elite GM at a standard time limit. I think either the f2 or the c7 pawn would be more appropriate.

5. The Exchange handicap proved to be both interesting and competitive. Although Black is surely winning from the start, at least Rybka can use her first move and extra minor piece to play actively in the opening. However, there is one big problem that I underestimated before the match. White cannot castle long without a rook on that wing, but Black can, and did in two of the four games. With his knight missing from b8, castling long is easy for Black to do. In rook-odds games long ago, some players allowed White to play Kc1 in one move, but this would be impossible for a normal Rybka to consider. The handicap is already decisive materially, and for Black to have a choice of where to castle while White does not makes it too much. So if the exchange handicap is used in future matches, I think that Black should not be allowed to castle long, as if the a8 rook had already moved.

6. Milov felt that now that he has played Rybka and learned her weaknesses, he would like to try a rematch with only the advantage of the white pieces. He felt that with a copy of the program to be used, access to comparable hardware, and enough prize money to justify spending a month or so in preparation, he could play a competitive match. I don't believe that the white advantage is enough for any human to score more than 30% or so, even with massive preparation, especially if we add more incentive to avoid closed games. But I'd welcome the chance to try to prove this, so anyone interested in sponsoring such a match please let me know.

7. One idea for future matches could be called "dynamic handicaps". The first game would be played with the GM just playing white. If he wins, he must play black next; if he draws, the next game is played the same way (GM is white again); but if he loses, the next game is played with some small handicap (maybe two moves, maybe severe book restriction). Each win by the engine increases the handicap (from a set list of handicaps) and each win by the GM decreases it. This would ensure an interesting and varied match regardless of the level of the GM or the time limit.

8. What FIDE rating would Rybka 3 on an Octal computer like mine get if allowed to play with the top five players in many tournaments? Although the computer vs. computer rating lists suggest a rating of around 3200, there is some evidence that engine vs. engine play overstates rating differences between computers by 25% or so. This would imply a rating of around 3100. We can try to estimate from these matches by assigning rating values to the handicaps based on results in engine vs. engine play, for example by using the Monte Carlo feature on Rybka 3. I don't have much data at serious time limits, but extrapolating from fast results I would say that the exchange handicap is worth perhaps 450 and the f7 handicap 500 when the recipient is 2700. The White only handicap is around 50. So the average handicap in the eight games works out to 363 Elo, which when added to 2705 (Milov's rating) gives 3068. The performance rating for the match is about 3025. However our performance rating calculated this way in the other three matches played with Rybka 3 or a version close to it was somewhat better than this, so I think it is fairly safe to say that against top human players Rybka should perform somewhere between 3000 and 3100 FIDE.
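The arithmetic in point 8 can be checked with a short sketch. The handicap values, game counts, Milov's rating and the 3.5/8 score come from the text; the logistic Elo formula used for the last step is an assumption on my part, since the article does not say which conversion was used, and the variable and function names are purely illustrative.

```python
import math

# Elo values assigned to each handicap in the article (recipient rated about 2700).
HANDICAP_ELO = {"white_only": 50, "pawn_and_move": 500, "exchange": 450}

# Number of games played at each handicap in the eight-game match.
GAMES = {"white_only": 2, "pawn_and_move": 2, "exchange": 4}

MILOV_RATING = 2705
RYBKA_SCORE = 3.5  # Rybka's points out of eight games


def average_handicap(values, games):
    """Game-weighted mean handicap in Elo."""
    total_games = sum(games.values())
    return sum(values[name] * count for name, count in games.items()) / total_games


def elo_difference(score_fraction):
    """Rating gap implied by a score fraction (assumed logistic Elo model)."""
    return 400 * math.log10(score_fraction / (1 - score_fraction))


avg = average_handicap(HANDICAP_ELO, GAMES)
adjusted_opposition = MILOV_RATING + avg
performance = adjusted_opposition + elo_difference(RYBKA_SCORE / sum(GAMES.values()))

print(f"average handicap:             {avg:.1f}")
print(f"handicap-adjusted opposition: {adjusted_opposition:.1f}")
print(f"performance rating:           {performance:.0f}")
```

Under these assumptions the average handicap comes to 362.5 Elo and the performance rating lands within a point or two of the figure quoted above; a table-based conversion would give a marginally different number.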
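The "dynamic handicaps" idea from point 7 amounts to moving up and down a fixed list of playing conditions after each game. Here is a minimal sketch of that rule. The particular ladder entries, their order, and the choice to stay put at either end of the list are my assumptions for illustration; the text only says that the handicaps come "from a set list".

```python
# Hypothetical ladder, ordered from hardest for the GM to easiest for the GM.
LADDER = [
    "GM plays black",
    "GM plays white",
    "GM plays white, engine gives two moves",
    "GM plays white, engine uses a severely restricted book",
]
START = 1  # the match opens with the GM simply playing white


def next_level(level, gm_result):
    """Return the ladder index for the next game.

    gm_result is 'win', 'draw' or 'loss' from the GM's point of view.
    A GM win reduces the handicap, a GM loss increases it, a draw keeps it.
    """
    steps = {"win": -1, "draw": 0, "loss": 1}
    if gm_result not in steps:
        raise ValueError(f"unknown result: {gm_result!r}")
    # Assumption: at either end of the ladder the conditions simply stay the same.
    return min(max(level + steps[gm_result], 0), len(LADDER) - 1)


def schedule(gm_results):
    """List the conditions of every game, including the one after the last result."""
    level = START
    conditions = [LADDER[level]]
    for result in gm_results:
        level = next_level(level, result)
        conditions.append(LADDER[level])
    return conditions


for number, condition in enumerate(schedule(["loss", "draw", "win", "win"]), start=1):
    print(f"game {number}: {condition}")
```

A longer ladder would let the same rule serve opponents of very different strength, which is the point of the proposal.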

