Feedback on intelligent mistakes

9/23/2005 – The ChessBase Workshop column on "intelligent mistakes" and chess computer handicap modes has generated a great deal of response (and suggestions) from our readers. We take a look at a sampling of these responses in the latest ChessBase Workshop.


OPENING LINES

by Steve Lopez

After having a great many e-mails forwarded to me by the folks at ChessBase (as well as forwarded requests for my e-mail address), I finally broke down and set up an e-mail account for ChessBase Workshop responses. E-mails sent to the new address will come directly to me.

Before I present the address, some quick notes:

  1. The address is strictly for comments, suggestions, etc. on my ChessBase Workshop columns; I will not be answering technical support questions submitted via this e-mail address. Sorry, but I only work part-time in the chess software field and just don't have the time to answer software support questions. For tech support questions, please use the link provided for them on the ChessBase website.

  2. I promise to read every e-mail sent to the new address, but I won't guarantee a personal response to every e-mail.

  3. By sending an e-mail to the ChessBase Workshop address, you're also giving permission for the e-mail to be published in a future ChessBase Workshop column.

Now for the goods. To reach me directly, please send your e-mail to chessbaseworkshop@yahoo.com. I'll also provide this address at the end of each column from now on.


FEEDBACK ON "INTELLIGENT MISTAKES"

by Steve Lopez

When I wrote the ChessBase Workshop column on software users' desire for "intelligent" mistakes when the engine is set to a handicap level, I knew it would draw some reader responses. Most of them have been really good; a few have been dang near indecipherable. We've taken your suggestions and passed them along to the programmers for experimentation and possible implementation.

In this column, we're going to look at a representative sampling of the responses. I'll drop in a few responses (in italics) along the way. Please note that my responses aren't intended as "arguments" -- they're just more food for thought. "Dialogue" is cool, "bitter contention" isn't.

------

First off, I knew somebody was going to get about half hacked-off at the column (any e-mail containing the words "take issue" in the first line spells some level of trouble). So that's a good place to start:

I take issue with the way you have presented the argument on "intelligent" mistakes in your article. I have often thought that the answer would be easy for code writers to implement. A chess-playing program merely searches for the highest-evaluated position in its ply search of possible positions. Since programs evaluate hundreds of thousands or millions of positions per move, why not use "secondary" evaluated positions against less-than-GM-strength players? This way, the computer could be seen as being more "human".

Personally, my main complaint with playing computers is the "changing" of the computer's strength. It is "inhuman" to play a move that any C-level player can see will hang a pawn and then play startlingly correct chess for twenty more moves, ending in a wild 15-move combination. This defeats the point of a "sub-GM" computer in my opinion. The computer should react the same way a sub-GM-level player does. It should miss the occasional 3-move combination (computers never do this), and it really shouldn't find the 7- to 10-move combinations that leave it a pawn up every game (it happens to me virtually every game).

Really, analysis of multiple games at a given level should turn up some startling statistics, such as the "percent chance per game of missing a 4-move combination that leads to a +1 score (a pawn gain)". Give the program at each level roughly the same "difficulties", and have it instead play moves which it evaluates as essentially "zero" (i.e. no positive or negative gain). That way the computer wouldn't act so "computer-like", and players would feel it is playing at their own level.

Also, the computer should never "ramp up" in strength dramatically after getting into a poor position. It should maintain the illusion that it IS a poor player. Thus, if the computer is set to "Beginner", it should act like a beginner throughout the game (low chance of seeing any combination beyond 3 moves or even 2 moves, high probability of dropping pawns, increased chance of making silly pawn pushes, disregard for king safety). If the computer is set to "Expert", it should act like an expert for the entire game (high likelihood of seeing 3- to 5-move combinations, low probability of seeing anything beyond that, and so on).

Really, it is up to the programmer to limit the computer's options, not to simply throw up their hands and say, "I will just give you draw odds behind your back and you will LIKE IT!"

-- Li Staffon
Stanford, USA
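Li's idea of a level that stays "in character" for the whole game could be sketched as a fixed per-level table that is consulted on every move, regardless of how the game is going. This is a minimal sketch under stated assumptions: the level names, the probabilities and the `sees_combination` helper are hypothetical illustrations, not values taken from any real program.

```python
import random

# Hypothetical per-level profiles: the probability that the engine notices a
# tactical combination of a given length (in moves). Lengths missing from a
# profile are never seen. The numbers are illustrative, not measured.
LEVEL_PROFILES = {
    "Beginner": {1: 0.9, 2: 0.5, 3: 0.1},
    "Expert": {1: 1.0, 2: 1.0, 3: 0.9, 4: 0.7, 5: 0.5, 6: 0.1},
}


def sees_combination(level: str, length: int, rng: random.Random) -> bool:
    """Decide whether an engine at `level` notices a `length`-move combination.

    The same table is consulted on every move, whatever the score, so the
    level never "ramps up" when the engine's position turns bad.
    """
    p = LEVEL_PROFILES[level].get(length, 0.0)
    return rng.random() < p


rng = random.Random(2005)
trials = 10_000
hits = sum(sees_combination("Beginner", 2, rng) for _ in range(trials))
print(f"Beginner saw a 2-move combination in {hits / trials:.0%} of trials")
```

Because the table does not depend on the evaluation, a losing "Beginner" is no more likely to find a long combination than a winning one, which is the consistency Li asks for.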

Thanks for writing. I think your second paragraph essentially reiterates some of the points I raised in my column. My guess (since I'm not a chess programmer) as to why secondary (or lower) evaluations aren't used is the complexity of writing code for it. For example, let's postulate a position in which the top three moves have essentially equal evaluations. That seems simple enough -- you just have the program play the highest-evaluated move that scores at least 0.50 pawns below the top move.

But let's say that it's an endgame (yes, people do still have their software play endgames without benefit of tablebases) and all of the possible moves are evaluated within that 0.50 pawn span. Then what? Or maybe it's a middlegame position in which only one move scores positive and anything else hangs a piece or pawn; if you let the computer play anything but the top move you go right back to the "unintelligent mistake" argument.
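The 0.50-pawn rule from the reply above, and the two awkward cases it runs into, can be sketched in a few lines. This is an illustration only: evaluations are assumed to be in pawns from the engine's side, and the function name and the fallback to the top move when nothing lies outside the margin are assumptions, not anyone's actual implementation.

```python
def pick_secondary(evals: dict[str, float], margin: float = 0.50) -> str:
    """Play the highest-evaluated move scoring at least `margin` pawns below
    the top move; if every move lies inside the margin, play the top move."""
    ranked = sorted(evals, key=evals.get, reverse=True)
    best = ranked[0]
    for move in ranked[1:]:
        if evals[move] <= evals[best] - margin:
            return move
    return best


# Middlegame with several reasonable moves: the rule steps down to h4.
print(pick_secondary({"Nf3": 0.40, "e4": 0.35, "d4": 0.30, "h4": -0.30}))
# Endgame where everything is within 0.50: no handicap at all (Kd5).
print(pick_secondary({"Kd5": 0.10, "Ke4": 0.05, "Kf5": 0.00}))
# Only one good move: the rule picks Qe2, which drops material.
print(pick_secondary({"Qxd8": 1.20, "Qe2": -2.00, "Qf3": -2.10}))
```

The third case is the "unintelligent mistake" in miniature: a fixed margin cannot tell a harmless inaccuracy from a hung piece.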

I don't see the actual coding as complex; it's more a question of the circumstances under which such code would be triggered. At what point would it be most propitious for a computer to play a sub-optimal move? Identifying such points (and then keying them to a particular level of play, as you suggested) would be the trick.

As for your last paragraph, I'm not sure that it's a case of programmers "forcing" the idea on users as you state. I'm not sure the programmers even like the way they're doing it; I think it's just a matter of it being the best way they've found so far. -- SL

-----

Thanks for an interesting article. You are right that because the program is a simulation, and computers don't approach the game the way humans do, programming them to make human-like mistakes is very difficult for the programmer. May I suggest that this may not be necessary? Take advantage of the computer's greatest strength: its consistency. As you noted, back in the late 1980s it was already possible to buy dedicated chess computers that could defeat 95% of players. Why not make changes like this:

1. At the lower settings, just reinstitute the horizon effect: set it so the computer won't look beyond a certain number of ply. Realistically, most players don't see 3+ move combinations regularly, so the machine will still play a reasonable game. As the game progresses and pieces come off the board, the program can raise the limit by 1 or 2 ply to help with the endgame.

2. Alter the point where the computer selects the best move: allow it to select randomly among the top x moves instead of always taking the best one, and increase or skew the spread of the randomness depending on how "weak" you want the machine to play. At the highest level, it will always select the 'best' move. At the lowest level, it might be, say, 50% likely to select the 6th- to 8th-best move and only 10% or less likely to select the best move. This would avoid the risk of selecting the absolute worst moves, which drop pieces and hang pawns for no reason.

Thanks for addressing this point - if it can be worked out it may make rating easy - just play against a set of computers - they are super consistent, never tire, never favor one opponent over another and don't get depressed. Play a short series and you can get a rating.

-- Julian Wan
Ann Arbor, USA

Thanks for your comments. Your first suggestion is a good one; in fact, it's been done. I have a few older machines in my collection which play in exactly this manner. The problem was that they played a miserable endgame, especially when they were winning. They might have a Rook + King advantage against a lone King and be unable to successfully perform the old "reduce the square" technique required to win such an endgame. So the poor user's misery just drags on and on until the 50-move draw occurs. But increasing the depth of search in such cases might work better; here again the trick will be isolating the circumstances under which such code would be triggered.
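Julian's first suggestion, a capped search that gains a ply or two as material comes off, might be sketched like this. It is a minimal sketch under assumptions: the piece-string format, the 40- and 20-pawn trigger points and the function names are hypothetical, and whether two extra ply are enough to win King and Rook against King is exactly the open question raised in the reply above.

```python
# Conventional piece values in pawns; kings are not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def total_material(pieces: str) -> int:
    """Sum the material on the board from a string of piece letters
    (upper case White, lower case Black), e.g. "KRk" -> 5."""
    return sum(PIECE_VALUES.get(c.lower(), 0) for c in pieces)


def depth_limit(base_ply: int, pieces: str) -> int:
    """Cap the search at `base_ply`, adding 1 or 2 ply as material comes off.
    The 40- and 20-pawn thresholds are hypothetical trigger points."""
    material = total_material(pieces)
    if material <= 20:
        return base_ply + 2
    if material <= 40:
        return base_ply + 1
    return base_ply


start = "KQRRBBNNPPPPPPPP" + "kqrrbbnnpppppppp"
print(depth_limit(3, start))    # full board: 3 ply
print(depth_limit(3, "KRk"))    # K+R vs K: 5 ply
```

Material is only one possible trigger; phase detection, pawn structure or the evaluation itself could serve instead, which is why choosing the trigger is the hard part.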

And your second point is a possible solution to the problem I raised in my response to the first e-mail above. I like the randomization approach and fooled around with it a bit (back in the '80s) as a means of generating "opponent's" moves (general plans, really) in solitaire wargaming. That might work here; I guess the programmers will try that and see how it flies. -- SL
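Julian's second suggestion, random selection among the top x moves with a level-dependent skew, can be expressed as a table of weights by rank. In this sketch the level names and weights are hypothetical; the "lowest" row is simply built to match his example (about 50% on the 6th- to 8th-best moves, 10% on the best move). It assumes at least one legal move and a list already sorted best first.

```python
import random

# Hypothetical rank weights per level: entry i is the relative weight of the
# (i+1)-th best move. Moves ranked below the table are never chosen, which
# keeps the worst, piece-dropping moves out of the draw.
RANK_WEIGHTS = {
    "full": [1.0],                                   # always the best move
    "club": [0.50, 0.25, 0.15, 0.10],
    "lowest": [0.10, 0.10, 0.10, 0.05, 0.10, 0.20, 0.20, 0.10, 0.05],
}


def pick_by_rank(ranked_moves: list[str], level: str, rng: random.Random) -> str:
    """Choose among `ranked_moves` (best first) using the level's weights."""
    weights = RANK_WEIGHTS[level][: len(ranked_moves)]
    return rng.choices(ranked_moves[: len(weights)], weights=weights, k=1)[0]


moves = [f"move{i}" for i in range(1, 21)]   # 20 legal moves, best first
rng = random.Random(1)
picks = [pick_by_rank(moves, "lowest", rng) for _ in range(10_000)]
share = sum(p in ("move6", "move7", "move8") for p in picks) / len(picks)
print(f"6th-8th best chosen {share:.0%} of the time")
```

Note that ranking alone ignores how far apart the moves score: in a position where the 6th-best move hangs a piece, this table would still pick it half the time at the lowest level.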

-----

Creating a reliable handicapping function is really quite easy. Yes, even with 1 and 0 binary logic.

Presently a computer stays in its opening book for 'x' moves and then starts calculating. At optimal settings, the machine plays from its opening book, and all subsequent moves are those that calculate best within the time allowed.

Let's analyze this probabilistically. The above paragraph describes a scenario that if sketched out looks like this:

If opening book move available, play book
Else play best calculated move available

You'll note that both the If and the Else conditions are probabilistically 100% outcomes. If there is a book move available, it will play a book move with 100% assurance. If there is not a book move, it will surely play the best move it can calculate within its allotted time and evaluation algorithm, 100% of the time.

To handicap a computer, all you need to do is make these 100% outcomes something less than 100%.
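Nelson's If/Else, with its two certainties turned into probabilities, might look like the sketch below. The parameter names `p_book` and `p_best` are hypothetical, and the uniform choice among the other moves is only a placeholder; the following paragraph of the letter describes weighting them by evaluation instead.

```python
import random
from typing import Optional


def choose_move(book_move: Optional[str], best_move: str,
                other_moves: list[str], p_book: float, p_best: float,
                rng: random.Random) -> str:
    """Nelson's If/Else with the certainties turned into probabilities.

    p_book = p_best = 1.0 reproduces full strength: the book move when one
    exists, otherwise the best calculated move.
    """
    if book_move is not None and rng.random() < p_book:
        return book_move
    if not other_moves or rng.random() < p_best:
        return best_move
    return rng.choice(other_moves)  # a sub-optimal move, chosen uniformly


rng = random.Random(7)
print(choose_move("e4", "d4", ["a3", "h3"], 1.0, 1.0, rng))   # book move: e4
print(choose_move(None, "d4", ["a3", "h3"], 1.0, 1.0, rng))   # calculated: d4
```

Since `rng.random()` always returns a value below 1.0, setting both probabilities to 1.0 makes the handicapped version behave exactly like the original rule.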

To give an example, let's say the computer is out of book and in a position with 22 different legal moves. All the computer has to do is assign probabilities to these 22 moves based on how they evaluate, with the best moves receiving the highest probability of being played and the worst having little or no probability. The computer would store the legal moves and their evaluations in an array, and the further a move's evaluation was from the best, the less likely that move would be played.

Getting a computer to play at a less-than-optimal ELO would be a simple matter of increasing the probability of weaker moves being played. It would be fairly easy, through statistical trial and error, to figure out how much this increased probability of sub-optimal moves being played corresponds to different ELO levels.

Properly done, at 1700 ELO you wouldn't hang a knight or ever commit a gross blunder except under time pressure--the probability of moves of that nature happening would still be close to zero. Even at 1200 ELO you wouldn't see gross errors, like hanging a major piece, but the randomness would be high enough that little errors would be committed all the time.
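The letter doesn't say how quickly a move's probability should fall off as its evaluation drops. One standard choice, swapped in here as an assumption, is a softmax (Boltzmann) weighting with a "temperature": a low temperature concentrates almost everything on the best moves, a higher one spreads probability to weaker moves, and a move several pawns worse stays close to zero unless the temperature is very high. The temperatures below are illustrative; mapping them to ELO levels would be the trial-and-error calibration Nelson describes.

```python
import math
import random


def move_probabilities(evals: dict[str, float],
                       temperature: float) -> dict[str, float]:
    """Softmax (Boltzmann) weighting: a move d pawns worse than the best gets
    relative weight exp(-d / temperature). A higher temperature flattens the
    distribution, i.e. weaker play. The probabilities sum to 1."""
    best = max(evals.values())
    weights = {m: math.exp((e - best) / temperature) for m, e in evals.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def sample_move(evals: dict[str, float], temperature: float,
                rng: random.Random) -> str:
    """Draw one move according to its softmax probability."""
    probs = move_probabilities(evals, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


evals = {"Nf3": 0.30, "e4": 0.25, "a3": 0.00, "Qh5": -0.40, "Nxe5": -3.00}
for t in (0.1, 1.0):
    probs = move_probabilities(evals, t)
    print(t, {m: round(p, 3) for m, p in probs.items()})
```

At a temperature of 0.1 the 3-pawn blunder Nxe5 gets a probability of the order of 1e-15; at 1.0 it rises to roughly 1%, which illustrates how "little errors all the time" and "no gross blunders" pull against each other.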

Finally, you would of course scale back the strength of the opening book in accordance with ELO. At 2000 ELO the book might go 16 plies, at 1500 ELO it might go 12 plies, at 1000 ELO maybe 8 plies.
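Nelson's three book-depth examples happen to lie on a straight line (8 plies at 1000, plus 4 plies per 500 ELO), so a sketch can simply interpolate. How the book should behave below 1000 or above 2000 is not stated; the clamping and the `full_book_plies` cap of 40 are assumptions for illustration, and `ply` means the number of half-moves already played.

```python
def book_depth_plies(elo: int, full_book_plies: int = 40) -> int:
    """Straight line through Nelson's examples (1000 -> 8, 1500 -> 12,
    2000 -> 16 plies), clamped to [0, full_book_plies]."""
    plies = round(8 + 8 * (elo - 1000) / 1000)
    return max(0, min(plies, full_book_plies))


def in_book(ply: int, elo: int) -> bool:
    """True while the game is still shallow enough to use book moves."""
    return ply < book_depth_plies(elo)


for elo in (1000, 1500, 2000):
    print(elo, book_depth_plies(elo))
```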

Seems simple enough to me!

-- Nelson Hernandez
McLean, VA USA

Thanks for your response. I think everybody's been pretty much on the same page so far, with slightly different spins on the implementation.

The 1-0 approach is pretty standard for programming. I dabble in robotics programming and that's pretty much how you tackle any program decision: "If this, else this". I sometimes get hung up when I toss multiple responses into the mix: "If this, then this, else if this, then this, else if this, then this" in a hierarchical structure. The monkey wrench hits the works when a situation comes up which you didn't foresee and isn't covered by your "if" decisions -- then the robot just sits stupidly idle. Happens to me (sadly) all the time.

So you have to make sure that the probabilities of all candidate moves add up to 100%. Simple enough, but what troubles me is the time required to assign those probabilities in the first place. Chips are greasy-fast these days, but it's still going to take some time to assign the probabilities. That's typically not a problem, but what happens when the user is playing a blitz game in handicap mode and the program finds itself in time trouble? Those milliseconds suddenly increase in value. So you might try turning off the probability feature when the clock reads less than x, but then you get the old "the program's playing at full strength" complaint (see the first letter above).
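Both points from the paragraph above, normalizing to 100% and switching the handicap off below some clock reading x, fit in a short sketch. The `panic_seconds` default of 5.0 stands in for x and is a hypothetical value; the per-move weights are assumed to come from whatever scheme the program uses. In this sketch the weighting is one pass over moves that already have scores, so it is cheap; in a real engine the larger cost is likely obtaining reliable scores for the non-best moves at all, since a normal search concentrates its effort on the best line.

```python
import random


def to_percentages(weights: dict[str, float]) -> dict[str, float]:
    """Scale non-negative weights so they add up to 100."""
    total = sum(weights.values())
    return {m: 100.0 * w / total for m, w in weights.items()}


def pick_with_clock(weights: dict[str, float], best_move: str,
                    seconds_left: float, rng: random.Random,
                    panic_seconds: float = 5.0) -> str:
    """Use the handicap weighting unless the clock is nearly out, in which
    case play the top move (the 'full strength' fallback)."""
    if seconds_left < panic_seconds:
        return best_move
    percent = to_percentages(weights)
    return rng.choices(list(percent), weights=list(percent.values()), k=1)[0]


weights = {"Nf3": 6.0, "e4": 3.0, "a3": 1.0}
rng = random.Random(3)
print(to_percentages(weights))
print(pick_with_clock(weights, "Nf3", seconds_left=2.0, rng=rng))
```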

To me, though, the biggest hurdle in all of the responses so far lies in identifying what makes a xxxx Elo player a xxxx Elo player. What distinguishes, say, a 1700 Elo player from a 1900? Or, to split the hair still finer, what separates a 1900 from a 1925? (And if any reader says "Easy -- 25 points" I'm making them stay after school to clean the erasers). As I recently wrote in an article for another website, I've known Class D players who were superb endgame players but had trouble just getting to an endgame -- the game was usually over somewhere in the middle. And I've known Class A players who could play a middlegame like a house a-fire but, if they failed to finish off the opponent, couldn't play an endgame to save their souls.

It's a very knotty problem and I have no idea how to even start addressing it. -- SL

-----

Hello, I read with interest the Steve Lopez column on methods and difficulties in making a chess program play at less than full strength.

He covered several approaches, but not the one that immediately comes to mind. A chess program at full strength plays the move that leads to the highest minimax value of its evaluation function. If we want to handicap the program, let's have it play a move that leads to a slightly lower value than the highest-valued move: the lower the level of the program, the greater the discrepancy between the move it plays and the best move. If, for example, all moves except one lose immediately in some position, then the difference between the best move and any other will be so big that at every handicap level the program would still play only the best move. But if a position offers a variety of moves of progressively weaker quality, they would be played depending on the level of handicap.

Why not do it this way? Where am I erring?

-- Mark Galecki
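One reading of Mark's proposal: each level allows a maximum drop below the best evaluation, and the program plays the weakest move still inside that window. This is a hedged sketch, not his code; the function name and the `max_drop` parameter are hypothetical. It does show the property he points to, that a move far better than every alternative stays forced at every level.

```python
def pick_within_window(evals: dict[str, float], max_drop: float) -> str:
    """Play the lowest-scoring move that is at most `max_drop` pawns worse
    than the best. max_drop = 0 is full strength; larger values are weaker
    levels. A move that is far better than every alternative stays forced."""
    best = max(evals.values())
    allowed = {m: e for m, e in evals.items() if e >= best - max_drop}
    return min(allowed, key=allowed.get)


graded = {"Nf3": 0.40, "Be2": 0.20, "h3": 0.00, "a4": -0.30}
forced = {"Rxe7": 1.50, "Rd1": -2.50, "Kh1": -3.00}
for drop in (0.0, 0.5, 1.0):
    print(drop, pick_within_window(graded, drop), pick_within_window(forced, drop))
```

With `graded` the choice slides from Nf3 to h3 to a4 as the window widens, while `forced` always yields Rxe7. Whether any single window width feels right to amateurs is a separate question.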

Thanks for taking the time to write. OK, we're all saying essentially the same thing here: find a way for the program to assign probabilities to the responses and choose (randomly or not) among the sub-optimal candidates.

But, presenting us with the fly in the ointment, is my good friend "The Prince of Hamburg-Harburg (Upper Side)":

BTW this doesn't work. If you set the error range too narrowly it makes zero difference in games against amateurs. They get killed in exactly the same way. If you set it too high the program makes stupid mistakes that even human amateurs wouldn't make.

And he also informs me that one of our programmers is currently experimenting in this area.

OK, this might be a long reply (and reiterates some of what I brought up in my previous column), so pop a cold one, settle back, and I'll try to make this as short as I can.

I think the problem here is twofold: the level of the human opponent and the wide range of perception within the target user audience.

New players just want to win; losing discourages them and just runs them off. I've recently taken up Texas Hold'Em poker (and I'm doing quite well with it, thanks for asking) and I've already encountered new players who lose a series of hands and then just give up the game entirely. They might possess the seeds, the makings, for being a good player, but they haven't yet discovered that losing is an opportunity for learning, so they just hang it up.

New chess players are the same way -- most of them just want to win. Hey, nobody started out as a Bobby Fischer or Amarillo Slim (not even Bobby and Slim); you have to take your lumps on the way up.

Intermediate players don't mind losing, but they'd much rather lose a close game than be blown off the board. These, I believe, are the players the programmers have in mind when they create handicap levels.

Believe it or not, I think higher-rated players (at least the ones I know) actually expect to lose to a computer, but they're still not fond of losing to a baroque seventeen move combination that wins a pawn (and I've seen Fritz spit a few of these out).

So it goes back to what I was saying in that previous column. When a player loses to a program set to a handicap mode, it's sometimes a response of "Damn! I lost" regardless of how "badly" they were beaten. But when a player beats a handicapped program, it's sometimes a disgusted "Damn! I won" because the program made an "unintelligent" mistake (and I'll define/refine it here as a mistake which the player spots as such at the time it's played; otherwise the player has no comment).

A dozen or so years ago I used to play a guy at the local chess club who drove me crazy. I knew he was making mistakes, the kind of moves the books tell you are dead wrong, but I couldn't figure out how to exploit his errors. I owned a chess program that let you tweak it six ways to Sunday; through isolating the nature of his errors, I was able to create a computer opponent who played in much the same style as he did. Over the course of many "practice" games against the program, I was able to learn how to exploit my human opponent's errors and punish them.

So what does this have to do with the problem at hand? Everything. It's a matter of perception. Let's say I'm a chess programmer and I recreate this computer personality as my "1300 Elo" template for a handicap mode. A similarly-rated player who doesn't know how to exploit those same errors might view this "1300" player as being stronger than 1300, based on the fact that he can't beat this setting. Another similarly-rated player who's already encountered these standard errors and has developed the techniques required to defeat them will look at this "1300" setting and say "That's too easy -- it's making 'unintelligent' mistakes." Two players, same rating, different results, different perceptions.

That's gotta be hell to a programmer. -- SL

-----

We'll look at some more responses to the "Intelligent mistakes" column next time around in ChessBase Workshop. Until then, have fun!

You can e-mail me at chessbaseworkshop@yahoo.com with your comments on ChessBase Workshop. All responses will be read, and sending an e-mail to this address grants us permission to use it in a future column. No tech support questions, please.


© 2005, Steven A. Lopez. All rights reserved.


