Still more responses to Intelligent mistakes

For those readers coming in late, we've been having a pretty interesting dialogue in ChessBase Workshop -- we're discussing techniques for making a chess engine play at less than full strength while attempting to mimic (or at least approximate) the play of an average chessplayer. The topic has generated an enormous amount of e-mail response from readers; we'll look at a few more of these responses in this column.

A quick heads-up: some of it's going to get fairly technical this time around.

As in the last column, my responses are in italics following each reader's letter.

-----

I believe one of the first things necessary in creating a computer program that plays "human chess" is a model of what human chess is. A statistical benchmark of human players from grandmaster to unrated. There are at least three areas that can be used as a starting point for evaluating human play:

1. Statistical Error Map (SEM)
If we assume the computer's best move has a zero rating and all other moves are more negative, we can look at the moves of for example 1300 level rated players and determine how often and at what level they make good and bad moves. Statistics can be generated for each rating category. SEM can be used as a guide in the computers move selection process. SEM would permit the computer to statistically mimic the desired rating category.
2. Cumulative Error Map (CEM)
If you imagine an XY graph "ERROR LEVEL vs MOVE NUMBER" of a game. The slope and level can be evaluated for each rating category. A weak player's line will slope downward quickly while, a strong player's line will stay horizontal with little downward error. The line can't move upward because the best move you can play is the computer's number one evaluation move ( zero error ). Once statistics are generated CEM can guide the computer toward the level of error for a particular rating category as the game is being played.
3. Ply Filter (PF)
When a player makes a mistake, it's typically not a one mover. It usually requires some level of calculation to capitalize on the error. Games from different rating categories would have to be analyzed to determine not only that a pawn was blundered but, what level of search was required to see it. Statistics can be generated for each rating category on the frequency and depth of player error. PF would allow the computer to drop a pawn like a grandmaster or amateur. This also holds true for good moves. PF would not allow a computer set to play as an amateur a twelve move combination that wins a pawn.

I think this is less an issue of computer programming and more an issue of statistics and definition. Until were can define human error by frequency and depth, we won't be able to mimic it accurately.

Robert Ranney
Bridgewater, NJ USA

Robert, thank you for writing.

Wow. Dang!

I'm a "history guy". Math was always my worst subject, which is why the TV series "NUMB3RS" sometimes has me scratching my head (and, by the way, if you haven't caught it yet, watch it! It's an incredibly cool show. But I digress...). But your explanation is crystal clear. And it ties directly to what reader Philip Johnston offered in last week's ChessBase Workshop. Philip suggested that a way to tackle the problem would be to determine the reasoning average players use when selecting their moves. I countered that the information wouldn't be easily mathematically quantifiable. But your suggestion provides a methodology for at least evaluating the raw data and possibly even for starting a means of implementing it within an engine's algorithm.

I repeat: Wow! Dang!

And your explanation is killer, truly (when I forwarded it to my friends at ChessBase, I described it as "kick-a**"). It sounds like it could have come from Charlie's mouth on "NUMB3RS": perfectly clear and understandable, even to those of us who are terminally mathematically-challenged. I do hope it's something the programmers can use. Thank you for sending it! -- SL

-----

Hey, I just finished reading your article and the feedback from your 'intelligent mistakes' column... I don't really know much of anything, but I was surprised that no where in the column did you suggest that maybe the computer should just evaluate diffrently. From what I understand an engine basically makes moves, evaluates the board, makes more moves... evaluates who's winning... etc... depending on the depth... eventually decides which of the evaluations gave the highest 'score'... I'm curious as to how a computer looking at any position can tell you who's winning... and by how much... values of the pieces, but I assume the good engines also look at things like doubled pawns, bishops vs. knights, king safety, piece mobility, colour of the bishop etc... Alot of players consider those things as well, but not all. As a 1700-ish player, I don't look at the colour of my bishop to decide my moves, rarely will my King's future mobility affect my decision, these ideas are all in the back of my head but not really factored in when making 'a' move. So why not just take that away from the computer? Tell the computer that a knight is as good as a bishop, tell him not to worry so much about doubled pawns, about his king safety etc... Does a computer look at a queen as 9points? or is it 9.1... or 8.65 depending on everything else... Just take that away from him, If you limit the in-depthness of his evaluation... and Maybe incorporate this with some of the other suggestions, such as randomly choosing between the best of these 'limited analysis' lines... Just an Idea

Piotr Pisanski

Thanks for writing, Piotr. You bring up some interesting ideas, some of which have been implemented. For example, the engine parameters of several ChessBase engines (most notably Shredder) allow the user to tweak some of the positional criteria you mentioned. As for a criterion such as Bishop color, this (as you indicate) has more to do with long-range planning than short-term candidate move selection. It specifically deals with color complexes, which is a motiv which has heretofore been badly covered in English-language chess books (when it's mentioned at all); the ChessBase series of training disks on "Squares Strategy" covers it, but using a method which many users find a bit opaque (judging by the comments I've received).

Revaluing the pieces is a feature available in that other chess program (which I won't mention, lest I get another e-mail or message board post "threatening" legal action from one of its representatives -- heh). In that program, the user can change piece valuations to his heart's content. I suspect (but I don't know this for sure) that the "Sacrifices" engine toggle in Junior does something similar in a way that's transparent to the user. By valuing a Queen the same as a Rook (i.e. five pawns) for example, an engine will be much more likely to sac the Queen (either actively, by initiating a swap of Q for R) or passively (by not defending against a R for Q trade threatened by the human player). -- SL

-----

You asked how computers should play fallibly even if they have endgame tablebases.

I defined a Reference Fallible Endgame Player in an ICGA_J paper v26.2, after giving the ideas a first airing at the 7th ICGA Computer Olympiad Games Workshop (Maastricht, 2002).

Then, thanks to the excellent participation of Rafael Andrist in further developing WILHELM, we were able to demonstrate the efficacy of a range of Fallible Endgame Players in two further papers (Graz, 2003 - and TCS_J to appear in December 2005). We showed fallible winning endgame play in KQKR and KBBKN. The model extends to fallible defending play and to players that might concede the theoretical value of the position rather than just retain it.

The central idea is that an RFEP with zero skill picks a move at random, but an RFEP with positive skill is more likely to pick a better move than a worse move. The more competent the RFEP, the more likely it is to pick the better moves - ranging up to an RFEP with ‘infinite skill’ which always picks the best move.

See this web site for pdfs of the papers.

A footnote suggested that a similar idea could be used to give probabilities to the engine choosing moves giving an evaluation of X. In this way, the move with the best X would not always be chosen, but would merely be more likely to be chosen than the others. Of course, the more the ‘X move’ is clearly the best, the more likely it would be to be chosen, and the greater the ‘competence factor, c’ of the Reference Fallible Player, the more likely the ‘X move’ would be to being chosen.

I haven’t worked out the mathematical details, but some fairly trivial modification of the RFEP competence formula would suffice. Probably just needs a minus-sign in there as we wish to favour higher evaluations instead of disfavouring higher resultant depths of win. In fact, it’s rather like a fallible defender favouring deeper lost positions after their move to shallower losses.

Guy Haworth

For those who don't know his work, Guy Haworth has more computer chess credits than you can shake a stick at -- he's a recognized giant in the field.

Consequently, Guy, I'll have to take your word for it. I'm not a chess programmer -- I'm probably charitably described as an "somewhat informed layman" (just enough to be dangerous to myself, if no one else). But I will check out those papers, even though I suspect I'll be left scratching my mathematically-challenged head.

Thanks for writing! -- SL

-----

I'm writing to hopefully shed a bit of light on reasons for the difficulty of having a computer play sub-optimal moves and so on in an apparently intelligent fashion, which seems to be a matter of some confusion for many of the respondents to your recent article. I am a computer programmer but not specifically of chess computers although I've spent some time researching the subject as one of those side projects that never seem to get off the ground.

I think the main reason (and this may sound a bit hokey to the unititiated) is that chess computers don't actually really have such a deep range of moves, or perhaps rather move making criteria, to fall back on. The main problem that chess computers have is that they don't know anything about positional principles (this assertion is there to be shot down, it is a simplification, but it certainly bears little resemblance to how a human would assess a position and therefore contributes to a very different play), they don't make certain moves in the firm knowledge that in 30 moves time the pawn formation on the queens side will give them a decisive end game advantage, they work everything out using "brute force" technique.

The problem with this technique is that a chess computer has to work out every single move from the position, it can't just discount certain moves as 'bad' as humans regularly do (usually correctly, but sometimes mistakenly) using some mystical 'positional judgement', it has to look at each move and see that they are bad. Now the maths of this gets astronomically huge very quickly, the famous stats being that the number of different games of chess it's possible to play up to move 40 is greater than the number of atoms in the universe - basically no computer is going to work this out any time soon.

It's also important to recognise just how computers 'think' about positions - it's nothing like a human thinks about them. The whole thing is deeply abstracted, and reliant on shortcuts that can be made by using certain hardware (for example a 64 bit processor is a significant processing speed advantage, and no it's not a coincidence that 64 is equal to the number of squares on a chessboard), and other specialist techniques developed over decades.

As a result of this massive amounts of programming effort go into optimising the decision making process to only find the good moves, and for each good move there is a vast sea of mediocre moves. Letting the mediocre moves in would be catastrophic for the decision making process and it's this imbalance of scale in relation to the established techniques that makes it difficult.

This brings me to what I think is a relatively logical conclusion, which is that the techniques for chess computers up to now have been based on decades of just trying to make a computer good enough to beat a good human player. It's only a recent development that home computers can wipe the floor with 99% of club players, and that the most powerful computers can now expect the human champions to be the ones scrabbling for a draw.

However rather than be downbeat about the human position I think this really opens up the next big challenge for computer chess, to create chess computers that really can play in a truly human like way in a sort of chessic Turing test. I don't think it's wrong or sad for people to ask for this, you can't always have a good human to play against no matter how gregarious a chess player you are, I think it's fair to ask to play against a computer without getting that sinking feeling when you realise the tight battle you're having has transformed into you being put to the sword by the computer deciding it's evened things up enough. However for those calling for such programs it's not as simple as they think, and really would require some subtle thinking on the part of the programmers to create it from the current programs, if not a completely new approach and set of techniques.

I hope this has been a helpful comment

Jon Braund

Thanks for writing, Jon!

You bring up a nice point: humans play strategic/positional chess by a series of learned/acquired shortcuts (sometimes called "heuristics"), while computers play by some form of "brute force" calculation (after pruning the search tree). From having studied endgames, I can look at a late middlegame position and determine whether or not it's advantageous to trade off the last couple of pieces and reach a pure pawn ending, knowing from experience (and acquired knowledge) that the resulting pawn structure will be a win for me if I don't misplay it. I can't tell you exactly how many moves it will take to enforce the win, but I know (intuitively? instinctively? from pattern recognition?) that the win is there. I'm not a strong player by any stretch of the imagination, just an average club player, but I've won many games in just this way.

If there are several pawns on the board for each side, computers can't do this, even using tablebases. You can't program nebulous "pattern recognition" heuristics into a computer program -- at least not yet anyway. It's all brute force calculation, assigning numerical values to board positions (although mathematically-quantifiable heuristics are certainly a part of that evaluation process) and "mini-maxing" to a decision. It's antithetical to the way humans play -- humans aren't computers and vice-versa -- and that's probably the big stumbling block right there.

Thanks again for an insightful letter. -- SL

-----

I've enjoyed your ChessBase articles over time and took particular interest in the intelligent mistakes article.

I'm an avid chess fan and I'm always looking for a game. I manage to win games against most of my peers, but I'm far from a strong player. I own Fritz 8 and I totally agree that it is quite hard to set it up so that it feels like I'm playing someone close to my own strength.

I'm also a programmer and software engineer, so I can appreciate both sides of the fence. The obvious potential solution is what has already been suggested. Get the computer to choose the second best move, etc. The main problem with this approach is that the algorithm the computer uses is designed to go for the jugular. It is designed to find the best move, full stop. In effect, it doesn't scale down very well, so it feels very strong if allowed to do its own thing or weak if interfered with. This is further compounded by the fact that it uses the same algorithm to try and work out what its opponent will play, which of course is not very realistic.

Peeking under the hood, all computer software operates on the same principle, it uses speed to search possible moves and for each position it applies an evaluation function in an attempt to determine the merit of that position. Even with today's super fast computers, the number of positions which can be evaluated is a drop in the ocean compared to all the possible positions, so the evaluation function is the key and also the most complex piece of code. It is this evaluation function that you can tweak by changing values for different pieces, modifying king safety parameters, and so on. However, while adjusting these parameters can have a tangible effect, the evaluation function remains unchanged, it just operates with slightly different numbers.

The reason that restricting the computer's ply depth or horizon doesn't work so well is that it just invites blunders as the computer becomes unable to 'feel' the position. Similar to the way that human players know what the right move is without being able to fully explain their reasoning.

In my opinion, what is needed are a series of *different* evaluation functions that have been tailored for specific playing strengths. So instead of trying to cripple the best evaluation function in order to make it weaker, you could use an evaluation function that was designed to be weaker. I believe that subtle weakness could be much better simulated with a purpose built evaluation function, so instead of giving away material by making silly mistakes, it would perhaps show some positional weakness or it might avoid complex theoretical lines. Further, it might also make moves that could be exploited by a clever combination as long as it determines that a human would be unlikely to see it. The sort of 'mistake' that a human player makes, which is rarely exploited because the opponent fails to see the continuation. If these tailored evaluation functions can be further parametrised by the user, as it is possible to do now, then much could be achieved.

Some of the ideas above have been seen in the top programs, however only to increase the strength of the program, where perhaps the program chooses a complicated line over a slightly stronger, but clearer line.

Finally, I think a separate evaluation function for guessing the human opponent's moves would also be very helpful. This would allow the computer to downgrade its play somewhat. Let me explain. At the moment, the same very strong, very artificial evaluation function is used to guess what the opponent will play. Since a strong move will be guessed the computer will try to come up with a strong reply, so its prophylactic play will be much stronger. By having a separate evaluation function for the opponent a whole new level of flexibility is achieved.

At the moment we can customise the computer and perhaps model a particular human opponent's weakness, but I'd like to be able to model *my* weaknesses for the computer, say a preference for the queen and less than excellent endgame. If that was possible, then it would be possible to improve by exploiting the 'risks' the computer takes in order to cash in on your weakness. Just like a human opponent might do after they played many games against you.

Gabriel Ditu
Sydney, Australia

Thanks for an interesting e-mail, Gabriel!

I like the sound of your "tailored algorithm" approach: separate algorithms (in effect, separate programs) for different playing levels. They can be implemented/integrated into a program using a "block" system. As an example, I program virtual combat robots -- autonomous fighting machines which battle within a computer environment. I've created several robots which detect physical characteristics of the opponent and then react accordingly. The first A.I. "block" is the detection block, in which the robot's sensors detect physical characteristics of the other robot. Each "Y/N" decision within that "sensor block" is keyed to a separate program block later on in the A.I. code -- if Condition "A" (for example, that the opponent is more than 12 meters tall) applies, my robot will follow only the instructions in the later block which that "Y" decision is keyed to, and my robot will ignore all other blocks in the code. In other words, I can write many separate programs under the umbrella of the bot's single set of A.I. code, calling up the appropriate sub-programs as needed depending on what the robot's sensors detect.

Perhaps it's possible that a chess program could be blocked out in much the same manner, with separate sub-programs for Elo groupings (say, one block for 1100-1200, another for 1200-1300, etc.), which ties in nicely with the research project to determine the salient characteristics of each Elo grouping's play (mentioned in last week's column). The individual blocks could be triggered by a setting within the user interface (i.e. the user selects the level of the opponent he wants to play before the game starts) or by a handicapping system such as the current Friend mode, in which the player's prior results will determine which set of code is used by the engine. -- SL

-----

a fascinating subject

I am a reasonable club player who has no chance against Fritz but would love a computer that played realistic club chess.

I have tried the lower settings on Fritz and find them either still far too good or awful. with nothing in between.

whatever the answer is, I hope the boffins find it soon.

Larry Millington

Well, Larry, that's why we're hoping to achieve here by kicking around some ideas (whether or not they're practical or even possible); maybe some of the "outside the box" suggestions will spark a programmer's curiosity and some new techniques will be tried. -- SL

-----

We're caught up now; that's what I've received so far in the "intelligent mistakes" discussion. Next time out, we'll look at some user comments and questions on other topics. Until then, have fun!

You can e-mail me with your comments on ChessBase Workshop. All responses will be read, and sending an e-mail to this address grants us permission to use it in a future column. No tech support questions, please.

SHOP

SHOP

Still more responses to Intelligent mistakes

ONLINE SHOP

The Neo-London-System - The sharper alternative to the popular London System

© 2005, Steven A. Lopez. All rights reserved.

Discuss

Fritz 20

Silence the Sicilian - Win with the Alapin Variation (2.c3)

Queen's Gambit Accepted Powerbook 2025

Queen's Gambit Accepted Powerbase 2025

Fritz 20 & Opening Encyclopaedia 2025

Master Modern Opening Strategy: Flank Attacks against Classical Openings

Rossolimo-Moscow Powerbase 2025

Rossolimo-Moscow Powerbook 2025

Pop-up for detailed settings