Database basics - part 8

by ChessBase
10/13/2004 – In the final installment of "Database Basics", columnist Steve Lopez discusses the finer points of speeding up your database searches and offers some closing thoughts on why using a database is important for your chess development. Read about it in the latest ChessBase Workshop.

by Steve Lopez

We're pretty close to the end of the line with this ChessBase Workshop series on database basics; I've saved the toughest stuff for last. We might get a wee bit technical with this closing article, but I promise to try to make it as painless as possible.

I want to talk a bit about the speed of database searches and give you a few tips on how to rev them up a bit. There's no "magic tweak" involved, no esoteric computer jargon to master -- it's just a matter of using some common sense when setting up your search criteria.

We've already discussed the "less is more" phenomenon: the less information you provide in the Search mask, the more data you get back when the search is complete. There are a few practical applications of this phenomenon that we can use to our advantage.

The first application involves the knowledge that there's a hierarchy to how your chess program (ChessBase or one of the Fritz family of playing programs) performs a search. If you create a search that involves both header information (the stuff under the "Game data" tab of the Search mask) and either the "Annotations" or the "Position" tab, the program will look for the header info first before looking "inside" the game itself for annotations or a position. So, for example, if you perform a search for a particular board position but limit the search to, say, a particular ECO code, the program will start scanning the database and look at just the ECO codes in the game headers. It will completely disregard any game that doesn't match the ECO code you entered, but when it locates a game of that ECO designation, it will then look "inside" the game for the position you entered in the Search mask.

You might be thinking, "Big deal. So why is this important?" You can check this out for yourself. Play a variation of the Ruy Lopez, let's say eight moves deep, on a board and then use the "Get board" feature (previously discussed in this series) to transfer it to the "Position" tab of the Search mask. Start your search. If you have a database of more than two million games, go make yourself a cup of coffee and come back in a few minutes -- it's going to take a while for your program to complete the search. Why? Because it's searching every game in the database for that board position.

So how do we speed this up? You'll remember that you entered an eight-move variation of the Ruy Lopez; while it's possible that the position might appear in a non-Ruy game by some bizarre transposition, it's really not very likely. So you can speed up the search by using the "ECO" field under the "Game data" tab to limit the program's search to just Ruy Lopez games. In this case you'd enter "C60" in the first ECO field and "C99" in the second ECO field. Now the program will totally ignore any game that's not a C60-C99 game and look for your desired position only in games that fall within that range of codes.
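For readers who like to see an idea in code, here's a rough Python sketch of the two-stage search described above. It is not how ChessBase is actually implemented; the `Game` class, the `search` function, and the plain strings standing in for board positions are all made up for illustration. The only point is that a cheap header check (the ECO range) lets the program skip the expensive look "inside" most games.

```python
from dataclasses import dataclass

@dataclass
class Game:
    eco: str               # header field, e.g. "C68"
    positions: tuple = ()  # hypothetical: one string key per position in the game

def search(games, target, eco_range=None):
    """Return (hits, number_of_games_examined_move_by_move)."""
    hits, examined = [], 0
    for game in games:
        # Stage 1: cheap header check. ECO codes of the form letter + two
        # digits compare correctly as plain strings ("C60" <= "C68" <= "C99").
        if eco_range is not None:
            low, high = eco_range
            if not (low <= game.eco <= high):
                continue
        # Stage 2: the expensive part, looking "inside" the game.
        examined += 1
        if target in game.positions:
            hits.append(game)
    return hits, examined

games = [
    Game("C68", ("p1", "p2", "target")),
    Game("B20", ("x1", "x2")),
    Game("C65", ("p1", "q2")),
    Game("E60", ("y1", "target")),
]

hits, examined = search(games, "target")
print(len(hits), examined)   # 2 4

hits, examined = search(games, "target", eco_range=("C60", "C99"))
print(len(hits), examined)   # 1 2
```

Notice that the restricted search also drops the E60 game, even though it contains the position. That's the trade-off mentioned above: you give up the rare transposition in exchange for a much shorter search.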

You can refine this even further. Let's say that your variation and position were from the Ruy Lopez Exchange. You could use the codes C68 and C69 to speed the search up even more.

There's another way to crank up the speed still further. You'll note under the "Position" tab that there are fields for "First" and "Last" move numbers. Since you entered an eight-move variation, you'd want to set the "First" field to "7" and the "Last" field to "9". [1] Now the program will look at just C68 and C69 games, and examine only moves 7 through 9 for your desired position (instead of moves 1 through 40 as it did when you used the default values). If you try this, you'll notice that the search time is reduced drastically compared to when you did just a position search without any change in the range of ECO codes or move numbers.

[1] I like to "buffer" the values by a move or two either side of the actual number of moves in the variation. This allows the program to catch the all too frequent transpositions which can occur a move or two sooner or later than the move number of a desired variation.
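The "First" and "Last" arithmetic, including the one-move buffer from the footnote, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `move_window` and `found_in_window` are hypothetical helper names, and a game is modeled as a simple list with one entry per move.

```python
def move_window(variation_length, buffer=1):
    """Return (first, last) move numbers, padded by `buffer` moves each side."""
    first = max(1, variation_length - buffer)  # never go below move 1
    last = variation_length + buffer
    return first, last

def found_in_window(positions, target, first, last):
    # positions[0] is the position after move 1, positions[1] after move 2, ...
    return target in positions[first - 1:last]

first, last = move_window(8)
print(first, last)  # 7 9

# A stand-in for a 40-move game: one label per position.
game = [f"pos-after-move-{n}" for n in range(1, 41)]
print(found_in_window(game, "pos-after-move-8", first, last))   # True
print(found_in_window(game, "pos-after-move-12", first, last))  # False
```

With the window set to moves 7 through 9, only three positions per game are compared instead of forty, which is where the time saving comes from.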

So if you find that your Position or Annotation searches are taking too long, you can speed them up a bit by adding something in the "Game data" fields to limit the search. This allows you to use the reverse of the "less is more" principle to make that concept work to your advantage.

There's almost an art to setting up successful searches. You might remember a game from your database annotated by, say, Kasparov, in which he included a text note about a "Tal-like sacrifice". Instead of just doing an annotation search for that phrase, you can cut down the search time by including "Kasparov" in the "Annotator" field. This way the program will use only games annotated by Kasparov as its "starting point" before looking "inside" the games for the phrase you want.

This technique works well for cutting down the amount of material you'll need to wade through after a successful search. For example, if you do a search for an isolated d-pawn (as we did in an earlier article in this series) you'll likely be confronted by tens of thousands of "hits" if you're searching a database of more than two million games. A good way to limit the results to something more manageable would be to include just a single ECO code for one of your favorite openings (or possibly a range of codes if necessary). If you find that you're still getting too many hits, you might consider selecting the box next to "Variations" under the "Annotations" tab (assuming, of course, that you're working with a database that contains some annotated games). This would limit the search to annotated games of a particular opening which contain an isolated d-pawn position. Here again, we're twisting the "less is more" phenomenon around and standing it on its head -- we're deliberately including "too much information" in the search criteria to purposely limit the number of hits. We're turning a disadvantage into an advantage, a principle that should be familiar to most chessplayers.

Even if you're doing a straight Position search with no applicable "Game data" criteria, you can always use the "First" and "Last" fields to limit the search to specific parts of the game. If you're looking for a specific opening position (or position fragment), it's silly to have the program search through every position from moves 1 through 40 when moves 4 through 10 would do nicely. The same thing applies to endgames -- if it's a Rook and pawn ending, searches that include moves before move 30 or 40 are really just a needless waste of time (unless you really want to find "demolition derbies" in which the players hoovered the pieces off of the board early in a hell-for-leather rush to get to the endgame).

This brings us to another important point: searching for complete middlegame and endgame positions is generally a waste of time. While the game of chess has a finite number of possible positions, that number is still astronomical. I can tell you from experience that it's highly unlikely that a particular middlegame or endgame position from one of your games has appeared in grandmaster or master play. This has nothing to do with the "quality" of their games versus that of games played by those of us down here in the fishpond, but is instead a result of the near-infinite possible board positions in chess.

While I certainly don't want to discourage you from doing such a search (on the outside chance that your game position has occurred in some past game in your database), it's much more productive to look for position fragments (partial positions) in the hope that one of them will be close to the position you're researching. For example, I was involved in a Rook and pawn endgame in a correspondence game a decade ago and I was considering offering a draw -- I couldn't see a way for either of us to gain an advantage in that particular position. I did a database search for the exact position and came up with nothing. But when I limited the search by creating a position fragment (of just the Kings and Rooks on the same particular squares) I came up with a game that was just one move off -- one of the pawns was one square away from the position in my own game. I played through the moves and saw that the game was indeed drawn -- so I offered a draw with reasonable confidence that I was doing the right thing.
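The difference between an exact-position search and a fragment search is easy to show in a small Python sketch. Everything here is an assumption made for illustration: positions are plain dictionaries mapping a square to a piece letter (uppercase for White, lowercase for Black), `matches_fragment` is a hypothetical helper, and the squares are an invented example rather than the position from my correspondence game.

```python
def matches_fragment(position, fragment):
    """True if every piece named in the fragment stands on the same square
    in the position. Squares not mentioned in the fragment are ignored."""
    return all(position.get(square) == piece
               for square, piece in fragment.items())

# The position being researched: Kings, Rooks, and one pawn each.
my_position = {"g3": "K", "a7": "R", "h4": "P",
               "g6": "k", "b2": "r", "h5": "p"}

# A database position that differs by a single pawn square (h3, not h4).
database_position = {"g3": "K", "a7": "R", "h3": "P",
                     "g6": "k", "b2": "r", "h5": "p"}

# The fragment keeps only the Kings and Rooks.
fragment = {sq: pc for sq, pc in my_position.items() if pc in "KkRr"}

print(database_position == my_position)               # False
print(matches_fragment(database_position, fragment))  # True
```

The exact comparison fails because of one pawn, while the fragment comparison succeeds because it never looks at the pawns at all.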

When you do any kind of general search (such as a search for games of a particular ECO code) on a huge database and get hits, you'll likely get a lot of them. There's often too much material to look at -- even two hundred games is a whopping great amount of data. So how do you know what to look at?

I usually start by looking at annotated games. You'll spot these in the game list by looking for the letters "V" (for Variations) and "C" (for commentary) in the far righthand column. In fact, I'll often do the search a second time with "Variations" checked under the "Annotations" tab. [2] I can then play through these games and get the benefit of the expert commentary included with these games.

[2] I use "Variations" because it's not often that a game will be annotated without them. While it's possible for a game to contain text commentary without the inclusion of replayable variations, it's a really rare occurrence.

If I still see a lot of games with commentary in the list, I'll then use the presence of medals as a further criterion. I'll scan down the game list of annotated games and if I see one with a medal I'll play through it. After all, the whole purpose of medals is as a device used to call your attention to particular aspects of significant games.
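That triage routine (annotated games first, then medals within them) can be written down as a short Python sketch. The `Hit` class and the `triage` function are invented for this example; the three flags simply stand in for the "V", "C", and medal markers you'd see in the game list.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    has_variations: bool = False  # the "V" marker in the game list
    has_commentary: bool = False  # the "C" marker in the game list
    has_medal: bool = False

def triage(hits):
    """Keep only annotated games and list those with medals first."""
    annotated = [h for h in hits if h.has_variations or h.has_commentary]
    # sorted() is stable, so games keep their original order within each group;
    # False sorts before True, so "not h.has_medal" puts medal games on top.
    return sorted(annotated, key=lambda h: not h.has_medal)

hits = [
    Hit("Game A"),
    Hit("Game B", has_variations=True),
    Hit("Game C", has_commentary=True, has_medal=True),
    Hit("Game D", has_variations=True, has_commentary=True),
]

print([h.title for h in triage(hits)])  # ['Game C', 'Game B', 'Game D']
```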

I don't often include medals in my initial search. Medals weren't included in ChessBase format games until relatively recently (within the last few years), so you miss out on a lot of material if you include a particular medal in your search. One exception, though, is the "Model game (opening plan)" medal. When I enter an opening variation by hand and save it into a database, I'll often insert this particular medal. Later when I'm searching for these commented opening variations I can just do a medal search for "Model game" and the program will pull these games up straight away.

Finally, to bring this series full circle, we need to look again at why database games are important. Some players, particularly novices, don't understand why they should even bother with database searches. To understand their importance, we need first to realize that chess is about more than just playing. It's also about studying and improving. It's also about more than just calculation -- it's also about pattern recognition and memory.

This is why chess books are so popular (and useful). A stronger player or teacher presents important principles in a lesson in a chess book. In 99.9% of these cases, he or she will also include examples from actual games. These examples are provided to illustrate the lesson and to reinforce the lesson in the mind of the reader.

But there are limits to printed books; to keep them at a reasonable length, the author often provides just two or three examples of the principle in action. Many times, though, these principles (Rooks on open files, fianchettoed Bishops, isolated pawns, particular pawn skeletons, etc.) are searchable within a database simply by using the Search mask as we've learned over the past several articles in this series. Let's say you're reading a chess book and come across a lesson on the value of a Rook on the seventh rank. The author has provided a couple of examples, but you'd like to see more. Just fire up your chess program and do a search for a White Rook on any square from a7 through h7 -- you'll find more examples than you can shake a stick at. Play through a dozen or two of these games and the importance of a Rook on the seventh should become firmly fixed in your mind.
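The Rook-on-the-seventh search is a slightly different kind of fragment: the piece may stand on any one of several squares. Here's a minimal Python sketch under the same kind of assumptions as before: a position is a dictionary mapping squares to piece letters ("R" for a White Rook), and both function names are hypothetical.

```python
def white_rook_on_seventh(position):
    """True if a White Rook stands on any square from a7 through h7."""
    return any(position.get(file + "7") == "R" for file in "abcdefgh")

def games_with_rook_on_seventh(games):
    """games maps a game title to the list of positions reached in it."""
    return [title for title, positions in games.items()
            if any(white_rook_on_seventh(p) for p in positions)]

games = {
    "Game 1": [{"a1": "R", "e1": "K", "e8": "k"},
               {"a7": "R", "e1": "K", "e8": "k"}],  # Rook reaches a7
    "Game 2": [{"d1": "R", "g1": "K", "g8": "k"}],  # Rook never leaves rank 1
    "Game 3": [{"d7": "r", "g1": "K", "g8": "k"}],  # Black Rook, not White
}

print(games_with_rook_on_seventh(games))  # ['Game 1']
```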

Another reason for utilizing database searches is just sheer enjoyment. I have several favorite players and I sometimes like to kick back with a cold one, do a search for the games of, say, Adolf Anderssen, and play through them, just to admire the man's genius. And playing through these games can have a subtle, but important, side effect: inspiration. Playing through Anderssen's, Tal's, or Shirov's games often inspires me to look for sacrificial opportunities in my own games. I actually won a "lost" correspondence game when, inspired by Alexei Shirov, I offered an unsound sacrifice that totally threw my opponent off of his game.

We play through database games for a lot of reasons: knowledge, reinforcement, entertainment, inspiration. But there's a common thread in all of these cases -- our own chessplaying improves. Sometimes it's because we learned something new. Sometimes it's because we spotted a common theme or pattern that we later come across in our own games. Sometimes it's because we're inspired by other players to take chances or jump on opportunities that we'd otherwise have missed. But in all cases we're better players because we reviewed what other, stronger chessplayers have done in their games. And that's the reason why database searches are important.

Hopefully this series has made the process of creating and executing these searches a whole lot easier and more understandable for you.

Until next week, have fun!

© 2004, Steven A. Lopez. All rights reserved.
