Not the last word on hash tables

12/3/2003 – A few weeks ago, ChessBase Workshop presented "The New Word on Hash Tables". But was it the *last* word on hash tables? Some of our e-mail correspondents beg to differ, as you'll see in this week's installment of ChessBase Workshop.

ChessBase 14 Download ChessBase 14 Download

Everyone uses ChessBase, from the World Champion to the amateur next door. Start your personal success story with ChessBase 14 and enjoy your chess even more!


Along with the ChessBase 14 program you can access the Live Database of 8 million games, and receive three months of free ChesssBase Account Premium membership and all of our online apps! Have a look today!

More...

NOT THE LAST WORD ON HASH TABLES

by Steve Lopez

Hash tables have been the bane of my existence for years. Back when I did phone support for the Fritz family of playing programs, questions concerning hash table settings outnumbered all other questions combined. And let's not talk about the abuse I've taken on Internet message boards concerning my interpretation of how to use the old hash table formula on computers with hyper-fast processors and gigabytes of RAM.

I have to laugh when I look back on the year 2003 (which, alas, is not yet over). I've been writing weekly ChessBase articles since early 1997 and they've been relatively error-free. Sure, I'd screw up a minor detail here and there, sometimes refer to an IM as a "GM", commit a typo in a gamescore, etc. But the mistakes were few and far between -- that is, until this year. Apparently I was saving them up to get them out of my system in one fell swoop. So far this year I've experienced the "statistics debacle", a mixup in .bmp files which caused me to create some chessboard graphics with the light and dark squares reversed, and a couple of other errors which don't leap immediately to mind (and it's probably merciful that they don't).

The latest (possible) error was in an article on hash tables from a few weeks ago. I'm not the sharpest knife in the drawer, but I've learned a few things while travelling life's highway. Among them are that you cop to your mistakes, fix the ones you can, and learn to live with the ones you can't. I'm not altogether certain that I did make a mistake a few weeks ago, but if I did, let me at least amend what I wrote at that time by providing some additional information now (and risk taking some lumps by doing so).

Last summer, ChessBase programmer Mathias ("1T") Feist sent me an e-mail regarding hash table size. Here's an abbreviated version of what he wrote:

There are a lot of discussions going on about the formula for the hashtable sizes. We introduced this formula for Fritz 5/5.32. It was rather important then because Fritz 5 used to deleted the hashtables between moves and the performance of the engine decreased a lot once the hashtables were full. The point about deleting hashtables is important because it takes time. You don't want to delete 47 zogabytes of hashtables before each move when you're playing bullet.

...Practically all engines today don't delete hashtables between moves, making this point moot. Basically it means that you can't do much harm by having too much hashtables. There are other reasons not to have them too big, but the effects are rather small compared to the time for deleting them and don't do much harm.

Not deleting hashtables also means that you need to have a scheme that won't hurt you if they are full. Because they are always full. You can still increase the overall performance if your hashtables are big enough for the time control, but the effect is much smaller than it was with Fritz 5. 1 MB of hashtables will still hurt you in tournament games, but the difference of 128 MB and 512 MB is much smaller than it was with Fritz 5.

OK, so far, so good. That's all pretty understandable. BUT... (as with all things in life, there is always a "but").

The problem I ran into with this entirely reasonable discourse on hash table size was that many Fritz users want a concrete answer (i.e. a number) when they ask "How big should the hash tables be?" The answer "It doesn't much matter if you set them too high, so don't worry about it" just won't cut it with some folks. (And don't laugh, but I had a few customers who would call me every time they changed time controls to ask what size to make the hash tables. They didn't want to do the math -- they wanted me to do the math for them. I'm totally serious).

So this tossed me right onto the proverbial horns of a dilemma: what should I tell people when they ask how large to set the hash tables? I pondered this off and on for months until I hit upon a "compromise solution": 32 MB or less for blitz games, and as large as you wanted without ratcheting the hard drive with disk swappage for longer time controls. I wasn't completely happy with this answer, but it was the best I could come up with.

And that (I thought) would be the "last word" on hash tables.

Not so. Greater intellects than mine have prevailed.

I received two e-mails this past week (one directly and one forwarded) from Roger Vermeir of Antwerp, Belgium (who's corresponded with me in the past) regarding proper hash table sizing. It read, in part:

I think the new recommendation forgets about one thing: above a certain hashtable size, it takes the program longer to do a hashtable lookup, than it takes to recalculate the position from scratch.

I agree; in fact, this is something I've been saying for years. And guess what? Every time I've made this statement in an article, someone has jumped all over me in some chess message board or other to make the claim that I have no idea what I'm talking about (but, for some odd reason, and despite the fact that I used to have my e-mail address at the end of every article, the comments were never sent to me directly. Now isn't that odd?). Tough -- it's the truth: when you have huge hashtables, the program spends more time sorting through them than it would spend just calculating a move right out of the gate.

I don't yet have permission from Mr. Vermeir to directly quote further from the e-mail he sent (and doing so without permission, while probably harmless in this case, would still be a serious "Netiquette" violation. I do, however, want to thank him sincerely for sending me the following suggestion), but he described how he ran "Fritzmark" tests on his engines using a variety of hash table settings. Above a certain amount of RAM allocated to hash tables (and this, I'm sure, will vary from engine to engine), he noticed an appreciable dropoff in the Fritzmark figure. Mr. Vermeir's suggestion is for each user to run such tests on his own machine to determine the RAM amount at which the Fritzmark total drops off and then use this amount as a maximum hash table setting. I wholeheartedly agree with this suggestion. While it requires a certain amount of effort on the part of the individual user, I think it's a much better way of figuring the proper RAM amount for hash tables than the not at all arbitrary, but certainly less precise, suggestion I made a few articles ago.

A possible procedural method here is to start with the "default" hash table size which Fritz (or whatever engine you're using) suggests. Then bump it up incrementally, running a new Fritzmark test at each step. When you see the Fritzmark drop off, reduce the hash table size a bit. Through trial and error, you'll eventually discover the maximum amount of RAM you can use for hash tables before you start to lobotomize your engine by making the table size too large.

This is apparently crucial for new machines with (what I consider to be) extreme amounts of RAM. A recent poster to a chess message board I help moderate mentioned that his new machine has 3 GB RAM. He's having problems with hash table settings above approximately 1.5 GB. I seriously think that's way too large, but running a series of Fritzmark tests on his machine will certainly tell the tale.

So here's the new "latest" on hash tables, for which I will doubtless again get crawled on chess message boards (hey, have at it -- I'm used to it. It comes as part of the territory when you write a column which has literally thousands of readers):

If you don't mind doing the work, use Mr. Vermeir's suggestion of running Fritzmark tests at various RAM allocations to discover the maximum amount of RAM you should reserve for hash tables when using various engines (and keep in mind that each engine will likely have a different maximum amount). If you don't want to do the work, just use my suggestion given in the article from a couple of weeks ago.

Pardon me while I bunker down and wait for the brickbats to come sailing in.

Until next week, however, I want you to have fun!


© 2003, Steven A. Lopez. All rights reserved.


Topics F8
Discussion and Feedback Join the public discussion or submit your feedback to the editors


Discuss

Rules for reader comments

 
 

Not registered yet? Register