Entries in instructions (3)

Friday
Jun 12, 2009

Rehashing the Hash Cache

In the last post I explained what the hash cache is and what it’s for. So now let’s talk about the new /keephashcache switch. The hash cache itself is written to a file in a temporary sub-folder every time a level 4 check is performed. This is done once per index file and takes a significant amount of time.

Generally speaking, a level 4 check works in 2 simple steps:

  1. Data is read and hashed.
  2. Data is verified against the stored hashes.

Step 1, which takes much longer, creates the hash cache and step 2 uses it. So in order to save the time that it takes to perform the reading and hashing for any subsequent checks, we can save the hash cache and reload it next time. That is, as long as the database has not changed. If it has, you must rehash, or else the hashes will not line up with the actual data and the check will fail.
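The two steps, and the time saved by keeping the result of step 1, can be sketched in Python. This is only an illustration and not WhsDbCheck's actual code: the function names, the choice of MD5 (suggested by the 16-byte hashes and the .md5 file extension, but still an assumption) and the flat cache file layout are all made up for the example.

```python
import hashlib
import os

CLUSTER_SIZE = 4096  # typical cluster size on the home server
HASH_SIZE = 16       # an MD5 digest is 16 bytes (MD5 is an assumption)


def read_and_hash(data_path):
    """Step 1: read the data sequentially and hash every cluster."""
    hashes = []
    with open(data_path, "rb") as f:
        while True:
            cluster = f.read(CLUSTER_SIZE)
            if not cluster:
                break
            hashes.append(hashlib.md5(cluster).digest())
    return hashes


def save_cache(hashes, cache_path):
    """Write the computed hashes back to back (hypothetical layout)."""
    with open(cache_path, "wb") as f:
        f.write(b"".join(hashes))


def load_cache(cache_path):
    """Reload hashes written by save_cache."""
    with open(cache_path, "rb") as f:
        blob = f.read()
    return [blob[i:i + HASH_SIZE] for i in range(0, len(blob), HASH_SIZE)]


def verify(computed, stored):
    """Step 2: return the indexes of clusters whose hashes differ."""
    return [i for i, (c, s) in enumerate(zip(computed, stored)) if c != s]


def check(data_path, stored_hashes, cache_path):
    """Run both steps, skipping step 1 when a kept cache exists."""
    if os.path.exists(cache_path):
        computed = load_cache(cache_path)
    else:
        computed = read_and_hash(data_path)
        save_cache(computed, cache_path)
    return verify(computed, stored_hashes)
```

In this sketch a kept cache is trusted blindly, so if the stored hashes change after the cache was written, `check` reports mismatches even though nothing is wrong with the data. That is the same trap described above.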

In order to accomplish this, run WhsDbCheck with the /keephashcache switch. This will create additional files in the backup database folders. Now, if you run WhsDbCheck again, it will pick up the hash cache file and, instead of performing a fresh read and hash, perform a simulated read, which is much faster.

If you want to perform a fresh level 4 check, you will need to delete the Index*.md5 files created by this switch. They will be located in the backup database folder.
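If you would rather script the cleanup than delete the files by hand, something like the following Python sketch would do it. The function name is made up for the example, and you have to pass in the location of your own backup database folder. It only looks at files directly in that folder.

```python
from pathlib import Path


def delete_kept_hash_caches(database_folder):
    """Delete Index*.md5 files in database_folder and return their names."""
    removed = []
    for path in sorted(Path(database_folder).glob("Index*.md5")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

The function returns the names of the files it removed, so you can see what was cleaned up before starting the next level 4 check.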

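All in all, use this with care, because it can create a situation where your level 4 check fails for seemingly no reason: if the database has changed since the hash cache was saved, the kept hashes no longer line up with the data.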

Up next, an explanation of the different check levels.

 

Wednesday
Jun 10, 2009

The Hash Cache

In build 11 of WhsDbCheck I introduced the new /keephashcache option, which preserves the hash cache from run to run so that the time spent reading and hashing on the first run is not spent again. I think this option deserves a little more explanation. But before we get to that, let’s start with what the hash cache actually is.

When the Windows Home Server stores data, it stores a hash of each cluster along with the actual data. For those who don’t know, a hash is a small number that can be used to verify the integrity of a much larger chunk of data, of any size in fact. In the home server each hash is typically used to verify a 4096 byte chunk of data, although this can vary. I use this hash in a level 4 check to make sure that every single byte that exists in the backup database is the exact same byte that was written at backup time.

I don’t suggest anyone try this, but I did :) You can change a single byte in a data file and see the level 4 check detect the fault. Very cool stuff. Oh, and I mean it, don’t try this unless you have a backup of the database!
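A safe way to see the same effect, without touching a real database, is to try it on some throwaway bytes in Python. This is a minimal sketch: MD5 is an assumption on my part here, and the function names are invented for the example.

```python
import hashlib

CLUSTER_SIZE = 4096  # bytes covered by one hash


def cluster_hashes(data):
    """Hash each 4096-byte cluster of data."""
    return [hashlib.md5(data[i:i + CLUSTER_SIZE]).digest()
            for i in range(0, len(data), CLUSTER_SIZE)]


def damaged_clusters(data, stored_hashes):
    """Return the indexes of clusters that no longer match their stored hash."""
    current = cluster_hashes(data)
    return [i for i, (now, then) in enumerate(zip(current, stored_hashes))
            if now != then]


original = bytes(range(256)) * 48       # 12288 bytes, exactly 3 clusters
stored = cluster_hashes(original)       # what would be kept at backup time

tampered = bytearray(original)
tampered[5000] ^= 0x01                  # flip one bit in the second cluster

print(damaged_clusters(original, stored))         # []
print(damaged_clusters(bytes(tampered), stored))  # [1]
```

Only the cluster containing the changed byte is reported, which is why a per-cluster hash can point at exactly where the damage is.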

So here’s the problem though. In order for the check to be anywhere near reasonably fast (we’re talking hours instead of days or weeks here), we need to load the hashes into memory.

Let’s do some math:

  • If you have a 300 Gigabyte database.
  • 4096 bytes per cluster (this is typical for NTFS).
  • That means you have 78,643,200 hashes.
  • At 16 bytes per hash that would require 1.17 Gigabytes of RAM.
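The same arithmetic in Python, in case you want to plug in your own database size. The numbers are the ones from the list above, with a gigabyte taken as 1024³ bytes.

```python
GIB = 1024 ** 3

database_bytes = 300 * GIB   # 300 Gigabyte database
cluster_size = 4096          # bytes per cluster
hash_size = 16               # bytes per hash

hash_count = database_bytes // cluster_size
ram_bytes = hash_count * hash_size

print(f"{hash_count:,} hashes")             # 78,643,200 hashes
print(f"{ram_bytes / GIB:.2f} GB of RAM")   # 1.17 GB of RAM
```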

The first generation HP MediaSmart server comes with only 512 MB of RAM. So clearly, this is not going to work if you just load everything into RAM.

After much experimentation, I came up with a clever technique that makes it possible to perform a level 4 check in a reasonable amount of time. I call this the hash cache.

Essentially, the hash cache has one important quality: it avoids hard drive seek times at all costs. It turns out that drives and file systems are pretty good at reading a sequential stream, as long as it stays sequential. This is what the hash cache does, and it works well.

So this is where the /hashcache switch comes in. By default, the hash cache uses 256 MB of memory, but you can change that by specifying a new size with /hashcache=(size in MB). For example, /hashcache=1000 makes the hash cache about 1 GB. This will make a level 4 check faster if you have the RAM. Specifying 0 turns off the hash cache (not recommended), and specifying a value larger than your RAM will be really bad for performance. Specifying a value of more than a couple of gigabytes will slow things down too.
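To get a feel for what a given /hashcache size buys you, here is a rough back-of-the-envelope calculation in Python. It assumes that the cache holds nothing but raw 16-byte hashes and that a MB is 1024² bytes. The real tool may well carry some overhead, so treat the results as upper bounds. The function name is made up for the example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def cache_coverage(cache_mb, hash_size=16, cluster_size=4096):
    """Return (hashes that fit, bytes of data those hashes cover)."""
    hashes = (cache_mb * MIB) // hash_size
    return hashes, hashes * cluster_size


for mb in (256, 1000):
    hashes, covered = cache_coverage(mb)
    print(f"/hashcache={mb}: {hashes:,} hashes, about {covered / GIB:.0f} GB of data")
# /hashcache=256: 16,777,216 hashes, about 64 GB of data
# /hashcache=1000: 65,536,000 hashes, about 250 GB of data
```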

Now that I’ve explained what the hash cache is, in the next post I will talk about the new /keephashcache switch and how and when to use it.

 

Tuesday
Jul 29, 2008

WhsDbDump - Usage Instructions

WARNING: This is an advanced tool. Working with your Windows Home Server database is dangerous. I'm not responsible for any data loss that may ensue as a result of the use of WhsDbDump. Please always operate on a copy of the database and not the original one. Shuffle your bits with care.

If you're running this tool on a pre-Vista machine, please be aware that WhsDbDump requires .NET 2.0 or later. You may download the latest version of the .NET framework directly from Microsoft.

WhsDbDump is a utility to dump your Windows Home Server backup database data files to human readable text files for examination. It decodes the binary format stored in the data files to .txt and .xml files. It supports the standard Windows Home Server data file format and is able to decode any database file specified.

Please note that some files store purpose-specific data and will not be dumped. For example, the Data.nnnn.n.dat files store the actual cluster-level data. That data will not be dumped.

Basic Usage:

  1. Put the latest version of WhsDbDump.exe into a folder with a copy of your Home Server database files. You can copy these files from the D:\folders\{00008086-058D-4C89-AB57-A7F909A47AB4} folder on your home server. It's safer to stop the backup service before pulling these files off. The backup service can be stopped by typing net stop whsbackup from the command prompt on the home server. You should restart the backup service after you're done by typing net start whsbackup.
  2. Execute WhsDbDump *dat
  3. Read and examine the dump files in the Dump sub-folder that was created by the tool.

WhsDbDump takes a number of parameters, but for basic usage that is all you need to know. If you want to dump the data/record sections of the backup files you can specify /data and optionally /dataalternate after the file mask (e.g. WhsDbDump.exe *dat /data /dataalternate). This will take some time to complete if your database is big and will eat up a whole lot of disk space too.