Entries in whsdbcheck (11)

Tuesday, Jul 14, 2009

WhsDbCheck - 1.0.0 Build 15 BETA

Notes:

Well here it is, the new build with extended error information. I should warn that this build features sweeping changes in the code and therefore might not be as stable as the last build, but you never know.

So what does extended error information mean? It's really a twofold change:

  1. Errors are now reported with much more detail (with extended ASCII). They all feature extended descriptions telling you exactly what went wrong, and in many cases have relevant data from where the check failed, such as a file offset.
  2. Before this version there were two types of errors: an Exception and a check Error. Encountering either one aborted the check, and both meant that your data was most likely corrupt. Now there is a new type, which is not really an error at all: a Warning. A Warning means that an inconsistency was found within the database files, but the problem does not directly affect the integrity of your data. Unlike an Error or an Exception, a Warning is noted on the screen and the check continues to run. All Warnings, Errors and Exceptions are displayed at the end of the check in all their glorious ASCII detail.
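To make the three severities concrete, here is a minimal Python sketch of a check runner that behaves as described: Warnings are collected and the run continues, while an Error or an Exception stops the run, and everything is available for a report at the end. This is not WhsDbCheck's actual code; the names Severity, Finding, CheckFailed and run_checks are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "Warning"      # inconsistency found, data still intact
    ERROR = "Error"          # a check is sure that data was lost
    EXCEPTION = "Exception"  # the check itself failed unexpectedly


@dataclass
class Finding:
    severity: Severity
    check: str
    detail: str


class CheckFailed(Exception):
    """Hypothetical: raised by a check that has detected data loss."""


def run_checks(checks):
    """Run (name, func) pairs in order.

    Each func returns a list of warning strings (possibly empty) or
    raises. Returns every finding so it can be re-listed at the end.
    """
    findings = []
    for name, func in checks:
        try:
            for detail in func():
                # Warnings are recorded, and the run carries on.
                findings.append(Finding(Severity.WARNING, name, detail))
        except CheckFailed as exc:
            findings.append(Finding(Severity.ERROR, name, str(exc)))
            break  # an Error aborts the run
        except Exception as exc:
            findings.append(Finding(Severity.EXCEPTION, name, repr(exc)))
            break  # so does an unexpected Exception
    return findings
```

A caller would print each finding as it happens and then loop over the returned list once more for the end-of-run summary.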

As you can imagine, both of these required a severe overhaul of every check’s logic and were pretty significant changes.

So in summary, WhsDbCheck tries to be as detailed as possible as it performs hundreds of checks on the backup database (the exact number depends on the size of your database). But it will only register an Error if it is sure that the problem detected is causing data loss.

Also, you should be aware that just because WhsDbCheck considers something a Warning that does NOT mean that the Windows Home Server backup engine will be able to open the database. That will depend on how resilient it is to database problems.

So what use is a Warning, you might ask? If you can’t open the data with the Windows Home Server, then shouldn’t that be an Error? Yes, it should. Except that the next utility I will release is called WhsDbDataDump, which will be able to open damaged databases and extract backed up files from them. So that is how I decided what to make an Error and what to make a Warning. If WhsDbDataDump will be able to get the original data with 100% integrity, then it’s a Warning, otherwise it’s an Error.

In general, if a database is registering any Warnings or Errors then it should be considered compromised. But if it’s a Warning, then you did not technically lose any data.

As for WhsDbDataDump availability, it’s in the planning stages now, and I will only give a very rough estimate of months until the first build surfaces.

There will be another build of WhsDbCheck soon adding another crucial feature and maybe fixes to .

Changes:
  • New status presentation.
  • Each check can now have warnings associated with it. A warning will be reported but will not stop the test. All warnings will be re-listed at the end of the run.
  • At the end of the run, all errors/warnings/exceptions will be reported with an extended description indicating what went wrong and advice on how to remedy the situation.

Download 1.0.0.15 BETA

Tuesday, Jun 30, 2009

WhsDbCheck - 1.0.0 Build 14 BETA

Notes:

This release completes the underlying progress reporting changes. Most of the changes are under the hood and invisible, but they give me the ability to tweak things at a finer level. The estimated time should be much better in this release, although at this point I'm sure it's not perfect, as I tweaked it by hand.

So at this point the progress reporting message gives the overall test completion percentage, the estimated time remaining until the entire test completes, the current step, the total number of steps, and finally, the current step's progress. Note that the estimated time left will change dynamically with system load. It has a memory of sorts, so past slowdowns will affect the estimate. But the memory is not indefinite: it will completely forget a slowdown after a few minutes. So effectively, you can say that it always tries to compute the time remaining based on the recent rate of progress.
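One common way to get an estimate with this kind of limited memory is a sliding window over recent progress samples. The sketch below is only an illustration of that idea in Python, not the algorithm WhsDbCheck uses; the EtaEstimator class and its 180 second window are assumptions.

```python
from collections import deque


class EtaEstimator:
    """Estimate time remaining from the recent rate of progress only."""

    def __init__(self, window_seconds=180.0):
        # Assumed window length; samples older than this are forgotten.
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp in seconds, fraction done)

    def update(self, now, fraction_done):
        self.samples.append((now, fraction_done))
        cutoff = now - self.window_seconds
        # Forget old samples, but always keep two so a rate exists.
        while len(self.samples) > 2 and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def remaining_seconds(self):
        if len(self.samples) < 2:
            return None  # not enough history yet
        (t0, p0), (t1, p1) = self.samples[0], self.samples[-1]
        if t1 <= t0 or p1 <= p0:
            return None  # no measurable progress inside the window
        rate = (p1 - p0) / (t1 - t0)  # fraction completed per second
        return (1.0 - p1) / rate
```

Because the rate is measured only between the oldest and newest samples still in the window, a slowdown stops influencing the estimate once its samples have aged out.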

Next comes the status / error reporting overhaul. The idea is to have much more meaningful error messages for every check failure. This is actually what inspired the latest flurry of builds, so it will be good to get that done.

Changes:
  • Better unreferenced file check that checks each individual backup file and is capable of reporting exactly which files are unreferenced/uncommitted.
  • Progress estimation extended to be more fluid and extensible. This also corrects an issue with the last build's progress estimation on multiple Level 4 passes.

Download 1.0.0.14 BETA

Monday, Jun 15, 2009

WhsDbCheck - 1.0.0 Build 13 BETA

Notes:

Re-architected the status/progress reporting for better integration into future products. Fixed a bug that was causing cluster checks to pass on incorrect data in some rare cases (Level 2+). Put in an explicit int parse check. Before this, we would throw an index out of range exception on corrupt data; now we get a more informative exception. Again, this only happens if data is corrupt. Tiny bump in overall performance.

Be aware that this version is the first to display overall progress, along with an estimated time to complete the whole check. The time can be somewhat inaccurate at this point, as I still have some work to do there that didn’t make it into this build. In particular, the time can shift when changing check types. Even though this shift happens, the estimation algorithm is an adaptive one and should catch on after 30 to 60 seconds and re-compute.

Changes:
  • Better status reporting architecture. Now allows for console color output.
  • Added /pause switch.
  • Better progress reporting architecture with overall progress report and less scrolling.
  • Fix for index cache. It could cause cluster checks to pass even though the index was out of range. Now, cluster checks will fail correctly.
  • Added Windows Home Server int overflow checks.
  • Tiny bump in performance by optimizing out one loop in int parsing routine.

Download 1.0.0.13 BETA

Friday, Jun 12, 2009

Rehashing the Hash Cache

In the last post I explained what the hash cache is and what it’s for. So now let’s talk about the new /keephashcache switch. The hash cache itself is written to a file in a temporary sub-folder every time a level 4 check is performed. This is done once per index file and takes a significant amount of time.

Generally speaking, a level 4 check works in 2 simple steps:

  1. Data is read and hashed.
  2. Data is verified with stored hashes.

Step 1, which takes much longer, creates the hash cache and step 2 uses it. So in order to save the time that it takes to perform the reading and hashing for any subsequent checks, we can save the hash cache and reload it next time. That is, as long as the database has not changed. If it has, you must rehash, or else the hashes will not line up with the actual data and the check will fail.
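The save-and-reload idea can be sketched in a few lines of Python. This is only an illustration under assumptions: MD5 is a guess suggested by the .md5 file extension and the 16 byte hash size, and the flat cache file layout and the function names hash_clusters and load_or_build_cache are hypothetical, not WhsDbCheck's real format.

```python
import hashlib
import os

CLUSTER_SIZE = 4096  # typical cluster size, per the posts
HASH_SIZE = 16       # an MD5 digest is 16 bytes


def hash_clusters(data_path):
    """Step 1: read the data sequentially and hash every cluster."""
    hashes = []
    with open(data_path, "rb") as f:
        while chunk := f.read(CLUSTER_SIZE):
            hashes.append(hashlib.md5(chunk).digest())
    return hashes


def load_or_build_cache(data_path, cache_path):
    """Return (hashes, reused).

    If a saved cache exists it is trusted as-is and the slow read and
    hash is skipped. Nothing here notices that the data has changed
    since the cache was written, so a stale cache gives stale hashes.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            raw = f.read()
        hashes = [raw[i:i + HASH_SIZE] for i in range(0, len(raw), HASH_SIZE)]
        return hashes, True
    hashes = hash_clusters(data_path)
    with open(cache_path, "wb") as f:
        f.write(b"".join(hashes))
    return hashes, False
```

In this sketch, deleting the cache file is the only way to force a fresh hash, which mirrors why a kept cache has to be removed after the database changes.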

In order to accomplish this, run WhsDbCheck with the /keephashcache switch. This will create additional files in the backup database folders. Now, if you run WhsDbCheck again, it will pick up the hash cache file and, instead of performing a fresh read and hash, perform a simulated read, which is much faster.

If you want to perform a fresh level 4 check, you will need to delete the Index*.md5 files created by this switch. They will be located in the backup database folder.

All in all, use this with care, because it can create a situation where your level 4 check will fail for seemingly no reason.

Up next, an explanation of the different check levels.

 

Wednesday, Jun 10, 2009

The Hash Cache

In build 11 of WhsDbCheck I introduced the new /keephashcache option to preserve the hash cache from run to run in order to save the reading and hashing time done the first time. I think this option deserves a little more explanation. But before we do that, let’s start with what the hash cache actually is.

When the Windows Home Server stores data, it stores a hash of each cluster along with the actual data. For those who don’t know, a hash is a small number that can be used to verify the integrity of a much larger chunk of data, of any size in fact. In the home server each hash is typically used to verify a 4096 byte chunk of data, although this can vary. I use this hash in a level 4 check to make sure that every single byte in the backup database is the exact same byte that was written at backup time.

I don’t suggest anyone try this, but I did :) You can change a single byte in a data file and see the level 4 check detect the fault. Very cool stuff. Oh, and I mean it, don’t try this unless you have a backup of the database!
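The single-byte experiment is easy to reproduce safely on in-memory data. The following Python sketch hashes each 4096 byte cluster and compares it with a stored hash; MD5 is an assumption here (it matches the 16 byte hash size), and find_corrupt_clusters is a hypothetical helper, not part of WhsDbCheck.

```python
import hashlib

CLUSTER_SIZE = 4096


def cluster_hashes(data: bytes):
    """Hash each cluster of the data (MD5 assumed)."""
    return [
        hashlib.md5(data[i:i + CLUSTER_SIZE]).digest()
        for i in range(0, len(data), CLUSTER_SIZE)
    ]


def find_corrupt_clusters(data: bytes, stored_hashes):
    """Return the indexes of clusters whose hash no longer matches."""
    current = cluster_hashes(data)
    return [i for i, (a, b) in enumerate(zip(current, stored_hashes)) if a != b]


# Four clusters of sample data, hashed at "backup time".
original = bytes(range(256)) * 64
stored = cluster_hashes(original)

# Flip one bit of one byte inside the second cluster (index 1).
damaged = bytearray(original)
damaged[5000] ^= 0x01

print(find_corrupt_clusters(original, stored))        # []
print(find_corrupt_clusters(bytes(damaged), stored))  # [1]
```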

So here’s the problem though. In order for the check to be anywhere near reasonably fast, we’re talking hours instead of days or weeks here, we need to load the hashes into memory.

Let’s do some math:

  • If you have a 300 Gigabyte database.
  • 4096 bytes per cluster (this is typical for NTFS).
  • That means you have 78,643,200 hashes.
  • At 16 bytes per hash that would require 1.17 Gigabytes of RAM.
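The same arithmetic as a small Python snippet, assuming binary units (1 Gigabyte = 1024^3 bytes), which is what the figures above correspond to. The variable names are illustrative only.

```python
GIB = 1024 ** 3

database_bytes = 300 * GIB   # 300 Gigabyte database
cluster_bytes = 4096         # bytes per cluster
hash_bytes = 16              # bytes per stored hash

hash_count = database_bytes // cluster_bytes
ram_bytes = hash_count * hash_bytes

print(hash_count)                 # 78643200
print(round(ram_bytes / GIB, 2))  # 1.17
```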

The first generation HP MediaSmart server comes with only 512 MB of RAM. So clearly, this is not going to work if you just load everything into RAM.

After much experimentation, I came up with a clever technique that makes it possible to perform a level 4 check in a reasonable amount of time. I call this the hash cache.

Essentially, the hash cache has one important quality: it avoids hard drive seek times at all costs. It turns out that drives and file systems are pretty good at reading a sequential stream, as long as it stays sequential. This is what the hash cache does, and it works well.

So this is where the /hashcache switch comes in. By default, the hash cache uses 256 MB of memory, but you can change that by specifying a new size with /hashcache=(size in MB). For example, /hashcache=1000 makes the hash cache about 1 GB. This will make a level 4 check faster if you have the RAM. Specifying 0 will turn off the hash cache (not recommended), and specifying a value larger than your RAM will be really bad for performance. Specifying a value of more than a couple of gigabytes will slow things down too.

Now that I’ve explained what the hash cache is, in the next post I will talk about the new /keephashcache switch and how and when to use it.