

WhsDbCheck - 1.0.0 Build 14 BETA

Posted by Alex at 11:18 PM

Notes:

This release completes the underlying progress reporting changes. Most of the changes are under the hood and invisible, but they give me the ability to tweak things at a finer level. The estimated time should be much better in this release, although I'm sure it's not perfect yet, since I tuned it by hand.

So at this point the progress message reports the overall test completion percentage, the estimated time remaining until the entire test completes, the current step, the total number of steps, and finally the current step's progress. Note that the estimated time remaining will change dynamically with system load. The estimator has a memory of sorts, so past slowdowns will affect the estimate. That memory is not indefinite, though: it completely forgets a slowdown after a few minutes. Effectively, it always computes the time remaining from the recent rate of progress.
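The exact estimator isn't described here, but one common way to get a "memory" that fades over a few minutes is an exponentially weighted moving average of the progress rate. The Python sketch below assumes that technique; the class name `EtaEstimator` and the `half_life_s` parameter are hypothetical, not WhsDbCheck's actual code.

```python
import math
from typing import Optional


class EtaEstimator:
    """Hypothetical time-remaining estimator (not WhsDbCheck's actual code).

    Keeps an exponentially weighted average of the progress rate. The weight of
    older samples halves every `half_life_s` seconds, so a past slowdown still
    affects the estimate for a while and is then effectively forgotten.
    """

    def __init__(self, half_life_s: float = 60.0) -> None:
        self.half_life_s = half_life_s
        self.rate: Optional[float] = None  # smoothed progress units per second
        self.last_t: Optional[float] = None
        self.last_done = 0.0

    def update(self, t: float, done: float) -> None:
        """Record that `done` units were complete at time `t` (seconds)."""
        if self.last_t is not None and t > self.last_t:
            dt = t - self.last_t
            inst_rate = (done - self.last_done) / dt
            # Fraction of the new sample to blend in; larger for longer gaps.
            alpha = 1.0 - 0.5 ** (dt / self.half_life_s)
            if self.rate is None:
                self.rate = inst_rate
            else:
                self.rate += alpha * (inst_rate - self.rate)
        self.last_t, self.last_done = t, done

    def remaining_s(self, total: float) -> float:
        """Seconds left to reach `total`, or infinity if no usable rate is known yet."""
        if not self.rate or self.rate <= 0:
            return math.inf
        return (total - self.last_done) / self.rate
```

Fed a sample every few seconds, a brief slowdown raises the estimate and then fades out over several half-lives, which matches the behavior described above.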

Next comes the status/error reporting overhaul. The idea is to have much more meaningful error messages for every check failure. This is actually what inspired the latest flurry of builds, so it will be good to get it done.

Changes:
  • Better unreferenced file check that checks each individual backup file and is capable of reporting exactly which files are unreferenced/uncommitted.
  • Progress estimation extended to be more fluid and extensible. This also corrects an issue with the last build's progress estimation on multiple Level 4 passes.

Download 1.0.0.14 BETA

WhsDbCheck - 1.0.0 Build 13 BETA

Posted by Alex at 10:39 PM

Notes:

Re-architected the status/progress reporting for better integration into future products. Fixed a bug that, in some rare cases, caused cluster checks to pass on incorrect data (Level 2+). Added an explicit int parse check: previously, corrupt data would throw an index out of range exception; now it produces a more informative exception. Again, this only happens if the data is corrupt. There is also a tiny bump in overall performance.
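As an illustration of the kind of explicit check described here, the Python sketch below validates the offset and length before decoding a 32-bit integer from a record buffer, so truncated data produces a descriptive error instead of an out-of-range exception. The function name, the `CorruptDataError` type and the little-endian unsigned layout are assumptions for the example, not details taken from WhsDbCheck.

```python
import struct


class CorruptDataError(Exception):
    """Hypothetical exception raised when a database record is malformed."""


def read_uint32(buf: bytes, offset: int, field: str) -> int:
    """Decode a little-endian unsigned 32-bit integer at `offset` in `buf`.

    Bounds are checked explicitly first, so corrupt or truncated records
    produce a message naming the field instead of a generic range error.
    """
    if offset < 0 or offset + 4 > len(buf):
        raise CorruptDataError(
            f"{field}: need 4 bytes at offset {offset}, "
            f"but the record is only {len(buf)} bytes long"
        )
    return struct.unpack_from("<I", buf, offset)[0]
```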

Be aware that this is the first version to display overall progress, along with an estimated time to complete the whole check. The estimate can be somewhat inaccurate at this point; I still have some work to do there that didn’t make it into this build. In particular, the estimate can shift when the check type changes. Even when this happens, the estimation algorithm is adaptive and should catch up and re-compute within 30 to 60 seconds.

Changes:
  • Better status reporting architecture. Now allows for console color output.
  • Added /pause switch.
  • Better progress reporting architecture with overall progress report and less scrolling.
  • Fixed an index cache bug that could cause cluster checks to pass even though the index was out of range. Cluster checks now fail correctly.
  • Added Windows Home Server int overflow checks.
  • Tiny bump in performance by optimizing out one loop in int parsing routine.

Download 1.0.0.13 BETA

Rehashing the Hash Cache

Posted by Alex at 10:48 PM

In the last post I explained what the hash cache is and what it’s for, so now let’s talk about the new /keephashcache switch. Every time a level 4 check is performed, the hash cache is written to a file in a temporary sub-folder. This is done once per index file and takes a significant amount of time.

Generally speaking, a level 4 check works in 2 simple steps:

  1. Data is read and hashed.
  2. Data is verified with stored hashes.
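The two steps can be sketched in Python as follows. This assumes MD5 as the hash (the posts mention 16-byte hashes and Index*.md5 files, but the exact scheme is not stated) and a fixed 4096-byte cluster size; the function names are hypothetical.

```python
import hashlib

CLUSTER_SIZE = 4096  # typical cluster size; can vary in practice


def hash_clusters(data: bytes, cluster_size: int = CLUSTER_SIZE) -> list[bytes]:
    """Step 1: read the data and compute one 16-byte MD5 digest per cluster."""
    return [
        hashlib.md5(data[i:i + cluster_size]).digest()
        for i in range(0, len(data), cluster_size)
    ]


def verify_clusters(computed: list[bytes], stored: list[bytes]) -> list[int]:
    """Step 2: compare computed digests with stored ones; return mismatched cluster numbers."""
    if len(computed) != len(stored):
        raise ValueError(f"{len(computed)} clusters hashed but {len(stored)} hashes stored")
    return [i for i, (c, s) in enumerate(zip(computed, stored)) if c != s]
```

For example, flipping one byte at offset 5000 of a 16 KB buffer makes `verify_clusters(hash_clusters(corrupted), hash_clusters(original))` return `[1]`, since byte 5000 falls in the second cluster.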

Step 1, which takes much longer, creates the hash cache, and step 2 uses it. To save the reading and hashing time on subsequent checks, we can save the hash cache and reload it next time, as long as the database has not changed. If it has, you must rehash; otherwise the hashes will not line up with the actual data and the check will fail.

To do this, run WhsDbCheck with the /keephashcache switch. This creates additional files in the backup database folders. The next time you run WhsDbCheck, it picks up the hash cache file and performs a much faster simulated read instead of a fresh read and hash.
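The reuse logic amounts to "load the cache if a saved copy exists, otherwise hash and optionally save." Below is a minimal Python sketch of that idea. The on-disk format (raw concatenated 16-byte digests) and the function and parameter names are assumptions for illustration. Like the switch described here, it does no staleness check, which is exactly why a stale cache can make a later check fail.

```python
import os
from typing import Callable

HASH_SIZE = 16  # bytes per stored digest


def load_or_build_hash_cache(
    cache_path: str,
    compute_hashes: Callable[[], list[bytes]],
    keep: bool = False,
) -> list[bytes]:
    """Return cluster hashes, reusing a saved cache file when one exists.

    There is no check that the database is unchanged since the cache was
    written: a stale file is trusted as-is, so it must be deleted by hand
    to force a fresh read and hash.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            blob = f.read()
        return [blob[i:i + HASH_SIZE] for i in range(0, len(blob), HASH_SIZE)]

    hashes = compute_hashes()  # the slow path: fresh read and hash
    if keep:
        with open(cache_path, "wb") as f:
            f.write(b"".join(hashes))
    return hashes
```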

If you want to perform a fresh level 4 check, you will need to delete the Index*.md5 files created by this switch. They will be located in the backup database folder.

All in all, use this with care: it can create a situation where your level 4 check fails for seemingly no reason.

Up next, an explanation of the different check levels.

The Hash Cache

Posted by Alex at 10:25 PM

In build 11 of WhsDbCheck I introduced the new /keephashcache option, which preserves the hash cache between runs so that the reading and hashing done on the first run doesn’t have to be repeated. I think this option deserves a little more explanation, but first let’s cover what the hash cache actually is.

When Windows Home Server stores data, it stores a hash of each cluster along with the actual data. For those who don’t know, a hash is a small number computed from a much larger chunk of data (of any size, in fact) that can be used to verify that chunk’s integrity: if even one byte of the data changes, the hash will almost certainly no longer match. In the home server, each hash typically covers a 4096-byte chunk of data, although this can vary. A level 4 check uses these hashes to make sure that every single byte in the backup database is exactly the byte that was written at backup time.

I don’t suggest anyone try this, but I did :) You can change a single byte in a data file and watch the level 4 check detect the fault. Very cool stuff. Oh, and I mean it: don’t try this unless you have a backup of the database!

So here’s the problem. For the check to be anywhere near reasonably fast (we’re talking hours instead of days or weeks here), we need to load the hashes into memory.

Let’s do some math:

  • If you have a 300 Gigabyte database.
  • 4096 bytes per cluster (this is typical for NTFS).
  • That means you have 78,643,200 hashes.
  • At 16 bytes per hash that would require 1.17 Gigabytes of RAM.
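The arithmetic above can be checked with a few lines of Python. The figures work out when "Gigabyte" is read as 2^30 bytes, which is the assumption made here; the function name is illustrative.

```python
GIB = 2**30  # the figures above use binary gigabytes


def hash_memory(db_bytes: int, cluster_bytes: int = 4096, hash_bytes: int = 16) -> tuple[int, int]:
    """Return (number of cluster hashes, bytes needed to hold all of them in RAM)."""
    n_hashes = db_bytes // cluster_bytes
    return n_hashes, n_hashes * hash_bytes


n, ram = hash_memory(300 * GIB)
print(f"{n:,} hashes, {ram / GIB:.2f} GiB")  # 78,643,200 hashes, 1.17 GiB
```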

The first generation HP MediaSmart server comes with only 512 MB of RAM. So clearly, this is not going to work if you just load everything into RAM.

After much experimentation, I came up with a clever technique that makes a level 4 check possible in a reasonable amount of time. I call it the hash cache.

Essentially, the hash cache has one important quality: it avoids hard drive seeks at all costs. It turns out that drives and file systems are pretty good at reading a sequential stream, as long as it stays sequential. The hash cache keeps the reads sequential, and it works well.
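The hash cache’s internals aren’t spelled out here, so the following Python sketch is only one plausible way to get the property described. Each pass reads a contiguous run of clusters front to back, hashes them into a fixed-size in-memory cache, and then scans the index, also sequentially, checking only the entries that fall inside that run. Random lookups then hit RAM instead of forcing disk seeks. MD5, the `(cluster_number, stored_hash)` index entries and all names are assumptions.

```python
import hashlib
from typing import BinaryIO, Callable, Iterable, Iterator


def verify_with_hash_cache(
    data: BinaryIO,
    data_clusters: int,
    index_entries: Callable[[], Iterable[tuple[int, bytes]]],
    cache_hashes: int,
    cluster_size: int = 4096,
) -> Iterator[int]:
    """Yield cluster numbers whose stored hash does not match the data.

    `index_entries` is called once per pass so the index can be re-read
    from the start; entries may appear in any cluster order.
    """
    for start in range(0, data_clusters, cache_hashes):
        end = min(start + cache_hashes, data_clusters)
        # Step 1: one seek, then a sequential read of clusters [start, end).
        data.seek(start * cluster_size)
        cache = [hashlib.md5(data.read(cluster_size)).digest() for _ in range(start, end)]
        # Step 2: sequential scan of the index; lookups go to the in-memory cache.
        for cluster_no, stored in index_entries():
            if start <= cluster_no < end and cache[cluster_no - start] != stored:
                yield cluster_no
```

In this sketch the number of passes is the number of data clusters divided by the cache size, rounded up, so a larger cache means fewer re-scans of the index.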

This is where the /hashcache switch comes in. By default, the hash cache uses 256 MB of memory, but you can change that by specifying a new size with /hashcache=(size in MB). For example, /hashcache=1000 makes the hash cache roughly 1 GB. This will make a level 4 check faster if you have the RAM. Specifying 0 turns off the hash cache (not recommended), and specifying a value larger than your RAM will be really bad for performance. Specifying more than a couple of gigabytes will slow things down too.
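To get a feel for what /hashcache changes, here is a back-of-the-envelope Python sketch. It assumes each pass holds one 16-byte hash per cluster in the cache and that MB means 2^20 bytes; the real tool may account for memory differently, and `passes_needed` is a hypothetical helper.

```python
def passes_needed(db_bytes: int, cache_mb: int, cluster_bytes: int = 4096, hash_bytes: int = 16) -> int:
    """Estimate how many passes over the data a given cache size requires."""
    if cache_mb <= 0:
        raise ValueError("a cache size of 0 disables the hash cache")
    hashes_per_pass = cache_mb * 2**20 // hash_bytes
    clusters = -(-db_bytes // cluster_bytes)  # ceiling division
    return -(-clusters // hashes_per_pass)


GIB = 2**30
print(passes_needed(300 * GIB, 256))   # 5
print(passes_needed(300 * GIB, 1000))  # 2
```

Under these assumptions the default 256 MB cache covers 64 GiB of data per pass (16,777,216 hashes × 4096 bytes), so a 300 GB database needs five passes, while /hashcache=1000 cuts that to two.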

Now that I’ve explained what the hash cache is, in the next post I will talk about the new /keephashcache switch and how and when to use it.

WhsDbCheck - 1.0.0 Build 12 BETA

Posted by Alex at 11:19 PM

Notes:

Fixed a crash in Level 1, 2 and 3 checks introduced in the last build.

Changes:
  • Fixed a crash introduced in the last build in Level 1, 2 and 3 checks.

Download 1.0.0.12 BETA

WhsDbCheck - 1.0.0 Build 11 BETA

Posted by Alex at 8:32 PM

Notes:

Fixed a critical issue that was causing Level 2 and higher checks to fail incorrectly on large databases. This was due to Windows Home Server setting the DataSize header to less than the actual data size. We now ignore that header and use the actual file size when reading from a data file. This somewhat reduces the effectiveness of level 2 and level 3 checks. The level 4 check is not affected: it still guarantees that every single byte is the same as on the day it was written.
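A minimal Python sketch of the revised bounds check, assuming each index entry records a byte offset and length into a data file; the names are hypothetical. The point is simply which limit is used: the file’s real size on disk rather than the header’s DataSize field.

```python
import os


def entry_in_bounds(offset: int, length: int, data_path: str) -> bool:
    """Return True if the byte range [offset, offset + length) lies inside the data file.

    The limit is the data file's actual size on disk. The DataSize value from
    the file header is deliberately ignored, because it can be smaller than
    the real amount of data and would cause valid entries to be rejected.
    """
    actual_size = os.path.getsize(data_path)
    return offset >= 0 and length >= 0 and offset + length <= actual_size
```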

Changes:
  • Changed the index check to use the actual file size instead of the (apparently sometimes incorrect) data size from the header.
  • Fixed GlobalClusterLatest check failure on level 4 with more than 1 pass.
  • Added /keephashcache. Preserves a copy of the hash cache, per index file, for a future run.
  • Hash cache will now allocate only once, preventing out of memory errors.

Download 1.0.0.11 BETA

WhsDbCheck - 1.0.0 Build 10 BETA

Posted by Alex at 9:28 PM

Notes:

Fixed some false positives that showed up when checking a database with canceled backups. By false positives I mean good databases being flagged as bad, not the other way around.

Changes:
  • Fixed BackupSet.Status <> Volume.Status false positive. Just because a backup set has failed does not mean that ALL volume backups in the set failed; some could have succeeded.
  • Changed the level 1 Control.NextIndex check to be less aggressive and created a new more comprehensive level 2+ check. The old level 1 check was giving false positives in certain cases.

Download