Ubuntu: does btrfs/ZFS make ECC RAM obsolete?



Question:

ECC RAM is recommended to be used in a server in order to minimize/prevent data corruption.

Would ECC RAM be obsolete if your server uses btrfs (or ZFS) as filesystem (which both are said to prevent data corruption)?


Solution:1

I am not sure about btrfs, but ZFS and ECC RAM are like toast and butter... but ZFS without ECC is also like toast with butter.

TL;DR- Using ECC RAM with ZFS is good (because using ECC RAM is in general good), but it is not going to hurt you to not use ECC RAM with ZFS because ZFS is just that good. I would not say that ECC RAM is obsolete though, because there is more going on than just filesystem storage. Errors anywhere are bad. ZFS just helps limit errors in the filesystem. Below is how / why...

Please see http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/ and read closely. Thank you to Jim Salters for what is below.

I will post the relevant bits in case there is a loss of that site:

First: ECC RAM

What’s ECC RAM? Is it a good idea?

The ECC stands for Error Correcting Checksum. In a nutshell, ECC RAM is a special kind of server-grade memory that can detect and repair some of the most common kinds of in-memory corruption. For more detail on how ECC RAM does this, and which types of errors it can and cannot correct, the rabbit hole’s over here.

Now that we know what ECC RAM is, is it a good idea? Absolutely. In-memory errors, whether due to faults in the hardware or to the impact of cosmic radiation (yes, really) are a thing. They do happen. And if it happens in a particularly strategic place, you will lose data to it. Period. There’s no arguing this.

Second: ZFS

What’s ZFS? Is it a good idea?

ZFS is, among other things, a checksumming filesystem. This means that for every block committed to storage, a strong hash (somewhat misleadingly AKA checksum) for the contents of that block is also written. (The validation hash is written in the pointer to the block itself, which is also checksummed in the pointer leading to itself, and so on and so forth. It’s turtles all the way down. Rabbit hole begins over here for this one.)

Is this a good idea? Absolutely. Combine ZFS checksumming with redundancy or parity, and now you have a self-healing array. If a block is corrupt on disk, the next time it’s read, ZFS will see that it doesn’t match its checksum and will load a redundant copy (in the case of mirror vdevs or multiple copy storage) or rebuild a parity copy (in the case of RAIDZ vdevs), and assuming that copy of the block matches its checksum, will silently feed you the correct copy instead, and log a checksum error against the first block that didn’t pass.

ZFS also supports scrubs, which will become important in the next section. When you tell ZFS to scrub storage, it reads every block that it knows about â€" including redundant copies â€" and checks them versus their checksums. Any failing blocks are automatically overwritten with good blocks, assuming that a good (passing) copy exists, either redundant or as reconstructed from parity. Regular scrubs are a significant part of maintaining a ZFS storage pool against long term corruption.

Situation / Concern

Is ZFS and non-ECC worse than not-ZFS and non-ECC? What about the Scrub of Death?

OK, it’s pretty easy to demonstrate that a flipped bit in RAM means data corruption: if you write that flipped bit back out to disk, congrats, you just wrote bad data. There’s no arguing that. The real issue here isn’t whether ECC is good to have, it’s whether non-ECC is particularly problematic with ZFS. The scenario usually thrown out is the the much-dreaded Scrub Of Death.

TL;DR version of the scenario: ZFS is on a system with non-ECC RAM that has a stuck bit, its user initiates a scrub, and as a result of in-memory corruption good blocks fail checksum tests and are overwritten with corrupt data, thus instantly murdering an entire pool. As far as I can tell, this idea originates with a very prolific user on the FreeNAS forums named Cyberjock, and he lays it out in this thread here. It’s a scary idea â€" what if the very thing that’s supposed to keep your system safe kills it? A scrub gone mad! Nooooooo!

The problem is, the scenario as written doesn’t actually make sense. For one thing, even if you have a particular address in RAM with a stuck bit, you aren’t going to have your entire filesystem run through that address. That’s not how memory management works, and if it were how memory management works, you wouldn’t even have managed to boot the system: it would have crashed and burned horribly when it failed to load the operating system in the first place. So no, you might corrupt a block here and there, but you’re not going to wring the entire filesystem through a shredder block by precious block.

But we’re being cheap here. Say you only corrupt one block in 5,000 this way. That would still be hellacious. So let’s examine the more reasonable idea of corrupting some data due to bad RAM during a scrub. And let’s assume that we have RAM that not only isn’t working 100% properly, but is actively goddamn evil and trying its naive but enthusiastic best to specifically kill your data during a scrub:

First, you read a block. This block is good. It is perfectly good data written to a perfectly good disk with a perfectly matching checksum. But that block is read into evil RAM, and the evil RAM flips some bits. Perhaps those bits are in the data itself, or perhaps those bits are in the checksum. Either way, your perfectly good block now does not appear to match its checksum, and since we’re scrubbing, ZFS will attempt to actually repair the “bad” block on disk. Uh-oh! What now?

Next, you read a copy of the same block â€" this copy might be a redundant copy, or it might be reconstructed from parity, depending on your topology. The redundant copy is easy to visualize â€" you literally stored another copy of the block on another disk. Now, if your evil RAM leaves this block alone, ZFS will see that the second copy matches its checksum, and so it will overwrite the first block with the same data it had originally â€" no data was lost here, just a few wasted disk cycles. OK. But what if your evil RAM flips a bit in the second copy? Since it doesn’t match the checksum either, ZFS doesn’t overwrite anything. It logs an unrecoverable data error for that block, and leaves both copies untouched on disk. No data has been corrupted. A later scrub will attempt to read all copies of that block and validate them just as though the error had never happened, and if this time either copy passes, the error will be cleared and the block will be marked valid again (with any copies that don’t pass validation being overwritten from the one that did).

Well, huh. That doesn’t sound so bad. So what does your evil RAM need to do in order to actually overwrite your good data with corrupt data during a scrub? Well, first it needs to flip some bits during the initial read of every block that it wants to corrupt. Then, on the second read of a copy of the block from parity or redundancy, it needs to not only flip bits, it needs to flip them in such a way that you get a hash collision. In other words, random bit-flipping won’t do â€" you need some bit flipping in the data (with or without some more bit-flipping in the checksum) that adds up to the corrupt data correctly hashing to the value in the checksum. By default, ZFS uses 256-bit SHA validation hashes, which means that a single bit-flip has a 1 in 2^256 chance of giving you a corrupt block which now matches its checksum. To be fair, we’re using evil RAM here, so it’s probably going to do lots of experimenting, and it will try flipping bits in both the data and the checksum itself, and it will do so multiple times for any single block. However, that’s multiple 1 in 2^256 (aka roughly 1 in 10^77) chances, which still makes it vanishingly unlikely to actually happen… and if your RAM is that damn evil, it’s going to kill your data whether you’re using ZFS or not.

Re: Scrubbing

Here enters a valid concern about data loss. Take heed!

But what if I’m not scrubbing?

Well, if you aren’t scrubbing, then your evil RAM will have to wait for you to actually write to the blocks in question before it can corrupt them. Fortunately for it, though, you write to storage pretty much all day long… including to the metadata that organizes the whole kit and kaboodle. First time you update the directory that your files are contained in, BAM! It’s gotcha! If you stop and think about it, in this evil RAM scenario ZFS is incredibly helpful, because your RAM now needs to not only be evil but be bright enough to consistently pull off collision attacks. So if you’re running non-ECC RAM that turns out to be appallingly, Lovecraftianishly evil, ZFS will mitigate the damage, not amplify it.

If you are using ZFS and you aren’t scrubbing, by the way, you’re setting yourself up for long term failure. If you have on-disk corruption, a scrub can fix it only as long as you really do have a redundant or parity copy of the corrupted block which is good. Once you corrupt all copies of a given block, it’s too late to repair it â€" it’s gone. Don’t be afraid of scrubbing. (Well, maybe be a little wary of the performance impact of scrubbing during high demand times. But don’t be worried about scrubbing killing your data.)

Once again, thanks to Jim Salters for publishing this.


Solution:2

ECC is cyclic coding like compact checksum that also can correct value.

See https://en.wikipedia.org/wiki/Hamming_code#Hamming_codes_with_additional_parity_(SECDED) and https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction for more details.

Lets say it adds three additional bits to byte and allow to detect and fix up to two error bits for example.

As you can see now, ECC application to FS and RAM are independent.

Also RAID6 allow you to use ECC with any FS.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »