
Discussion (19 Comments) · Read Original on HackerNews
I always thought the main reason ZFS does its extensive CRC checks was to detect data corruption in RAM or over the network. A side effect was that in the rare cases where data on disk got corrupted without the drive noticing (because the drive's own CRC was still valid), it would also be spotted.
But anyway, it might be worth testing this by replacing some of the disk images with truncated ones, so that reads hit holes and return an actual read error rather than junk data.
I have seen this a few times on HDDs that had been used for cold storage of archival data for several years (around 5 years or more). I kept my own hash values for each archive file and used them to detect corrupted files, which allowed me to catch all such cases. I had duplicates of all these HDDs. Sometimes both copies had a few silently corrupted sectors, but they were never in the same locations, so in every case I could recover the corrupted files from their duplicates. Had I stored the archival data without redundancy, I would have lost it.
If you do not keep hashes or other error-detecting codes for all your files, as I do, your HDDs may have had failures that you never noticed, and such errors are much more likely to show up in files that have been stored for many years.
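A per-file hash scheme like the one described can be sketched in Python. This is a minimal illustration, not the commenter's actual tooling: it assumes SHA-256 as the hash, a manifest built while the data was known to be good and stored separately, and two copies of the archive at hypothetical paths `primary` and `duplicate`. The function names are made up for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large archives need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a hash for every file under root, keyed by POSIX-style relative path.
    Run this while the data is known to be good, and store the result elsewhere."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_and_repair(primary: Path, duplicate: Path,
                      manifest: dict[str, str]) -> tuple[list[str], list[str]]:
    """Check every file on the primary copy against the manifest. A mismatch is
    restored from the duplicate only if the duplicate still matches the recorded
    hash. Returns (repaired, unrecoverable) lists of relative paths."""
    repaired, lost = [], []
    for rel, expected in manifest.items():
        p, d = primary / rel, duplicate / rel
        if p.is_file() and sha256_of(p) == expected:
            continue  # primary copy is intact
        if d.is_file() and sha256_of(d) == expected:
            p.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(d, p)  # overwrite the bad copy with the verified one
            repaired.append(rel)
        else:
            lost.append(rel)  # both copies bad or missing: needs another source
    return repaired, lost
```

Running it a second time with the two arguments swapped checks the duplicate as well. A file that fails on both copies is reported rather than overwritten, which is exactly the case where a third copy or parity data would be needed.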
https://en.wikipedia.org/wiki/Parchive
However, I use both par2create and duplicate storage media, because duplicates, preferably stored in different geographic locations, are the only protection against incidents serious enough to partially or totally destroy a storage device.
On its own, with an adequate amount of redundancy chosen, par2create is sufficient to recover archive files that are affected only by a few sporadic corrupted sectors, as on an HDD that has been stored in good conditions for some years. It will not help if the entire HDD becomes unusable due to a mechanical or electrical defect, which can happen to HDDs kept in cold storage rather than in continuous use.
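par2 itself uses Reed-Solomon codes, which can rebuild several damaged blocks depending on how much recovery data was created. As a much simpler stand-in (not par2's algorithm or file format), the sketch below uses a single XOR parity block plus one SHA-256 digest per block: the digests locate the damaged block and the parity rebuilds it. It can repair at most one bad block per protected group, which is why real tools use stronger codes. All function names are hypothetical.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def protect(data: bytes, block_size: int) -> tuple[list[bytes], list[bytes], bytes]:
    """Split data into fixed-size blocks (zero-padding the last one) and return
    the blocks, one SHA-256 digest per block, and a single XOR parity block."""
    padded = data + b"\0" * (-len(data) % block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    digests = [hashlib.sha256(b).digest() for b in blocks]
    return blocks, digests, xor_blocks(blocks)


def recover(blocks: list[bytes], digests: list[bytes], parity: bytes,
            original_len: int) -> bytes:
    """Use the digests to find a damaged block, rebuild it from parity, and
    return the original data with padding stripped."""
    bad = [i for i, b in enumerate(blocks) if hashlib.sha256(b).digest() != digests[i]]
    if len(bad) > 1:
        raise ValueError("a single XOR parity block can repair at most one bad block")
    if bad:
        i = bad[0]
        survivors = [b for j, b in enumerate(blocks) if j != i]
        # XOR of all intact blocks and the parity block equals the missing block.
        blocks = blocks[:i] + [xor_blocks(survivors + [parity])] + blocks[i + 1:]
    return b"".join(blocks)[:original_len]
```

The trade-off mirrors the comment: parity data repairs scattered bad sectors cheaply, but if every block is gone (a dead drive), there is nothing left to XOR against.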
Nope, it's always been about on-disk bit rot.
First off: drive firmware has been known to return data from the wrong LBA. The OS asks for 123, the drive reads 234, verifies its drive-level CRC (which passes), and sends it up. The application gets a bundle of bits that is not correct. ZFS expects a specific checksum for that part of the tree/file, so if the data from LBA 234 is returned, it will not match the checksum stored for 123.
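The misdirected-read case can be modeled in a few lines. This sketch assumes, as ZFS does, that the expected checksum lives in the parent's pointer to a block rather than next to the block itself, so a drive that returns the wrong LBA cannot also hand back a matching checksum. It uses SHA-256 for simplicity (ZFS defaults to fletcher4), and `BlockPointer`, `verified_read`, and the dict-backed "disk" are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class BlockPointer:
    """Hypothetical parent-held pointer: where the child block lives and what
    its contents must hash to."""
    lba: int
    expected: bytes


class ChecksumError(Exception):
    """Raised when a block's contents do not match its parent's checksum."""


def verified_read(read_lba: Callable[[int], bytes], ptr: BlockPointer) -> bytes:
    """Read ptr.lba through the (possibly buggy) device and verify the result."""
    data = read_lba(ptr.lba)
    if checksum(data) != ptr.expected:
        raise ChecksumError(f"LBA {ptr.lba}: data does not match parent checksum")
    return data


# Simulated disk; the parent pointer records the checksum of LBA 123's contents.
disk = {123: b"contents of block 123", 234: b"contents of block 234"}
ptr = BlockPointer(lba=123, expected=checksum(disk[123]))


def buggy_read(lba: int) -> bytes:
    """Simulated firmware bug: requests for LBA 123 are served from LBA 234."""
    return disk[234 if lba == 123 else lba]


print(verified_read(disk.__getitem__, ptr))  # well-behaved device: read succeeds
try:
    verified_read(buggy_read, ptr)
except ChecksumError as e:
    print("detected:", e)
```

The drive-level CRC would pass in both cases, because block 234 is perfectly intact; only the checksum held by the parent reveals that it is the wrong block.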
Next, with RAID-1, if one drive has corrupted data and you have no higher-level filesystem checksums, how do you know which mirror has the correct data? The copies differ, but which one is correct? With ZFS you know which block has the correct checksum, so it returns that data to the application and then uses it to repair the wrong copy.
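The mirror case follows the same idea. In this sketch each mirror is a hypothetical dict from LBA to bytes, and `expected` is the checksum the filesystem stored elsewhere (for ZFS, in the parent block pointer). The read returns the first copy that verifies and rewrites any copy that does not; a plain RAID-1 without that external checksum could only see that the two copies differ.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def self_healing_read(mirrors: list[dict[int, bytes]], lba: int, expected: bytes) -> bytes:
    """Return a copy of `lba` whose checksum matches `expected`, and overwrite
    every mirror whose copy is missing or differs from it."""
    copies = [m.get(lba) for m in mirrors]
    good = next((c for c in copies if c is not None and sha256(c) == expected), None)
    if good is None:
        raise OSError(f"LBA {lba}: no mirror holds a copy matching the checksum")
    for mirror, copy in zip(mirrors, copies):
        if copy != good:
            mirror[lba] = good  # repair the bad side using the verified copy
    return good
```

Note that the order of the mirrors does not matter: whichever side rotted, the checksum picks the survivor.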
ZFS also lets you specify the number of copies on a single disk. This sounds a bit weird, but since drives suffer block failures far more often than total failures, it's actually surprisingly useful in some situations.
[1] My suspicion: significantly, since storage sizes are now multiple orders of magnitude larger and the error rate per MB can't have dropped proportionally to match.
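The footnote's reasoning can be made concrete with back-of-the-envelope arithmetic. If the per-bit error rate r stays constant, the expected number of errors when reading a drive of C bytes end to end is 8·C·r, so it grows linearly with capacity. The rate below is a hypothetical placeholder, not a measured figure for silent corruption on any real drive.

```python
def expected_errors(capacity_bytes: float, errors_per_bit: float) -> float:
    """Expected number of bit errors when a drive is read end to end once,
    assuming a constant per-bit error rate."""
    return capacity_bytes * 8 * errors_per_bit


# Hypothetical placeholder rate, held constant across drive sizes; not a measured figure.
RATE = 1e-14

for label, size in [("100 GB", 100e9), ("1 TB", 1e12), ("20 TB", 20e12)]:
    print(f"{label:>6}: {expected_errors(size, RATE):.3f} expected errors per full read")
```

With the rate held fixed, the 20 TB drive accumulates 200 times as many expected errors per full read as the 100 GB one; only a proportional drop in the per-bit rate would keep the count flat.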
That's the behavior that is desired, yes. And in a neat world of frictionless pulleys and ropes that don't stretch, perhaps that is what happens.
In reality, the root reason for filesystems to detect bit rot is simpler: it's irrational to expect a device that is already failing to behave in a predictable way.
There have been times when some features of entire drive models were disabled in the Linux kernel because of buggy firmware that silently writes bad data (with correct ECC), so reading it back succeeds from the point of view of both the drive and the OS block driver.
I was hit by this myself with the queued TRIM firmware bug that affected all Samsung 840 EVO SSDs (Linux kernel commit 9a9324d3969678d44b330e1230ad2c8ae67acf81 if you want to look into the history): the drive didn't report any errors, but ZFS kept reporting corruption and kept fixing it in the background.
Even then, I had multiple cases where files were corrupted, and once the whole array refused to come online due to corrupted metadata. I had to make ZFS replay the journal log using undocumented commands. Sometimes it takes a few days of hair-raising recovery, but I have always managed to get the array back intact.
The corrupted files are always extremely large files (>50 GB) with many small reads/writes (e.g. iSCSI image files).
It's pretty impressive how resilient ZFS is, really, given that I had what was likely the worst possible hardware combination.
https://gist.github.com/chapmanjacobd/bc6e31c8bc3647e0bcb0c4...
pretty fun!
God, the intensity is tiresome. Whether or not it's AI slop, it's also bad writing. Things can be fun or interesting or worthwhile without being a harrowing battle of discovery!
The quoted sentences used "correct", "right" and "wrong". Hardly the sensationalist words you're implying.