
I think the failure mode users/devs are hitting here is bit rot. It's not that the device fails to report back the same bytes right away, even if you disable whatever caching is happening; it's that after some amount of time T it will report the wrong bytes. Some file systems run "scrubs" to find these errors automatically and sometimes attempt to repair them (ZFS can do this).
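To make the scrub idea concrete, here is a rough sketch of the general principle: keep a checksum per block, re-read every block later, and flag any block whose checksum no longer matches. This is not how ZFS actually implements scrubbing; the `scrub` function, the in-memory `blocks` dict, and the choice of SHA-256 are all assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def checksum(block: bytes) -> str:
    """Return a SHA-256 hex digest for one block of data."""
    return hashlib.sha256(block).hexdigest()


def scrub(blocks: Dict[int, bytes], checksums: Dict[int, str]) -> List[int]:
    """Re-hash every block and return the numbers of blocks that no longer match."""
    return [
        number
        for number, data in sorted(blocks.items())
        if checksum(data) != checksums[number]
    ]


# Hypothetical "disk": block number -> contents, with checksums taken at write time.
blocks = {0: b"hello", 1: b"world"}
checksums = {number: checksum(data) for number, data in blocks.items()}

# Simulate bit rot: block 1 changes some time after it was written.
blocks[1] = b"worle"

damaged = scrub(blocks, checksums)
```

A scrub like this can only detect damage. Repair needs a second good copy (a mirror or parity) to restore from.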


I'm the author of HashBackup. IMO, silent bitrot is not really a thing. I say this because every disk sector written has an extensive ECC recorded with it, so the idea that a bit can flip in a sector and you get bad data without an I/O error seems extremely unlikely. Yes, you could have buggy OS disk drivers, drive controllers, or user-level programs that ignore disk errors. And yes, you could have a bit flip on magnetic media causing an I/O error because the data doesn't match the ECC.

I believe that using non-ECC RAM is a potential cause of silent disk errors. If you read a sector without error, then a cosmic ray flips a bit in RAM containing that sector, you now have a bad copy of the sector with no error indication. Even if the backup software does a hash of the bad data and records it with the data, it's too late: the hash is of bad data. If you are lucky and the hash is created before the RAM bit flip, at least the hash won't match the bad data, so if you try to restore the file, you'll get an error at restore time. It's impossible to recover the correct data, but at least you'll know that.
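The two orderings can be simulated in a few lines. This is only an illustration of the timing argument, not code from any backup program: the 512-byte buffer, the `flip_bit` helper, and the use of SHA-256 are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash a buffer the way a backup program might before storing it."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(buf: bytearray, bit_index: int) -> None:
    """Simulate a RAM bit flip by inverting one bit in place."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)


# Case 1: hash is computed first, then the bit flips before the data is stored.
sector = bytearray(b"A" * 512)            # stand-in for a sector read without error
hash_before_flip = sha256_hex(bytes(sector))
flip_bit(sector, 100)
stored = bytes(sector)                     # bad data goes into the backup
case1_detected = sha256_hex(stored) != hash_before_flip

# Case 2: the bit flips first, then the hash is computed over the bad data.
sector = bytearray(b"A" * 512)
flip_bit(sector, 100)
hash_after_flip = sha256_hex(bytes(sector))
stored = bytes(sector)
case2_detected = sha256_hex(stored) != hash_after_flip
```

In case 1 the restore-time check fails, so you at least learn the data is bad. In case 2 the hash matches the corrupted bytes and nothing is flagged.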

The good news is that if you back up the file again, it will be read correctly from disk and will differ from the copy in the previous backup. The bad news is that most backup software skips files based on metadata such as ctime and mtime, so the file won't be re-saved until it changes.
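A metadata-based skip check looks roughly like the sketch below. It is not taken from HashBackup or any specific tool; `FileMeta`, `current_meta`, and `needs_backup` are hypothetical names, and real programs may compare different fields.

```python
import os
from typing import NamedTuple, Optional


class FileMeta(NamedTuple):
    """Metadata remembered from the previous backup run (assumed fields)."""
    size: int
    mtime_ns: int
    ctime_ns: int


def current_meta(path: str) -> FileMeta:
    """Read the file's current metadata without reading its contents."""
    st = os.stat(path)
    return FileMeta(st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def needs_backup(previous: Optional[FileMeta], current: FileMeta) -> bool:
    """Back up new files and files whose metadata changed; skip everything else."""
    return previous is None or previous != current
```

Because the decision never looks at file contents, a file whose earlier backup copy was corrupted in RAM keeps being skipped for as long as its size, mtime, and ctime stay the same.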

We are so dependent on computers these days that it's a real shame not all computers come standard with ECC RAM. The real reason is that server manufacturers want to charge higher prices to data centers for "real" servers with ECC.



