
What I found most interesting about the review is that Apple chose not to implement file-data checksumming on the grounds that the underlying hardware is very 'safe' and already employs ECC anyway.



Which is silly, and fails to isolate problems when and where they happen. Pretty much every significant layer should have its own checksum, in many cases ideally an ECC of some form. Hardware has bugs and failures, and so does software. What is particularly evil is when the two collide in a way that causes silent, undetected corruption of important data for long periods of time.


That's not the only reason, though. There are other factors going into that decision that make it rational:

APFS isn't designed as a server file system. It's meant for laptops, desktops, and, most importantly (to Apple), mobile devices. Note that most of those devices are battery powered, which means "redundant" error checking by the FS is a meaningful waste of power.

That's not to say they won't add error-checking capability in the future, but it makes total sense to prioritize other things when this file system is mostly going to be used on battery-powered clients and basically never on servers.


So the idea is that the cloud handles reliability, while APFS is optimized for the average end-user experience?


Similar design decision as in IPv6, which, unlike IPv4, doesn't have a header checksum at the IP level, for the same reason.


If I remember correctly it's actually the opposite: the reasoning was that checksumming should be done higher up the stack.


Actually, the reason for it is that lower layers already do checksumming, and at that layer you generally don't get scrambled packets; you only lose packets, which happens when there's congestion.
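
For reference, the checksum IPv6 dropped from the IP header is the RFC 1071 16-bit one's-complement sum that IPv4 headers, TCP, and UDP use. A minimal sketch of how it's computed (the sample packet bytes are made up):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* RFC 1071 Internet checksum: the 16-bit one's-complement sum used
       by IPv4 headers, TCP, and UDP. IPv6 dropped it from the IP header,
       relying on the link-layer CRC and the transport checksum instead. */
    static uint16_t inet_checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;

        while (len > 1) {                     /* sum 16-bit words */
            sum += (uint32_t)data[0] << 8 | data[1];
            data += 2;
            len -= 2;
        }
        if (len == 1)                         /* odd trailing byte */
            sum += (uint32_t)data[0] << 8;

        while (sum >> 16)                     /* fold carries back in */
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;                /* one's complement */
    }

    int main(void)
    {
        uint8_t packet[] = { 0x45, 0x00, 0x00, 0x1c, 0x00, 0x01 };
        printf("checksum: 0x%04x\n", inet_checksum(packet, sizeof packet));
        return 0;
    }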


But anyone who's ever looked knows that you DO get scrambled packets, thanks to buggy hardware. And disks screw up, too.


Buggy hardware would be caught by your MAC-layer checksum, such as Ethernet's CRC.


Anyone with enough hardware can observe this sort of problem:

http://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html

When the CRC and TCP Checksum Disagree: http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigco...

Alternatively, just look at "netstat -s" for any machine on the Internet talking to a bunch of others. Here's the score for the main web host of the Internet Archive Wayback Machine:

    3088864840 segments received
    2401058 bad segments received.


I'd prefer them in software so that they work with external storage like USB hard disks.


Another way to look at it is that checksumming is something that should be done at a generic lower-level block layer, not the filesystem layer.

Do it in CoreStorage, not filesystem X.


One of the key innovations in ZFS is storing checksums in block pointers, which is something that cannot be done efficiently outside the file system. Storing checksums elsewhere is far more complex and expensive.
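
To make the idea concrete, here's a toy sketch (the struct and checksum are illustrative, not actual ZFS on-disk structures): the checksum of a child block lives in the parent's pointer to it, so a corrupt block can never vouch for itself.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Toy block pointer: the checksum of a block is stored in the
       POINTER to that block, so verification happens before the data
       is trusted. Real ZFS uses fletcher4/sha256 etc. */
    typedef struct {
        uint64_t offset;     /* where the child block lives */
        uint64_t length;     /* size of the child block */
        uint64_t checksum;   /* checksum of the child's contents */
    } block_ptr_t;

    static uint64_t toy_checksum(const uint8_t *p, size_t n)
    {
        uint64_t sum = 0;
        while (n--)
            sum = sum * 31 + *p++;
        return sum;
    }

    /* Verify a block against the checksum in its parent pointer. */
    static int read_verified(const block_ptr_t *bp, const uint8_t *disk)
    {
        if (toy_checksum(disk + bp->offset, bp->length) != bp->checksum) {
            fprintf(stderr, "corrupt block at offset %llu\n",
                    (unsigned long long)bp->offset);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        uint8_t disk[32] = "hello, block";
        block_ptr_t bp = { 0, 12, 0 };
        bp.checksum = toy_checksum(disk, bp.length);

        printf("clean read: %d\n", read_verified(&bp, disk));
        disk[3] ^= 0xff;                       /* simulate bit rot */
        printf("after bit rot: %d\n", read_verified(&bp, disk));
        return 0;
    }

Doing this outside the file system would mean a separate lookup to find each block's checksum, which is the complexity and expense referred to above.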


Serious question: what value does filesystem checksumming offer for the average user who has no redundancy?

I mean, it'll tell you that your only copy of a file got corrupted, but it'll still be corrupted...


It tells you that your file is corrupted. You can then restore from backups, re-download, or take some other corrective action, such as delete the file, reboot the machine, re-install the operating system, or play Quake 2 to test your RAM and graphics.

Never underestimate the value of a reason to play Quake 2.


I remember when I played DooM to test and benchmark a computer...


The average user might have no redundancy, but they still ought to have a backup. Checksum failure tells them they need to restore.

At the very least, a checksum failure might tell them (or the tech they're consulting) that they have a data problem, rather than, say, an application compatibility problem.


"Why is my machine crashing?" "Well, somelib.so is reporting checksum failures" is a much better experience then "weird, this machine used to be great but now it crashes all the time"


Almost all executable files are already code-signed, which is a better version of checksumming, so it'd only help user data files.


somelib.so what? And what's a "checksum"? Error messages need to be comprehensible to the average user.


"Error: Buy a new Mac."


Assuming your intent is not to troll: "The file xyz.txt is corrupt. Click here to restore from a Time Machine backup."


Today you can verify backups on OS X with "tmutil verifychecksums", at least on 10.11. The UI to this could be improved, but user data checksums don't necessarily need to be a filesystem feature. On a single-disk device, the FS doesn't have enough information to do anything useful about corrupt files anyway.


> On a single-disk device, the FS doesn't have enough information to do anything useful about corrupt files anyway.

Some filesystems can be configured to keep two or more copies of certain filesystem/directory/etc. contents. Two copies is enough information to do something useful.
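
A sketch of what that looks like (along the lines of ZFS ditto blocks or btrfs DUP; the names and checksum here are made up): verify the first copy, and on a mismatch serve and repair from the second.

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    static uint64_t toy_checksum(const uint8_t *p, size_t n)
    {
        uint64_t sum = 0;
        while (n--)
            sum = sum * 31 + *p++;
        return sum;
    }

    /* Two-copy read: if copy A fails verification, serve from copy B
       and rewrite A from it ("self-healing"). If both fail, report an
       error rather than guessing. */
    static int read_redundant(uint8_t *a, uint8_t *b, size_t len,
                              uint64_t expected, uint8_t *out)
    {
        if (toy_checksum(a, len) == expected) {
            memcpy(out, a, len);
            return 0;
        }
        if (toy_checksum(b, len) == expected) {
            memcpy(a, b, len);                 /* heal the bad copy */
            memcpy(out, b, len);
            fprintf(stderr, "copy A corrupt; healed from copy B\n");
            return 0;
        }
        return -1;                             /* both copies bad */
    }

    int main(void)
    {
        uint8_t a[16] = "important data", b[16] = "important data", out[16];
        uint64_t sum = toy_checksum(a, sizeof a);

        a[0] ^= 0xff;                          /* corrupt the first copy */
        if (read_redundant(a, b, sizeof a, sum, out) == 0)
            printf("read ok: %s\n", out);
        return 0;
    }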


Well, Apple is moving in the direction of syncing everything with iCloud: iCloud Drive has been around for a while, and Sierra adds the ability to sync the Desktop and Documents folders, on top of long-existing things like photo sync. If the file was previously uploaded to iCloud, there is redundancy, and you definitely don't want to overwrite the cloud copy with the corrupted version.

How big an issue this is in practice I don't know.



