> "I've experienced 100 error-free hours, so this is a non-issue for me and my users"
It's a statement of fact: it has been a non-issue for me. If you're like me, it's statistically reasonable to assume it will be a non-issue for you too. Also, no users, just me. "Probably okay" is more than good enough for me, and I'm sure many people have similar requirements (clearly not you).
I have no optimism, just no empathy for the negligent: I learned my lesson with backups a long time ago. Some people blame the filesystem instead of their backup practices when their data is corrupted, but I think that's naive. The filesystem did you a favor, fix your shit. Next time it will be your NAS power supply frying your storage.
It's also a double-edged sword: the more reliable a filesystem is, the longer users can get away without backups before being bitten, and the greater their ultimate loss will be.
> No! This simply does not follow from the first statement, statistically or otherwise.
> You and I might or might not be fine; you having been fine for 100 hours on the same configuration just offers next-to-zero predictive power for that.
You're missing the forest for the trees here.
It is predictive ON AVERAGE. I don't care about the worst case like you do: I only care about the expected case. If I died when my filesystem got corrupted... I would hope it's obvious I wouldn't approach it this way.
Adding to this: my laptop has this btrfs bug right now. I'm not going to do anything about it, because it's not worth 20 minutes of my time to rebuild my kernel for a bug that is unlikely to bite before I get the fix in 6.9-rc1, and would only cost me 30 minutes of time in the worst case if it did.
I'll update if it bites me. I've bet on much worse poker hands :)
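If you want the bet spelled out, here's a minimal sketch of the arithmetic. The 20- and 30-minute figures are the ones above; the probability of being bitten before 6.9-rc1 is the unknown being bet on:

```python
# Expected-cost comparison for the bet described above. The 20- and
# 30-minute figures come from the comment; the probability that the bug
# bites before the fix lands is the unknown.
cost_rebuild_now = 20   # minutes spent rebuilding the kernel today, for certain
cost_if_bitten = 30     # minutes lost only if the race actually fires

# Doing nothing has the lower expected cost whenever
#   p_bite * cost_if_bitten < cost_rebuild_now
break_even = cost_rebuild_now / cost_if_bitten
print(f"Waiting wins as long as P(bitten before the fix) < {break_even:.2f}")  # ~0.67
```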
Well, from your data (100 error-free hours, sample size 1) alone, we can only conclude this: “The bug probably happens less frequently than every few hours”.
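To put a rough number on that: with zero failures in 100 hours, the standard zero-failure bound only rules out fairly high rates. A minimal sketch, assuming a constant failure rate and a conventional 95% confidence level (neither stated in this thread):

```python
# Zero-failure upper bound ("rule of three" style) for 100 error-free hours.
# Assumes a constant failure rate; the 95% level is a conventional choice.
import math

hours_observed = 100
confidence = 0.95

# With zero events in t hours, P(no failures) = exp(-lam * t); the largest
# rate still consistent with the observation solves exp(-lam * t) = 1 - confidence.
upper_rate = -math.log(1 - confidence) / hours_observed   # ~0.03 failures/hour
print(f"95% upper bound: ~{upper_rate:.3f} failures/hour "
      f"(can't rule out one failure every {1 / upper_rate:.0f} hours)")
```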
Is that reliable enough for you? Great! Is that “very rare”? Absolutely not for almost any type of user/scenario I can imagine.
If you’re making any statistical arguments beyond that data, or are implying more data than that, please provide either, otherwise this will lead nowhere.
I don't care about the aggregate: I only care about me and my machine here.
> The expected case after surviving a hundred hours is that you're likely to survive another hundred.
That's exactly right. I don't expect to accrue another hundred hours before the new release, so I'll likely be fine.
> Which is a completely useless promise.
Statistics is never a promise: that's a really naive concept.
> at reasonable time scales for an OS
The timescale of the OS install is irrelevant: all that matters is the time between when the bug is introduced and when it is fixed. In this case, about nine months.
> Even so, "likely" here is something like "better than 50:50". Your claim was "very very rare" and that's not supported by the evidence.
You're free to disagree, obviously, but I think it's accurate to describe a race condition that doesn't happen in 100 hours on multiple machines with clock rates north of 3GHz as "very very rare". That particular code containing the bug has probably executed tens of millions of times on my little pile of machines alone.
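For the curious, the back-of-envelope behind "tens of millions" looks something like this; the machine count and per-second execution rate are illustrative guesses, not measurements:

```python
# Illustrative back-of-envelope only: machine count and per-second rate of
# the affected code path are guesses; the 100 hours is from the comment.
hours = 100
machines = 3                   # hypothetical "multiple machines"
executions_per_second = 30     # hypothetical rate for the buggy path

total = hours * 3600 * executions_per_second * machines
print(f"~{total:,} executions without once losing the race")  # ~32,400,000
```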
> It's a promise of odds with error bars, don't be so nitpicky.
No, it's not. I'm not being nitpicky, the word "promise" is entirely inapplicable to statistics.