Hacker News

There is an important difference in failure modes between HDDs and SSDs for consumers though. An SSD that fails is likely to go into a read-only mode that still allows you to recover your data. The only way to recover your data from a HDD failure is to find a data recovery business.

However, for a business like Backblaze that specializes in data storage, recovering data from an HDD is probably more reliable. After all, in the case of a complete failure, recovering data from the platters inside the HDD is more likely to succeed than recovering data from the memory chips inside the SSD.



Just an anecdote, but 10 years following about 200-300 laptops, with about 90% of them using SSDs, I only had 2 drive failures, both with SSDs: one of them became completely unusable (all data lost and impossible to recover anything, even after paying a specialized company to retrieve the data), and the other cost 2k$ to get the data back - and the company struggled with it for a few weeks, after which the data became less useful.

Apparently the company was better equipped to handle HDDs, and would have probably gotten the data easily if that were the case. But the experience made me very dubious of SSD reliability.

In both cases, users were using their laptops just fine, and out of the blue, the laptops froze or had some erratic behavior, and the next time they booted, the drives were already unusable.

On the other hand, when I had an HDD failure (due to electrical power fluctuation), except for some bad sectors where the head hit the platter, it remained usable for years.

So I agree with you, their failure modes are important. Of course, nowadays most laptops only have M.2 slots anyway, so HDDs are completely out of the question.


> cost 2k$ to get the data back

Something about regular backups of valuable data should be said here.


Hindsight is 20/20. They're probably doing better with backups now, and they certainly understood how beneficial backups are, when they were dealing with the data loss.


I've had the opposite experience. An SSD died without any warning resulting in complete data loss, while a dying HDD started making funny noises, which prompted the thought "Hmm, might be time for a backup again".


I've had two types of SSD failures, one with super cheap drives that I knew would fail early and gave lots of warnings in the form of unpredictability before failure, and one of the more expensive SanDisk SSDs that failed without warning catastrophically one month after the 3-year warranty (which wouldn't have covered data, anyway).

With the cheap ones I knew to be paranoid, with the expensive one I lost a few things.

Every single HDD failure I've experienced has had so many warnings long before total failure that I do feel more comfortable with them, especially for unattended backup utilities where I don't notice how slow they are. I've never tried, but have wondered if platters can be transferred or PCBs replaced at least to transfer out the data.


Replacing the PCB is a method of data recovery used for HDDs. There are some videos of it on YouTube.


That's the promise. In practice your SSD might kill itself due to a bug in the firmware and drop off the bus without warning long before the flash is worn out.


I can't even remember reading any reports of read-only failure due to flash exhaustion. I do remember boatloads of early SSDs, mostly SandForce-based ones, spontaneously dying due to firmware bugs.


Electronic parts can fail silently and without warning in either case, e.g. due to a tin whisker producing a short somewhere.

Mechanical parts being less reliable than electronic parts in HDDs, and their relatively slow degradation, allow for an earlier warning when data can still be recovered but the drive should rather be replaced.


HDDs also have controller firmware, and it's not unheard of for it to kill hard drives early.


Backblaze can lose 3/20 hard drives and still recover the data from parity drives. They don't have to rely on recovering it from non-operative drives unless they become unbelievably unlucky.
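
To put a rough number on "unbelievably unlucky", here is a minimal sketch of the arithmetic, assuming drive failures are independent and that each drive has the same chance of failing before a rebuild finishes. The function name and the 2% figure are made up for illustration; they are not Backblaze's numbers.

```python
from math import comb


def prob_data_loss(n_drives: int, max_failures: int, p_fail: float) -> float:
    """Chance that MORE than max_failures of n_drives fail.

    Assumes independent failures with the same probability p_fail per drive
    (a simplification: real failures are often correlated).
    """
    # Probability that the number of failed drives is within tolerance.
    p_recoverable = sum(
        comb(n_drives, k) * p_fail**k * (1 - p_fail) ** (n_drives - k)
        for k in range(max_failures + 1)
    )
    return 1.0 - p_recoverable


# Hypothetical input: each of 20 drives has a 2% chance of dying
# before the group is rebuilt; the group tolerates 3 losses.
print(f"{prob_data_loss(20, 3, 0.02):.6f}")
```

Under those assumptions the chance of losing a fourth drive in time is well under one in a thousand, which is why pulling the dead drive and rebuilding beats sending it to a recovery lab.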


> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data. The only way to recover your data from a HDD failure is to find a data recovery business.

SSD failure, in my experience both professional and private, is definitely not what you describe there. A failed SSD just makes the system no longer recognize the drive. There's no read-only mode to speak of.

A failing hard drive, on the other hand, has a bit more survival luck, and data recovery has a very high success rate when going to a third party.


SSDs might go to read only mode when they have block exhaustion, but my experience in practice is that they simply stop functioning entirely when they fail, whether the endurance is still at 100%, 80% or 5%. You could put the NAND on a different controller, but that's a very involved, difficult process, and even that isn't guaranteed to do anything.

Their unreliability has made me far more committed to ensuring that anything of value is backed up incrementally 24/7.


Sounds irrelevant in both cases. A business will use real redundancy with extra drives rather than hoping to maybe recover data from a dead drive. If the drive dies they probably just pull it and rebuild, which is likely quite a lot better (from a technical perspective, excluding storage space) on an SSD (much higher write rates restore the redundancy faster).

Electronic failures are technically possible, but tbh I think they're so unlikely nowadays that they're not really worth considering. Given the amount of electronics in everyday life, almost none of them ever break for reasons related to the electronics themselves.


Electronics don't fail because it's all so brand new. And this is due to planned obsolescence or whatever you call it, where software will force you to throw it away while the hardware is still perfectly functional.


Sure, stuff goes out of date, but rarely does it fail for hardware reasons. Practically the only electronic components that fail with any regularity are chemical components like electrolytic caps and batteries.


> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data.

Anecdotally, every (n=5) SSD that has failed on me suffered sudden* catastrophic failure that made the drive unreadable.

I forget the exact models, but one was a high end consumer Samsung NVMe, two higher end Samsung SATA SSDs, and a couple of M.2 SATA SSDs in laptops.

* One of the M.2 SATA SSDs was a slower failure where the kernel reported I/O errors at first, but it could still boot a few times, then it completely failed mid-backup and never mounted again.


> An SSD that fails is likely to go into a read-only mode that still allows you to recover your data.

My anecdotal experience with 2 SSD failures was that they were unreadable. If I remember correctly, in both cases the drive wasn't even listed by the operating system tools, as if it were unplugged (can't remember about the BIOS/UEFI firmware).

Bottom line: do backups and test them regularly.
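
For the "test them" part, a minimal sketch of one way to do it: hash every file in the source tree and compare it against the copy in the backup. The function names and directory layout here are hypothetical, and this assumes the backup is a plain mirrored directory rather than an archive or snapshot format.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file has a byte-identical copy in the backup. It does not prove you can restore a bootable system, so an occasional real restore is still worth doing.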


This lines up with mine as well. My HDDs fail more often than SSDs, but when they do it's a painful slow death that shows itself in SMART and affords me days if not weeks. Out of 5 SSDs I had only one fail, and that one worked perfectly well one moment; the next second my OS froze and it wouldn't show up in BIOS anymore.


Yep, all my SSDs that have failed so far have gone completely undetectable to all the computers I tried to use them on. I have never seen this read-only failure mode personally yet. It feels made-up to me.


Backblaze (or any large storage provider) is never going to recover data from a failed storage drive. Their whole value proposition is that they are designed so they don't need to. In fact, if I remember correctly, Backblaze does not even try to replace failed drives; they wait until a rack unit (~60 drives) degrades by a given amount, then replace the whole unit.

Fun fact: one of the surprising differences between consumer-grade drives and enterprise-grade drives is that the firmware in the enterprise drives is designed to fail fast. The theory is that a consumer drive is probably the sole unit and should make every effort to limp along, to give the user a chance to get the data off, while an enterprise drive is probably part of a redundant array and should die quickly so the rebuild process can start as soon as possible.


Wouldn't that failure mode be most equivalent to something like a bad blocks failure on a hard drive?

Both can have random ~controller electronics failures, and I think the SSD is often designed to be impossible to recover in that situation with a replacement controller or other electronics.


You are entirely correct, however most consumers would take the loss and discard the data rather than replace a drive controller or salvage the platters. That's something they would need to hire an expensive data recovery service for.

If you're a business or a professional then certainly HDDs provide you with more recovery opportunities.


It's certainly interesting how failure modes play out for consumers, given that they more often run drives to some failure.

While I wouldn't buy a HDD as a consumer, I think they are all around better from this angle. Many consumers actually did pay to have a thesis retrieved from HDDs when laptops were still using them and the partial media failure offering a data retrieval is basically the same bad option often presented on either.

(The article is out of date and I would suspect SSDs have improved at a faster rate than HDDs since, but some of that improvement will have been redirected to make even cheaper consumer options.)


The following quote is from a post by CrossVR but as my reply is both long and in parts deviates into a more general discussion I've avoided cluttering up that thread by posting it as a new comment.

"An SSD that fails is likely to go into a read-only mode that still allows you to recover your data."

Perhaps so, but from my experience that's often not what happens. Most of my SSD failures have resulted in the drives being stone-dead, that is, completely unresponsive with their data irrecoverable. I'd like to see some comprehensive statistics on this.

There's much that can be said about the reliability of hard drives and SSDs (and other electronic storage media) as well as the role manufacturers have in making reliable storage. I'd argue that their collective actions aren't helpful and overall they have made present-day storage less reliable than it ought to be by virtue of the secretive and proprietary processes they've employed to manufacture their products.

First, it seems to me that all too often we do not bother to consider storage technologies in a holistic way; after all, the management, control and long-term integrity of our data ought to be our first and principal focus, so from the outset we ought to analyse how effective current storage technologies are at achieving these goals. My contention is that when we appraise modern electronic storage with that objective in mind then we find that it falls very short of the ideal. If it were not for the fact that at present there is no existing technology that is more reliable and more suited for purpose then I'd suggest that all currently-used technologies are not fit for purpose.

Of course, one cannot justifiably make such a claim if better solutions do not exist, so let me explain why I've ventured forth with such a provocative comment. Before I do I'd add that any detailed analysis of the subject is worthy of a lengthy book, thus (with limited exceptions) involved technical discussion is essentially outside the scope of this post. Therefore, rather than spend time on the minutiae of drive failures I'll instead focus on users' data and the essential requirement to protect its integrity for as long as is necessary. Specifically, that must be for as long as users deem said data useful and/or relevant. In practice, that could be upwards of many decades, and in some instances data may have to be kept in perpetuity.

That brings me to the major concerns I have with existing data storage systems. First, the technologies that both hard disks and SSDs employ are essentially ephemeral in that they are intrinsically fragile. The consequence is that the level of overhead necessary to protect and maintain data integrity over its required lifetime is high. Second, in the never-ending quest to increase data densities, the development of both hard drives and SSDs is continuously being pushed to the limit. Combined, these factors not only contribute to overall system fragility but also to reduced lifecycles; they increase turnover of hardware and increase maintenance costs.

Moreover, even if hardware does not fail within its nominal replacement schedule, its lifecycle is still of extremely short duration when compared with traditional storage. Again, ongoing upgrades and hardware replacements put a considerable burden on both professional and individual users. Moreover, maintenance and its concomitant costs must be continued throughout the expected lifetime of the stored data, otherwise data integrity would suffer.

Nevertheless, for the most part, professional users (data centers, well-managed server sites, etc.) end up better off than individual users in that (a) they have strict and well-structured backup procedures that ensure the integrity and longevity of users' data, and (b) their infrastructure is such that it's easier for them to follow best practice.

On the other hand, users who manage their own data have to take full responsibility for both its integrity and longevity which is a considerable challenge given that many are unaware of the limitations of the technologies they are forced to use. Thus, it's not surprising this group often experiences difficulties in maintaining the integrity of their data across its deemed lifetime.

Such are the limitations of modern electronic storage systems. In the grand scheme of things, whilst present-day technologies offer many advantages over older, more traditional storage systems like print and paper documents, they nevertheless exhibit some serious drawbacks and disadvantages. Irrespective of whether they're used to store analog or digital data, and unlike their older counterparts, modern storage and retrieval systems cannot offer users set-and-forget long-term data storage that will remain reliable and viable over many decades or even centuries, because the technologies necessary to support this level of reliability simply do not yet exist in any practical form. (Yes, I accept some low-density long-term storage systems do exist but for almost all modern applications they're impractical to use.)

As mentioned, all common 'electronic' storage technologies in current use are, by nature, essentially ephemeral. Restated, the information stored in modern data storage and retrieval systems has to be refreshed and regenerated at regular intervals to remain viable. Moreover, in comparison with traditional information storage, modern electronic systems require the interval between data refresh cycles to be very short indeed. For instance, there are many paper-based documents that are many centuries old and the information they contain is still fully viable whereas with digital systems refresh intervals can be as short as the life of a hard disk or SSD or even less—that is as short as several years, five or so at most.

Holistic considerations require one to again draw a comparison between the longevity of traditional information and the ephemeral nature of modern storage technologies, so despite my above point about keeping technical details out of the discussion, I have to make brief mention of the limitations of these technologies to justify my assertions.

Going on the evidence, it's safe to say there's no 100% guarantee (in fact it's unlikely) that data stored in any electronic storage media that's in common usage nowadays will be able to be read in, say, 20-30 years from now, let alone in 50 or 100 years. Simply, we have no electronic technology nor any foolproof system that can guarantee that data can be stored for many decades and still be read faithfully. That this story and these posts are concerned with the short and inadequate service life of hard disks and SSDs only highlights the point.

Why is this so? Well, let's briefly look at various storage technologies currently in use and consider why none is even satisfactory let alone ideal. First, consider cloud storage. To commit one's data to the Cloud means that one must have complete faith in the vendors that offer such services. A quick examination shows that none have even reasonable form when it comes to how they treat customers, ipso facto, same goes for their data. Expecting entities such as Google, AWS and Microsoft to still be around and remain in their current corporate form in 20, 30 or so years let alone 50 or more is just fanciful. Cloud storage thus requires users to be vigilant and to constantly monitor how their data is being managed on a regular basis. For some this won't be possible.

This begs the question about the long-term management of, say, important historical data and such that doesn't have a specific owner or custodian to look after it on a regular basis. As sure as eggs, even if these entities last the distance (which is doubtful) they're very unlikely to protect this 'orphaned' data as a mother would protect a child. One only has to look at the ruthless tactics that Big Tech has already adopted to see that one's data isn't always secure. Then there's DRM: there's not even any short-term guarantee that data one's actually paid for won't be summarily deleted at a whim.

Now consider the tech itself. Hard disks have a magnetic remanence problem where magnetic intensity drops by a significant percentage per year, thus data stored on them has a definite lifespan. Even if drives are properly archived and only used for data retrieval, data rot will eventually claim their data. As stated, to avoid this, data must be refreshed periodically, and even then there are no guarantees that this will occur in a timely manner across the life of the data.

Similarly with SSDs. The way SSDs store data can, at best, only be described as unreliable and fraught with risk. In my opinion, storing electric charges in a 'glass'-like medium/substrate and expecting them to remain there indefinitely is like believing in fairyland. As it is, data on SSDs has to be refreshed periodically to ensure it doesn't altogether disappear. By nature, electronic data is already ephemeral, so storing it on SSDs then forgetting about it has to be the riskiest of risky procedures. It's almost equivalent to erasing one's data.

In summary, with current technologies, unless great care is taken to manage users' data and to ensure it's refreshed at proper intervals then it will definitely atrophy over time. This is the perennial problem with the current tech.

Is there the possibility that we'll get almost bullet-proof storage technologies? Yes, there definitely is, and there are some good contenders, but don't hold your breath; there's no indication that they'll be around anytime soon. That said, I'm not even going to mention them here as that'd likely double the length of this post.


The problem is no proper theory of backups exists: https://news.ycombinator.com/item?id=41173227


Unfortunately, I also have to agree with that fact.



