
Compression artifacts in X-rays can easily kill people, either by requiring additional unnecessary X-rays (which both delay diagnosis and cause cancer) or by causing erroneous diagnoses; by comparison, the cost of the data storage thus saved is trivial. Compression artifacts in filtered photos of your cute pet turtle for Instagram are much less likely to kill people.


And this becomes increasingly true as compression methods get increasingly clever.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...


Ah yes, the JBIG2 fiasco!

JBIG2 is a format for storing black and white documents in a highly compressed way. It works by detecting each letter in the document, and then replacing it with a pointer to the reference version of that letter, up to a certain threshold. Basically compression via OCR.

Of course, this means that when a distorted letter is too close to the reference version of another letter, it will get replaced with a clean version of that incorrect one. So even though a human could easily recognize that something was off with that letter in the original image, the JBIG2-compressed image has no such clue!
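To make the failure mode concrete, here is a toy sketch of that kind of symbol substitution. It is not the real JBIG2 codec: the 3x5 glyphs, the Hamming-distance comparison, and the names `hamming`, `compress`, and `decompress` are all made up for illustration. It only shows how a lossy match threshold can silently turn one character into another.

```python
# Toy model of lossy symbol matching (NOT the actual JBIG2 algorithm).
# Glyphs are tiny bitmaps; '#' is ink, '.' is paper.

def hamming(a, b):
    """Count differing pixels between two equally sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(glyphs, threshold):
    """Replace each glyph with an index into a dictionary of reference glyphs.

    A glyph reuses an existing reference if it differs from it by at most
    `threshold` pixels; otherwise it becomes a new reference.
    """
    dictionary = []
    indices = []
    for glyph in glyphs:
        for i, ref in enumerate(dictionary):
            if hamming(glyph, ref) <= threshold:
                indices.append(i)  # "close enough": point at the reference
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    """Rebuild the page by stamping the reference glyph at each position."""
    return [dictionary[i] for i in indices]

SIX = ("###",
       "#..",
       "###",
       "#.#",
       "###")

EIGHT = ("###",
         "#.#",
         "###",
         "#.#",
         "###")

page = [SIX, EIGHT]

# With a threshold of 1 pixel, the 8 is "close enough" to the 6 and is
# replaced by a perfectly clean 6.
lossy = decompress(*compress(page, threshold=1))

# With a threshold of 0, only exact matches are merged and the page survives.
exact = decompress(*compress(page, threshold=0))
```

In this toy setup `lossy` comes back as two crisp sixes, with nothing in the output hinting that the second digit was ever an 8.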

What’s really bad is that JBIG2 compression was built into certain Xerox machines that were used by archivists to digitize important documents for years until someone noticed the discrepancies. JBIG2 was promptly banned for archival purposes, but there might still be a ton of documents with these kinds of invisible errors in our archives! :-)


It would be so cool to add the OCR output as metadata. Text in images on the internet could be readily selected and made available to assistive technologies if images were OCR'd at creation time.


PDF supports this use case by adding an invisible text layer on top of the raster content.

On the other hand, JBIG2 doesn’t actually do OCR. It only does template matching of similar-looking blocks of pixels. The compressor doesn’t try to understand which letter those pixels represent.


But aren't medical images interpreted by eye only? If so, compression artifacts that aren't visible to the eye shouldn't be a problem, should they?


Artifacts can be visible. They can also be destructive.


A lot of algorithms are applied to medical images, both as pre-processing for examination by eye and for automated analysis.



