Compression artifacts in X-rays can easily kill people, either by requiring addi...

simondotau · on June 5, 2021

And this becomes increasingly true as compression methods get increasingly clever.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...

layoutIfNeeded · on June 5, 2021

Ah yes, the JBIG2 fiasco!

JBIG2 is a format for storing black and white documents in a highly compressed way. It works by detecting each letter in the document, and then replacing it with a pointer to the reference version of that letter, up to a certain threshold. Basically compression via OCR.

Of course, this means that when a distorted letter is too close to the reference version of another letter, it will get replaced with a clean version of that incorrect one. So even though a human could easily recognize that something was off with that letter in the original image, the JBIG2-compressed image has no such clue!

What’s really bad is that JBIG2 compression was built into certain Xerox machines that archivists used for years to digitize important documents before someone noticed the discrepancies. JBIG2 was promptly banned for archival purposes, but there might still be a ton of documents with these kinds of invisible errors in our archives! :-)

nextaccountic · on June 5, 2021

It would be so cool to add the OCR output as metadata. Text in images on the internet could be readily selectable and available to assistive technologies if images were OCR'd at creation time.

layoutIfNeeded · on June 5, 2021

PDF supports this use case by adding an invisible text layer on top of the raster content.

On the other hand, JBIG2 doesn’t actually do OCR. It only does template matching of similar-looking blocks of pixels. The compressor doesn’t try to understand which letter those pixels represent.
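To make the template-matching point concrete, here is a toy sketch in Python. It is not the real JBIG2 codec: the two 3x5 digit bitmaps, the `threshold` parameter, and the function names `hamming`, `compress`, and `decompress` are all made up for illustration. It only shows how replacing "close enough" pixel blocks with a pointer to an earlier block can silently swap one symbol for another.

```python
# Toy sketch of lossy symbol substitution via template matching.
# NOT the actual JBIG2 algorithm; glyphs, threshold, and names are hypothetical.

SIX = ("111",
       "100",
       "111",
       "101",
       "111")

EIGHT = ("111",
         "101",
         "111",
         "101",
         "111")


def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def compress(glyphs, threshold):
    """Replace each glyph with an index into a dictionary of reference glyphs.

    A glyph reuses the first reference within `threshold` differing pixels;
    otherwise it becomes a new reference. No attempt is made to recognize
    which character the pixels represent.
    """
    dictionary = []
    pointers = []
    for glyph in glyphs:
        for index, reference in enumerate(dictionary):
            if hamming(glyph, reference) <= threshold:
                pointers.append(index)
                break
        else:
            dictionary.append(glyph)
            pointers.append(len(dictionary) - 1)
    return dictionary, pointers


def decompress(dictionary, pointers):
    """Rebuild the glyph sequence from the dictionary and pointers."""
    return [dictionary[i] for i in pointers]


# The two toy digits differ by a single pixel.
assert hamming(SIX, EIGHT) == 1

# Lossless setting: both glyphs survive.
lossless = decompress(*compress([SIX, EIGHT], threshold=0))
assert lossless == [SIX, EIGHT]

# Lossy setting: the 8 is "close enough" to the 6 and comes back as a clean 6.
lossy = decompress(*compress([SIX, EIGHT], threshold=1))
assert lossy == [SIX, SIX]
```

With the looser threshold, the output contains a perfectly crisp but wrong digit, which is why nothing in the decompressed image hints at the error.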

publicola1990 · on June 5, 2021

But aren't medical images interpreted by eye only? If so, compression artifacts that aren't visible to the eye shouldn't be a problem, should they?

Synaesthesia · on June 5, 2021

Artifacts can be visible. Also they can be destructive.

skywal_l · on June 5, 2021

A lot of algorithms are applied to medical images, both as pre-processing before examination by eye and for automated analysis.