Other interesting news from the link: They expect improvements in OCR to make it...

moonchild · on July 20, 2024

They are talking about treating OCR as lossy. I wonder about making a lossless compression algorithm for text scans based on OCR: in effect, use the OCR to predict which text will show up and how it will be rendered, then encode the pixel-level differences on top of that.
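
A minimal sketch of that idea, under loud assumptions: the OCR step and the re-rendering of its output are not shown, and `predicted` simply stands in for a bitmap rendered from the OCR result at the same size as the scan. The function names are hypothetical, and XOR plus zlib is just one simple way to store the pixel-level differences, not a tuned codec.

```python
import zlib


def encode_residual(scan: bytes, predicted: bytes) -> bytes:
    """Compress only the difference between the scan and the OCR-based prediction."""
    if len(scan) != len(predicted):
        raise ValueError("scan and prediction must have the same size")
    # XOR is zero wherever the prediction matches the scan exactly,
    # so a good prediction yields a residual that is mostly zero bytes.
    residual = bytes(a ^ b for a, b in zip(scan, predicted))
    return zlib.compress(residual, 9)


def decode_residual(payload: bytes, predicted: bytes) -> bytes:
    """Rebuild the original scan bit for bit from the prediction and the residual."""
    residual = zlib.decompress(payload)
    if len(residual) != len(predicted):
        raise ValueError("payload does not match the prediction size")
    # XOR is its own inverse: (scan ^ predicted) ^ predicted == scan.
    return bytes(a ^ b for a, b in zip(residual, predicted))
```

The decoder has to reproduce exactly the same prediction as the encoder, so the OCR output (text, positions, font choice) would need to be stored alongside the residual. The better the prediction, the closer the residual is to all zeros and the smaller it compresses.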

eulgro · on July 20, 2024

DjVu does this to some extent, identifying identical glyph bitmaps and reusing them for compression. See https://en.m.wikipedia.org/wiki/DjVu#Compression
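
As a rough illustration of the glyph-reuse idea only (this is not DjVu's actual coder, and the function name and the bytes-per-glyph representation are assumptions), each distinct glyph bitmap can be stored once and every occurrence on the page replaced by an index into that dictionary:

```python
def build_glyph_dictionary(glyphs):
    """Store each distinct glyph bitmap once and refer to it by index.

    `glyphs` is a list of bytes objects, one per glyph occurrence on the page.
    Only exact matches are merged in this sketch.
    """
    dictionary = []   # unique bitmaps, in order of first appearance
    index_of = {}     # bitmap -> position in `dictionary`
    refs = []         # one dictionary index per glyph occurrence
    for bitmap in glyphs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        refs.append(index_of[bitmap])
    return dictionary, refs
```

The page is then recovered as `[dictionary[i] for i in refs]`, plus the glyph positions, which this sketch leaves out.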

benxh · on July 20, 2024

I am assuming this will be solved this year.