
Other interesting news from the link: They expect improvements in OCR to make it possible for them in the coming years to apply it to their entire library. This would liberate an enormous amount of knowledge for easier access.


They are talking about treating OCR as lossy. I wonder about building a lossless compression algorithm for text scans on top of OCR: in effect, use the OCR output to predict which text will show up and how it will look, then encode the pixel-level differences on top of that prediction.
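
To make the idea concrete, here is a rough sketch of predict-then-encode-the-residual. Everything in it is an assumption for illustration: the tiny `GLYPHS` table stands in for a real OCR engine plus font model, and `render`, `encode` and `decode` are hypothetical helper names, not any library's API. The point is only that the recognized text plus a compressed XOR residual is enough to rebuild the scan bit for bit.

```python
import zlib

# Hypothetical 3x3 glyph bitmaps standing in for a real OCR font model.
GLYPHS = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}

def render(text):
    """Predict the page bitmap by placing the glyphs side by side."""
    rows = [[], [], []]
    for ch in text:
        for r in range(3):
            rows[r].extend(GLYPHS[ch][r])
    return rows

def encode(scan, text):
    """Store the text plus the compressed XOR of prediction and scan."""
    predicted = render(text)
    residual = bytes(
        p ^ s
        for prow, srow in zip(predicted, scan)
        for p, s in zip(prow, srow)
    )
    return text, zlib.compress(residual)

def decode(text, packed):
    """Re-render the prediction and apply the residual to recover the scan."""
    predicted = render(text)
    residual = zlib.decompress(packed)
    width = len(predicted[0])
    return [
        [p ^ residual[r * width + c] for c, p in enumerate(row)]
        for r, row in enumerate(predicted)
    ]

# A "scan" of the text LIL with one pixel of noise the prediction cannot know.
scan = render("LIL")
scan[0][4] ^= 1
text, packed = encode(scan, "LIL")
restored = decode(text, packed)
```

The better the prediction, the more of the residual is zeros and the smaller it compresses. A real scan would also need per-glyph position, scale and skew before the residual gets anywhere near this sparse.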


DjVu does this to some extent, identifying identical glyph bitmaps and reusing them for compression. See https://en.m.wikipedia.org/wiki/DjVu#Compression
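
A minimal sketch of the glyph-reuse idea, assuming exact bitmap matches only: each distinct glyph is stored once in a dictionary and the page becomes a list of indices into it. This is not DjVu's actual encoder, and `dedupe_glyphs` is a hypothetical name; as far as I understand, the real format also handles near-identical shapes and entropy-codes the result.

```python
def dedupe_glyphs(glyphs):
    """Return (dictionary, refs): unique bitmaps and one index per glyph."""
    dictionary = []   # each distinct bitmap, stored once
    index_of = {}     # bitmap -> position in dictionary
    refs = []         # per-glyph reference into dictionary
    for glyph in glyphs:
        if glyph not in index_of:
            index_of[glyph] = len(dictionary)
            dictionary.append(glyph)
        refs.append(index_of[glyph])
    return dictionary, refs

# Bitmaps are tuples of tuples so they can be used as dict keys.
a = ((1, 0), (0, 1))
b = ((1, 1), (0, 0))
dictionary, refs = dedupe_glyphs([a, b, a, a])
rebuilt = [dictionary[i] for i in refs]
```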


I am assuming this will be solved this year.



