I use OCRmyPDF on a regular basis to OCR journal articles my library sends me. I... | Hacker News


jcuenod on July 9, 2022 | on: OCRmyPDF: Add an OCR text layer to scanned PDF fil...

I use OCRmyPDF on a regular basis to OCR journal articles my library sends me. I've found it works great on English, but even with the appropriate language packs installed it works poorly on Greek and Hebrew. It also makes no effort to understand page layout (e.g., tables). The project is fantastic, though. I've often considered building a web frontend that cleans up PDFs and then OCRs them using OCRmyPDF. For cleaning, check out https://scantailor.org/
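A minimal Python sketch for confirming which language packs Tesseract (the OCR engine OCRmyPDF drives) can actually see, and for building the `eng+...` value passed to OCRmyPDF's `-l` option. It assumes the `tesseract` binary is on PATH and that `tesseract --list-langs` prints one header line followed by one language code per line. The codes `ell` (modern Greek), `grc` (ancient Greek) and `heb` (Hebrew) are Tesseract's own names; the helper function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs`.

    Assumes the first line is a header ("List of available languages ...")
    and every following non-empty line is one installed language code.
    """
    lines = output.strip().splitlines()
    return {line.strip() for line in lines[1:] if line.strip()}


def language_arg(requested: list[str], available: set[str]) -> str:
    """Join language codes into a value for OCRmyPDF's -l flag, e.g. 'eng+heb'.

    Raises ValueError listing any requested pack that is not installed.
    """
    missing = [code for code in requested if code not in available]
    if missing:
        raise ValueError(f"missing Tesseract language packs: {', '.join(missing)}")
    return "+".join(requested)


def installed_langs() -> set[str]:
    """Ask the local tesseract binary which language packs it has (needs tesseract on PATH)."""
    result = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True, check=True
    )
    # Older Tesseract releases wrote this list to stderr instead of stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

For example, with a sample listing instead of a live call, `language_arg(["eng", "grc", "heb"], parse_list_langs(sample))` returns `"eng+grc+heb"`, which could then be passed as `ocrmypdf -l eng+grc+heb in.pdf out.pdf`. Checking the packs this way only rules out a missing install; it won't by itself fix poor recognition quality on Greek or Hebrew.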

gumboshoes on July 9, 2022

I use OCRmyPDF nearly every day. It already has cleaning functions: deskewing, page rotation, despeckling, contrast, etc. The docs linked above describe all of these options in full.
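A rough sketch of how some of those cleanup steps map onto OCRmyPDF command-line flags, built as an argument list for Python's `subprocess`. It assumes `ocrmypdf` is on PATH and, for the clean options, that the external `unpaper` program is installed. `--deskew` straightens skewed scans, `--rotate-pages` fixes page orientation, `--clean` runs unpaper on the image used for OCR only, and `--clean-final` also keeps the cleaned image in the output PDF. The function names are hypothetical; other cleanup options from the docs are not covered here.

```python
from __future__ import annotations

import subprocess


def ocrmypdf_command(
    src: str,
    dst: str,
    languages: list[str] | None = None,
    deskew: bool = True,
    rotate_pages: bool = True,
    clean: bool = True,
    clean_final: bool = False,
) -> list[str]:
    """Build an ocrmypdf argument list with common page-cleanup flags enabled."""
    cmd = ["ocrmypdf"]
    if languages:
        # Tesseract language codes joined with '+', e.g. eng+heb.
        cmd += ["-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")
    if rotate_pages:
        cmd.append("--rotate-pages")
    if clean_final:
        # Cleans the page like --clean and also keeps the cleaned image in the output.
        cmd.append("--clean-final")
    elif clean:
        # Cleans the page for OCR only; the output keeps the original image.
        cmd.append("--clean")
    cmd += [src, dst]
    return cmd


def run_ocr(src: str, dst: str, **options) -> None:
    """Run ocrmypdf with the flags above; raises CalledProcessError on failure."""
    subprocess.run(ocrmypdf_command(src, dst, **options), check=True)
```

With the defaults, `ocrmypdf_command("in.pdf", "out.pdf")` produces `ocrmypdf --deskew --rotate-pages --clean in.pdf out.pdf`. Using `--clean` rather than `--clean-final` by default is a cautious choice: aggressive cleanup can remove faint content, so leaving the output image untouched is the safer starting point.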
