I use OCRmyPDF on a regular basis to OCR journal articles my library sends me.

I've found it works great on English but poorly on Greek and Hebrew, even with the appropriate language packs installed. It also makes no effort to understand page layout (e.g., tables).

The project is fantastic, though. I've often considered building a web frontend that cleans up PDFs and then OCRs them using OCRmyPDF.

For cleaning, check out https://scantailor.org/

I use OCRmyPDF nearly every day. It already has cleaning functions: deskewing, page rotation, despeckling, contrast, etc. The docs linked above list all of its many useful options.
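As a rough sketch of wiring those cleaning options into a script (e.g. as the backend for the web frontend idea above), one could build the `ocrmypdf` command line and run it. This assumes OCRmyPDF is installed, the Tesseract language packs `ell` (Greek) and `heb` (Hebrew) are available, and unpaper is installed for `--clean`. The helpers `build_ocrmypdf_cmd` and `ocr_article` are hypothetical names, not part of OCRmyPDF.

```python
import subprocess


def build_ocrmypdf_cmd(src, dst, languages=("eng",), deskew=True,
                       rotate_pages=True, clean=True):
    """Build an ocrmypdf argument list (hypothetical helper, not an OCRmyPDF API)."""
    # Multiple Tesseract languages are joined with '+', e.g. "eng+ell+heb".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if deskew:
        cmd.append("--deskew")        # straighten crooked scans before OCR
    if rotate_pages:
        cmd.append("--rotate-pages")  # fix pages scanned sideways or upside down
    if clean:
        cmd.append("--clean")         # run unpaper on the image used for OCR only
    cmd += [src, dst]
    return cmd


def ocr_article(src, dst, languages=("eng",)):
    """Run ocrmypdf on one file; raises CalledProcessError if it fails."""
    subprocess.run(build_ocrmypdf_cmd(src, dst, languages), check=True)
```

For a Greek/Hebrew article this would run `ocrmypdf -l eng+ell+heb --deskew --rotate-pages --clean in.pdf out.pdf`. Note that `--clean` only affects the image Tesseract sees; OCRmyPDF's `--clean-final` option also writes the cleaned pages to the output.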