What differentiates this from other tools? Eg tesseract, EasyOcr?

jasonhemann · 2025-02-14T03:31:12 1739503872

I was going to say, isn't tesseract /already/ OCR for everyone?

rkagerer · 2025-02-14T04:30:10 1739507410

I played with Tesseract a long time ago. Has the accuracy and success rate improved a lot in the last eg. decade?

MassPikeMike · 2025-02-14T04:58:30 1739509110

I maintain a searchable archive of historical documents for a nonprofit, OCR'd with Tesseract over several years. Tesseract 4 was a big improvement over previous versions, but since then its accuracy has not improved at the same rate as other free solutions.

These days, just uploading a PDF of scanned documents (typeset ones, not handwriting) to Google Drive and opening with Google Docs results in a text document generated with impressive quality OCR.

But this is not scriptable, and doesn't provide access position information, which is needed so we can highlight search results as color overlays on the original PDF. Tesseract's hOCR mode was great for that.

For the next version, we're planning to use one of the command-line wrappers to Apple's Vision framework, which is included free in MacOS. A nice one that provides position information is at https://github.com/bytefer/macos-vision-ocr

thenthenthen · 2025-02-14T06:13:50 1739513630

I asked this question yesterday but did not enough votes. I need to OCR and then translate thousands of pages from historical documents and was wondering if you knew a scriptable app/technique or technology that includes ‘layout recovery’, aka overlaying translated text over the original, like the Safari browser etc. does (not sure the apple vision framework wrapper does this?).

MassPikeMike · 2025-02-14T16:56:49 1739552209

Apple Vision and its wrappers provide bounding boxes for each line of text. That's slightly less convenient than Tesseract which can give you a bounding box for each word, but more than compensated by Apple Vision's better accuracy. I am planning to fudge the word boxes by assuming fixed-width letters and dividing up the overall width so that each word's width is proportional to its share of the total letters on the line.

Once you have those bounding boxes, it's pretty simple to use a library like [1] (Python) or [2] (JavaScript) to add overlay text in the right place. For example, see how [3] does it.

[1] https://pymupdf.readthedocs.io/en/latest/recipes-text.html#h... [2] https://github.com/foliojs/pdfkit [3] https://github.com/eloops/hocr2pdf

thenthenthen · 2025-02-15T10:23:39 1739615019

Thank you, I will take a look at hocr2pdf to see how they overlay text!

wahnfrieden · 2025-02-14T08:32:06 1739521926

FYI the Apple one is best inside the Live Text API which is Swift-only and so some old Python and CLI tools which wrap the older Obj-C APIs may have worse quality (though Live Text doesn't really provide bounding boxes - so what I do is combine its output with bounding box APIs like the old iOS/macOS ones)

llm_trw · 2025-02-14T05:06:05 1739509565

Tesseract has had a near 100% success rate since the first time I used it in 2008 _when you read the manual_.

Black letters on white background, xheight of between 10 to 30 px, tiff format, mono column layout, etc., etc., etc..

People get terrible results because they treat it like a phone app and drop a barely legible colored jpg of a bent page and wonder why it's garbage.

tjoff · 2025-02-14T06:02:01 1739512921

Reading the manual to actually tune it or reading the manual to know the limitations?

All I've ever tried it for is pixel perfect super high contrast text but still the results doesn't exactly impress.

rcxdude · 2025-02-14T13:03:07 1739538187

Sure, if you're generating the input to be machine readable, then it's not very surprising that it's machine readable withough much effort. But then you could also use a QR code. Most people who want to OCR stuff are doing it because they don't have control over the input.

poulpy123 · 2025-02-14T10:18:48 1739528328

what you are saying is that tesseract works perfectly if you don't need to use it with real world stuff

9rx · 2025-02-14T13:34:05 1739540045

My read is that he is saying that Tesseract is intended for OCR, not an entire image pipeline, so there is an expectation that you will preprocess those real world images into a certain form rather than throwing an image straight off the sensor at it.

criddell · 2025-02-14T13:01:41 1739538101

They wonder why it's garbage because modern apps have set people's expectations much higher.

A barely legible colored jpg of a bent page shouldn't be a problem anymore.

llm_trw · 2025-02-14T13:14:04 1739538844

It works better than those apps when you know what you are doing.

Not everything needs to be made for a chimp.

krick · 2025-02-14T05:10:01 1739509801

Yes' anecdotally, it's a bit better now. Still nowhere near actually usable OCR software though, unless your use-case is scanning clear hi-res screenshots in conventional fonts and popular langues, without tables or complicated formatting.

jackbravo · 2025-02-14T05:18:25 1739510305

or https://ds4sd.github.io/docling/ from IBM.

pininja · 2025-02-14T04:14:38 1739506478

This tool says it includes a workflow GUI and refinement tools, like creating work-specific text recognition models - maybe the others do too? tesseract isn’t packaged with a GUI, but is wrapped by many.

This project seems focused on making tools more accessible and helping the user be more efficient and organized