Full disclaimer: I work at Nanonets Excited to share Nanonets-OCR-s, a powerful ...

mvac · 2025-06-16T10:44:45 1750070685

Correct link for Docext: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...

RicoElectrico · 2025-06-16T19:11:01 1750101061

Could be it used to (maybe with help of a downstream LLM) parse a photo/PDF of a restaurant menu into a JSON file conforming to a schema? Or would bigger, hosted multimodal LLMs work better in such case?

arkh · 2025-06-17T09:17:37 1750151857

So it feels like it finally let me do one thing I'd wanted for some time: scan printed documents and generate structured pdfs (and not pdf as a picture container).

wisdomseaker · 2025-06-17T07:14:06 1750144446

Would any of this be able to handle magazine layouts? I've yet to find anything that can follow their fairly random layouts with text at varying angles etc

uselesswords · 2025-06-17T14:15:14 1750169714

Have you found it has better accuracy or scales with larger models? Or are the improvements, if any, marginal compared to the 3B VLM model?

gibsonf1 · 2025-06-16T19:58:37 1750103917

Does it hallucinate with the LLM being used?

michaelt · 2025-06-16T22:47:31 1750114051

Sometimes. I just fed the huggingface demo an image containing some rather improbable details [1] and it OCRed "Page 1000000000000" with one extra trailing zero.

Honestly I was expecting the opposite - a repetition penalty to kick in having repeated zero too many times, resulting in too few zeros - but apparently not. So you might want to steer clear of this model if your document has a trillion pages.

Other than that, it did a solid job - I've certainly seen worse attempts to OCR a table.

[1] https://imgur.com/a/8rJeHf8

nattaylor · 2025-06-16T20:34:02 1750106042

The base model is Qwen2.5-VL-3B and the announcement says a limitation is "Model can suffer from hallucination"

gibsonf1 · 2025-06-16T22:44:50 1750113890

Seems a bit scary that the "source" text from the pdfs could actually be hallucinated.

prats226 · 2025-06-16T23:55:33 1750118133

Given that input is image and not raw pdf, its not completely unexpected

generalizations · 2025-06-16T16:25:34 1750091134

Does it have a way to extract the images themselves, or is that still a separate process later?

j45 · 2025-06-16T20:09:15 1750104555

If you are after extracting images from pdfs there’s plenty of tools that do that just fine without LLMs.

generalizations · 2025-06-16T20:15:32 1750104932

I mean, ideally it would be in context, so the generated markdown references the correct image at the correct location in the doc. Unless that's what you're talking about? In which case I don't know about those tools.