Very interesting! Is this the state of the art for accurate OCR of tabular PDFs,...

SnooSux · 2025-06-19T18:32:52

There are lots of posts on HN about developments and companies doing OCR and document extraction. It's a classic CV problem, but it has still come a long way in the past couple of years.

dwillis · 2025-06-19T19:58:31

Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone were the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.