
Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?


There are lots of posts on HN about developments and companies doing OCR and document extraction. It's a classic CV problem, but it has still come a long way in the past couple of years.


Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone were the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.



