Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This appears to be a classic vision fail on the VLM's part. Which is entirely unsurprising for anyone who has used open VLMs for anything except ""benchmarks"" in the past two god damn years. The field is in a truly embarrassing state, where they pride themselves how it can solve equations off a blackboard, yet couldn't even accurately read a d20 dice roll among many other things. I've tried (and failed) to have VLMs accurately caption images for such a long time, yet anytime I check on the output it is blindingly clear that these models are awful at actually _seeing things_.


5-10 years and Radiologists will be out of a Job, just you wait and see.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: