
Did you check all of the samples provided? It can read an entire research paper and understand the figures just from images of the paper's pages. This seems to be a much deeper connection than extracting captions.


Are you sure? Sounds too epic


See the real examples for yourself, starting on page 34 ... mind-blowing.

https://cdn.openai.com/papers/gpt-4.pdf


The extreme ironing image example has a bullshit explanation in the paper. The extreme-ironing-on-the-back-of-a-taxi photo is popular, with lots of text associated with it: https://google.com/search?q=extreme+ironing+taxi&tbm=isch

Give the model new images that are not in the training set (e.g. photos not on the internet, or photos taken after the model was trained), ask the same question, and see how well it does!
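A minimal sketch of how you might run that test yourself, assuming the chat-completions image-input format where a local photo is base64-encoded into a data URL (the model name here is illustrative, and the request is only constructed, not sent):

```python
import base64


def build_vision_request(image_path, question, model="gpt-4-vision-preview"):
    """Build a chat-completions request body pairing a local image
    (base64-encoded as a data URL) with a text question.

    Using a fresh photo from your own camera ensures the image cannot
    have appeared in the training set.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

You would then POST this body to the chat-completions endpoint with your API key and compare the answer against what the paper claims for its (internet-sourced) examples.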

The paper says: “Table 16. [snip] The prompt requires image understanding.”

I think the explanations (in the paper by OpenAI for the images) are probably misinformation or misdirection. I would guess it is recognising the images from its training data and associating them with nearby text.


It seems like they used some unknown images in the livestream, see replies to: https://news.ycombinator.com/item?id=35157940

However, I still think they should not have used images from the internet/training set in their paper. And to be safe, neither should they use “generated” images.

I am looking forward to taking photos of some paintings by friends and seeing if ChatGPT can describe them!


It's SOTA on DocVQA[1], so yeah, it is able to read text/graphs/tables from images.

[1] https://www.docvqa.org/



