
I tried this prompt:

>Can you find me a picture of Winnie the Pooh, and describe the image?

Bard gives me a link to the image in this article: https://www.psychologytoday.com/ie/blog/the-everyday-unconsc...

It provides the following description:

>The picture shows Winnie the Pooh, a yellow teddy bear with a red shirt and a blue scarf. He is sitting on a tree stump, holding a pot of honey. He has a big smile on his face and his eyes are closed, as if he is enjoying the taste of the honey. The background of the picture is a forest, with trees and flowers.

Clearly, this is entirely hallucinated. If I download the image, re-upload it in a fresh session, and ask for a description, Bard provides:

>Sure. The image shows Winnie the Pooh, Eeyore, and Tigger standing next to each other on a black background. Winnie the Pooh is a yellow bear with a red shirt. Eeyore is a donkey with a blue shirt. Tigger is a tiger with a orange shirt. They are all smiling and looking at the viewer. The image is in a cartoon style and is very colorful. It is a heartwarming image that captures the friendship of these three characters.

Aside from some minor errors (shirts on Tigger and Eeyore), and the reference to a black background (probably an issue with handling transparency), this is quite reasonable.

But if Bard is going to sometimes use its Lens capability to understand images, and other times just hallucinate a description from nothing, and there's no way for the user to tell which is being done, how are we supposed to trust the results in a situation where we don't already know the answer?



