Hacker News

A trained image recognition model could probably recognize a dishwasher from an image.

But that won't be the same model that writes bad poetry or tries to autocomplete your next line of code, or that controls the legs of a robot moving towards the dishwasher while holding a dirty plate. Each has a fair bit of manual tuning and preprocessing tied to its function, which may simply not transfer to other areas even with scale. The best-performing models aren't just taking in unstructured, untyped data.

Even the most flexible models are only tackling a small slice of what "intelligence" is.




ChatGPT, Gemini and Claude are all natively multimodal. They can recognise a dishwasher from an image, among many other things.
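To make "natively multimodal" concrete: these models accept an image and a text question in the same message. Here's a minimal sketch of what such a request looks like, loosely following the Anthropic Messages API convention (base64 image block plus text block in one user turn) — the model name is illustrative and no network call is made:

```python
import base64
import json

def build_vision_request(image_bytes: bytes, question: str) -> dict:
    """Build a multimodal chat request: one image plus a text question."""
    return {
        "model": "claude-sonnet-4",  # illustrative model name
        "max_tokens": 256,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        },
                    },
                    {"type": "text", "text": question},
                ],
            }
        ],
    }

# A fake 1-byte "image" just to show the payload shape; a real call
# would POST this JSON to the provider's endpoint with an API key.
req = build_vision_request(b"\xff", "What appliance is in this photo?")
print(json.dumps(req)[:40])
```

The point is that vision isn't a separate bolted-on model here: the image is just another content block in the same conversation.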

https://www.youtube.com/watch?v=KwNUJ69RbwY


Can they take the pictures?



Technically, yes: they can call functions. There have been experiments with Claude driving a robot around a house. So technically we are not far off at all, and current models may even be able to do it.
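"Run functions" here means tool use: you hand the model a schema describing an action, it emits a structured call, and your code executes it against the hardware. A minimal sketch, where the `move_robot` tool name, its parameters, and the dispatch loop are all invented for illustration (real code would command motors instead of returning a string):

```python
# Hypothetical tool definition a model could be offered; the name and
# parameter schema are invented for this example.
MOVE_TOOL = {
    "name": "move_robot",
    "description": "Move the robot base a given distance in a direction.",
    "input_schema": {
        "type": "object",
        "properties": {
            "direction": {"type": "string",
                          "enum": ["forward", "back", "left", "right"]},
            "distance_m": {"type": "number"},
        },
        "required": ["direction", "distance_m"],
    },
}

def dispatch(tool_call: dict) -> str:
    """Execute a model-issued tool call against (stub) robot hardware."""
    if tool_call["name"] == "move_robot":
        args = tool_call["input"]
        # Real code would drive motors here; we just report the action.
        return f"moved {args['distance_m']} m {args['direction']}"
    return "unknown tool"

# Simulate the model deciding to step towards the dishwasher.
print(dispatch({"name": "move_robot",
                "input": {"direction": "forward", "distance_m": 0.5}}))
```

The model never touches the robot directly; it only produces structured calls like the one above, which is why "can they take the pictures?" reduces to whether you wire a camera tool into the same loop.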


Please re-read my original comment.

"The AI systems need bodies to actually learn these things."

I never said this was impossible to achieve.


Can your brain see the dishwasher without your eyes?



