
I am not sure whether the videos are representative of real-life performance or it's a marketing stunt, but it sure looks impressive. Reminds me of the robot arm in Iron Man 1.



AI demos and even live presentations have exacerbated my trust issues. The tech has great uses, but there is no modesty from the proprietors.


Google in particular has had some egregiously fake AI demos in the past.


> Reminds of the robot arm in Iron Man 1.

It's an impressive demo, but perhaps you are misremembering Jarvis from Iron Man, which is not only far faster but effectively a full AGI system even at that point.

Sorry if this feels pedantic; perhaps it is. But it seems like an analogy that invites pedantry from fans of that movie.


The robot arms in the movie are implied to have their own AIs driving them; Tony speaks to the malfunctioning one directly several times throughout the movie.

Jarvis is AGI, yes, but is not what's being referred to here.


Ah good point!


i thought it was really cool when it picked up the grapes by the vine

edit: it didn't.


Here it looks like it's squeezing a grape instead: https://www.youtube.com/watch?v=HyQs2OAIf-I&t=43s Bit hard to tell whether it remained intact.


The leaf on the darker grapes looks like a fabric leaf, I'd kinda bet they're all fake for these demos / testing.

Don't need the robot to smash a grape when we can use a fake grape that won't smash.


The bananas are clearly plastic and make a "doink" noise when dropped into the bowl.


Haha, show the whole room and work either on a concrete floor or on a transparent table.

This video reeks of the same shenanigans as perpetual motion machine videos.


welp i guess i should get my sight checked


And how it just dropped the grapes, as well as the banana. If they were real fruits, you wouldn't want that to happen.


I remember a cartoon where a quality inspection guy smashes bananas with a "certified quality" stamp before they go into packaging.


[flagged]


This is, nearly exactly, like saying you've seen screens slowly display text before, so you're not impressed with LLMs.

How it's doing it is the impressive part.


The difference is the dynamic nature of things here.

Current arms and their workspaces are calibrated to mm. Here it's more messy.

Older algorithms are more brittle than having a model do it.


For the most part that's been on known objects; these are objects it has not seen.


Not specifically trained on them, but most likely the vision models have seen them. Vision models like Gemini Flash/Pro are already good at vision tasks on phones [1], like clicking on UI elements and scrolling to find things (rough sketch of that loop below). The planning of what steps to perform is also quite good with the Pro model (slightly worse than GPT-4o, in my opinion).

1. A framework to control your phone using Gemini - https://github.com/BandarLabs/clickclickclick
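To make the "clicking on UI elements" part concrete, here's a minimal sketch of one way that loop could work, not how the linked framework actually does it: screenshot the phone over adb, ask a vision model where the target element is, then tap there. The model name, the prompt wording, and the assumption that the reply comes back as a bare "x,y" string are all illustrative.

    # Hedged sketch: vision-model-driven tap on an Android device via adb.
    # Assumes a Gemini API key and a connected device; the "x,y" reply format
    # is an assumption for illustration, not a guaranteed model behaviour.
    import subprocess
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-1.5-flash")

    def tap_element(description: str, screenshot_path: str = "screen.png") -> None:
        # Capture the current screen from the device as a PNG.
        with open(screenshot_path, "wb") as f:
            subprocess.run(["adb", "exec-out", "screencap", "-p"],
                           stdout=f, check=True)
        prompt = ("Return only the pixel coordinates 'x,y' of the centre of "
                  f"the UI element described as: {description}")
        # Send the prompt plus the screenshot to the vision model.
        response = model.generate_content([prompt, Image.open(screenshot_path)])
        x, y = (int(v) for v in response.text.strip().split(","))
        # Simulate a tap at the location the model reported.
        subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

    tap_element("the Settings gear icon")

In practice you'd want the model to return structured JSON and to validate the coordinates against the screen size, but the basic screenshot-ask-act loop is the whole trick.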


That's a really cool framework you've linked.



