Thanks! Appreciate the kind words. I should have in the next month or so (interviewing and finishing my Master's, so there's been delays) a follow up that follows more advancements in the router style VLA, sensoiromotor VLM, and advances in embedding enriched vision models in general.
If you want a great overview of what a modern robotics stack would look like with all this, https://ok-robot.github.io/ was really good and will likely make it into the article. It's a VLA combined with existing RL methods to demonstrate multi-tasking robots, and serves as a great glimpes into what a lot of researchers are working on. You won't see these techniques in robots in industrial or commercial settings - we're still too new at this to be reliable or capable enough to deploy these on real tasks.
If you want a great overview of what a modern robotics stack would look like with all this, https://ok-robot.github.io/ was really good and will likely make it into the article. It's a VLA combined with existing RL methods to demonstrate multi-tasking robots, and serves as a great glimpes into what a lot of researchers are working on. You won't see these techniques in robots in industrial or commercial settings - we're still too new at this to be reliable or capable enough to deploy these on real tasks.