I saw that your foundation model is trained on data from several different robots. Is the plan to eventually train a foundation model that can control any robot zero-shot? That is, the effect of actuations on video/sensor input would be observed and understood in-context, and actuations would be corrected to yield the intended behavior, all in-context, without any fine-tuning. Is this feasible? (A rough sketch of the loop I mean is below.)
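For concreteness, here is a minimal sketch of the in-context adaptation loop I have in mind; `FoundationPolicy`, `get_observation`, and `apply_action` are hypothetical placeholder names for illustration, not your actual API:

```python
# Hypothetical sketch of zero-shot, in-context robot control.
# All names here are placeholders for illustration, not a real API.

from dataclasses import dataclass, field


@dataclass
class FoundationPolicy:
    """Stands in for a pretrained multi-robot foundation model."""
    context: list = field(default_factory=list)  # rolling transition history

    def act(self, observation, goal):
        # The model conditions on the goal plus the in-context history of
        # (observation, action, next_observation) transitions, inferring the
        # unknown actuation dynamics of this particular robot on the fly and
        # correcting its next action accordingly. No gradient updates.
        ...


def control_loop(policy, robot, goal, steps=100):
    obs = robot.get_observation()  # video / sensor input
    for _ in range(steps):
        action = policy.act(obs, goal)
        next_obs = robot.apply_action(action)
        # Append the observed effect of the action so the model can refine
        # its estimate of the robot's dynamics purely in-context.
        policy.context.append((obs, action, next_obs))
        obs = next_obs
```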
More specifically, has your model already exhibited this type of capability, in principle?