One of the big flaws in Karpathy's logic is that it implies human vision is acceptable and sufficient for an AV. The reality, as Cruise found out, seems to be that society will demand that AVs be much safer than humans.
Human vision is an existence proof for human-level performance without lidar, but Waymo is an existence proof for 10x human performance WITH lidar. Right now the latter is where the bar is, and it'll keep being raised. I don't think at this point one could get away with deploying AVs at scale that are significantly less safe than Waymo.
Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
> Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
I think Karpathy's point is that Tesla wants to avoid the "entropy" that comes from adding a sensor (senior software engineers and higher understand this concept). Every sensor you add (and every version of it -- sensor hardware does get updated) requires recalibrating the software stack and reworking the hardware design, which introduces new points of failure every time you roll it out.
According to Karpathy, Tesla does use lidar -- but only at training time, as a source of truth. Once the weights are learned, the cars operate without lidar.
Having a full sensor suite may work for Waymo at its current scale (limited cities), but scaling beyond that poses problems.
Tesla, by contrast, has to work with a different set of scaling economics -- that of a mass-market vehicle already deployed globally.