1. They've been doing it for ages. They had cars on the street fifteen years ago.
2. They bet on hardware that's not just cameras. Cameras, in practice, are still not the best tool for the job. Cameras see in 2D, they are easily blinded, and they get obscured by dirt, etc.
3. They have data from every Google Street View and mapping car ever deployed. They have the most data and the most current data. Every Tesla on the road would need to be maxing out its LTE connection all the time and they still wouldn't have the breadth and quality of data that Google has.
4. Google is throwing money at Waymo. They can see the potential profit if they win. They're not going to get dumped like Cruise.
Any background info on the betting on cameras alone? It sounds as silly as betting on an artificial version of our proprioception to be implemented in cars to measure acceleration. I also don't think they went all the way regarding neuromorphic engineering with spiking neural nets and artificial retinas. It's just so random to me what was decided to be good enough for autonomous navigation.
Tesla went from very expensive cars down to cheaper ones. It would make so much more sense to do the same for perception. First go overboard and go for high bandwidth input and lots of processing power, then optimize later.
Betting on cameras alone is basically an Elon Musk thing. His reasoning is that if humans can do it, AI should be able to do it. So far the software isn't really up to it but time will tell. Some stuff - https://www.engineering.com/now-revealed-why-teslas-have-onl...
I used to regularly have to make a left turn onto a rural highway on foggy mornings. Sometimes people drive faster than they should in fog. Sometimes fast enough that by the time they could see I'm in the intersection turning they would be too close to stop.
Cars going fast enough to have that problem made enough sound that they could be heard quite a bit farther away than they could be seen. I'd open my windows at the intersection and listen until I couldn't hear any highway traffic. Then I'd know that any approaching cars are far enough away that I should have time to turn onto the highway and get up to speed before they arrive.
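To put rough numbers on the fog problem above, here is a back-of-the-envelope sketch using the standard stopping-distance formula (reaction distance plus braking distance, v*t + v^2 / (2*mu*g)). The reaction time, the friction coefficient, and the 50 m visibility in the usage note are my own guesses for a damp road, not measurements, and the function names are made up for illustration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance_m(speed_kmh, reaction_s=1.5, friction=0.5):
    """Reaction distance plus braking distance, in metres.

    reaction_s and friction are assumed values (damp road, average driver).
    """
    v = speed_kmh / 3.6  # convert km/h to m/s
    reaction_distance = v * reaction_s
    braking_distance = v * v / (2 * friction * G)
    return reaction_distance + braking_distance


def can_stop_within(speed_kmh, visibility_m, **kwargs):
    """True if a driver at this speed can stop inside the visible distance."""
    return stopping_distance_m(speed_kmh, **kwargs) <= visibility_m
```

With those assumed values, `can_stop_within(100, 50)` is false while `can_stop_within(30, 50)` is true, which is the situation described: anyone doing highway speed in that fog sees you too late, so hearing them earlier is what buys the margin.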
Yeah. Also I don't know how good the Tesla cameras are but my car has a reversing camera and it's ok for going back 2m at 2mph but kind of terrible compared to looking forward through the windscreen.
IIRC I think it’s the section (1:23:25) – Camera vision
The TL;DR is that sensor fusion is really hard, and their bet was that keeping the training pipelines simpler would let them scale faster/easier, and human vision is the existence proof that it can be done without lidar.
One of the big flaws in Karpathy's logic is that it implies human vision is acceptable and sufficient for an AV. The reality, as Cruise found out, seems to be that society will demand that AVs be much safer than humans.
Human vision is an existence proof for human-level performance without lidar, but Waymo is an existence proof for 10x human performance WITH lidar. Right now the latter is where the bar is, and it'll keep being raised. I don't think at this point one could get away with deploying AVs at scale that are significantly less safe than Waymo.
Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
> Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
I think Karpathy's point is that Tesla wants to try to avoid the "entropy" that comes from adding a sensor (senior software engineers and higher understand this concept). Every sensor you add (and every version of it -- sensor hardware does get updated) requires recalibrating the software stack and the hardware design, which introduces points of failure every time you roll it out.
According to Karpathy, Tesla does use Lidar -- but only at training time, as a source of truth. Once the weights are learned, they operate without the Lidar.
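A toy sketch of that "Lidar only at training time" idea, in case it helps. This is not Tesla's pipeline: the single camera feature, the numbers, and the linear model are all made up for illustration. The only point is that the Lidar depth appears as the label during fitting and is not an input at prediction time.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: one camera-derived feature per object
# (e.g. inverse apparent height in pixels) and the Lidar depth as ground truth.
camera_feature = [0.01, 0.02, 0.04, 0.05]
lidar_depth_m = [5.0, 10.0, 20.0, 25.0]

# Training: Lidar is used only here, as the label.
slope, intercept = fit_line(camera_feature, lidar_depth_m)


def predict_depth_m(feature):
    """Inference: camera feature in, depth estimate out, no Lidar involved."""
    return slope * feature + intercept
```

After fitting, `predict_depth_m(0.03)` gives a depth estimate from the camera feature alone. A real system would replace the line fit with a neural net and the single feature with images, but the train/deploy split is the same shape.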
Having a full sensor suite may work for Waymo at the current scale (limited cities), but scaling beyond that poses problems.
Whereas Tesla has to work with a different set of scaling economics -- that of a mass market vehicle already deployed globally.