Waymo spends a lot on human oversight: remote operators handle "common sense" decisions that don't require an immediate response and that the AI can't yet make on its own. Humans actually do a lot of the navigation, suggesting paths for the car to drive along. An example I saw was a fire truck parked at an odd angle, poking out into the street; the software didn't know what it was or what to do, so the operator drew a path around the truck for the car to follow. This only works for taxis: Tesla couldn't do it, since there's no way to hire enough human operators to cover every privately owned car.
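As a minimal sketch of that "operator suggests, car decides" split, something like the following. Every name here (Waypoint, OperatorHint, the planner interface) is hypothetical, not anything from Waymo's actual stack:

    from dataclasses import dataclass

    @dataclass
    class Waypoint:
        x: float  # meters, in the car's local map frame
        y: float

    @dataclass
    class OperatorHint:
        """A path a remote operator drew around an obstacle."""
        waypoints: list[Waypoint]

    def follow_hint(planner, hint: OperatorHint):
        # The hint is advisory: the onboard planner still runs its own
        # collision and kinematics checks before committing to it.
        candidate = planner.fit_trajectory(hint.waypoints)
        if planner.is_safe(candidate):
            return candidate
        return planner.stop_in_lane()  # wait for a new hint instead

The design point is that the operator never drives the car directly; they hand the planner a suggestion, and the car remains responsible for executing it safely.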
But I suspect this means Waymo's software is ultimately more risk averse. If a Tesla stops in the middle of the road, the customer has to take action, which is frustrating and makes the technology look bad, so there is a strong incentive to remove that frustration even at the cost of safety. If a Waymo stops, the remote operator has to take action, and the customer can keep staring at their phone without being particularly affected; it just seems like the Waymo is "thinking."
Honestly, this is what a self-driving car should be: no interaction from the passenger required. Maybe eventually we'll be able to replace the human operator, but until then, a risk-averse AI where somebody remote resolves any unforeseen issue is a decent compromise.
1. They've been doing it for ages. They had cars on the street fifteen years ago.
2. They bet on hardware that's not just cameras. Cameras, in practice, are still not the best tool for the job: they see in 2D, they get dirty or obscured, and they are easily blinded by glare or low light (see the back-of-envelope sketch after this list).
3. They have data from every Google Street View and mapping car ever deployed. They have the most data and the most current data. Every Tesla on the road could be maxing out its LTE connection all the time and Tesla still wouldn't have the breadth and quality of data that Google has.
4. Google is throwing money at Waymo. They can see the potential profit if they win. They're not going to get dumped like Cruise.
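On the "cameras see in 2D" point above, here's a back-of-envelope sketch of why camera-derived depth degrades with distance in a way lidar doesn't. The focal length, baseline, and error figures are illustrative assumptions, not any vehicle's actual specs:

    f_px = 1000.0   # focal length in pixels (assumed)
    baseline = 0.3  # meters between a stereo pair of cameras (assumed)

    def stereo_depth_error(z, disparity_err_px=0.5):
        # Depth from stereo: Z = f*B/d, so depth error grows with the
        # SQUARE of distance: dZ ~= (Z**2 / (f*B)) * dd
        return (z ** 2) / (f_px * baseline) * disparity_err_px

    for z in (10, 50, 100):
        print(f"{z:>4} m: stereo depth error ~{stereo_depth_error(z):.1f} m")
    #   10 m: ~0.2 m;  50 m: ~4.2 m;  100 m: ~16.7 m
    # A typical lidar return is accurate to a few centimeters at any range.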
Any background info on the bet on cameras alone? It sounds as silly as betting on an artificial version of our proprioception to measure acceleration in cars. I also don't think they went all the way on neuromorphic engineering with spiking neural nets and artificial retinas. It just seems so arbitrary to me what was decided to be good enough for autonomous navigation.
Tesla went from very expensive cars down to cheaper ones. It would make so much more sense to do the same for perception: first go overboard with high-bandwidth input and lots of processing power, then optimize later.
The bet on cameras alone is basically an Elon Musk thing. His reasoning is roughly that if humans can drive with vision alone, AI should be able to as well. So far the software isn't really up to it, but time will tell. Some background: https://www.engineering.com/now-revealed-why-teslas-have-onl...
I used to regularly have to make a left turn onto a rural highway on foggy mornings. Sometimes people drive faster than they should in fog, sometimes fast enough that by the time they could see me in the intersection turning, they would be too close to stop.
Cars going fast enough to have that problem made enough sound that they could be heard quite a bit farther away than they could be seen. I'd open my windows at the intersection and listen until I couldn't hear any highway traffic. Then I'd know that any approaching cars are far enough away that I should have time to turn onto the highway and get up to speed before they arrive.
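Rough numbers behind "heard well before seen" (all values assumed for illustration):

    speed = 60 * 0.447     # 60 mph in m/s, ~26.8
    visibility = 100.0     # meters of visibility in the fog (assumed)
    audible_range = 500.0  # meters at which a fast car is audible (assumed)

    print(f"warning if you rely on sight: {visibility / speed:.1f} s")     # ~3.7 s
    print(f"warning if you rely on sound: {audible_range / speed:.1f} s")  # ~18.6 s
    # Turning left and accelerating to highway speed can easily take 10+ s,
    # so sound leaves a workable margin where sight alone doesn't.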
Yeah. Also, I don't know how good the Tesla cameras are, but my car has a reversing camera, and it's OK for backing up 2m at 2mph but kind of terrible compared to looking forward through the windscreen.
IIRC it's the section (1:23:25) – Camera vision
The TL;DR is that sensor fusion is really hard, and their bet was that keeping the training pipelines simpler would let them scale faster/easier, and human vision is the existence proof that it can be done without lidar.
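To make "sensor fusion is really hard" concrete, here's a toy sketch of the bookkeeping a lidar-plus-camera rig needs just to overlay one lidar point on the image. The intrinsics and mounting offsets are made-up numbers; the point is that every sensor adds a transform that has to stay calibrated:

    import numpy as np

    K = np.array([[1000.0, 0.0, 640.0],     # camera intrinsics (assumed)
                  [0.0, 1000.0, 360.0],
                  [0.0,    0.0,   1.0]])
    T_cam_lidar = np.eye(4)                 # extrinsics: lidar -> camera frame
    T_cam_lidar[:3, 3] = [0.0, -0.2, -1.5]  # assumed mounting offset, meters

    def project_lidar_point(p_lidar):
        # Any drift in T_cam_lidar (a bumped mount, a new hardware
        # revision) shifts every projected pixel, so calibration and
        # clock sync are per-sensor, ongoing costs.
        p_cam = T_cam_lidar @ np.append(p_lidar, 1.0)
        uvw = K @ p_cam[:3]
        return uvw[:2] / uvw[2]  # pixel coordinates

    print(project_lidar_point(np.array([2.0, 0.0, 20.0])))  # ~[748, 349]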
One of the big flaws in Karpathy's logic is that it implies human vision is acceptable and sufficient for an AV. The reality, as Cruise found out, seems to be that society will demand AVs are much safer than humans.
Human vision is an existence proof for human-level performance without lidar, but Waymo is an existence proof for 10x human performance WITH lidar. Right now the latter is where the bar is, and it'll keep being raised. I don't think at this point one could get away with deploying AVs at scale that are significantly less safe than Waymo.
Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
> Also: if sensor fusion is so hard, why is Waymo able to solve it but not Tesla?
I think Karpathy's point is that Tesla wants to avoid the "entropy" that comes from adding a sensor (senior software engineers and up will recognize this concept). Every sensor you add (and every hardware revision of it, since sensor hardware does get updated) requires recalibrating the software stack and reworking the hardware design, which introduces new points of failure every time you roll it out.
According to Karpathy, Tesla does use Lidar -- but only at training time, as a source of truth. Once the weights are learned, they operate without the Lidar.
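A toy sketch of that training-time-only lidar idea: a camera depth network supervised by lidar depth during training, then deployed camera-only. The tiny model and names here are placeholders, not Tesla's actual pipeline:

    import torch
    import torch.nn as nn

    depth_net = nn.Sequential(  # stand-in for a real vision backbone
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )
    opt = torch.optim.Adam(depth_net.parameters(), lr=1e-4)

    def train_step(camera_frame, lidar_depth, valid_mask):
        # camera_frame: (B,3,H,W); lidar_depth: (B,1,H,W) sparse depth
        # rasterized from lidar returns; valid_mask marks pixels that
        # actually got a return.
        pred = depth_net(camera_frame)
        loss = ((pred - lidar_depth)[valid_mask.bool()] ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # At inference time the car runs depth_net(camera_frame) with no lidar.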
Having a full sensor suite may work for Waymo at its current scale (a limited set of cities), but scaling beyond that poses problems.
Whereas Tesla has to work with a different set of scaling economics -- that of a mass market vehicle already deployed globally.
Not skimping on sensors, and having invested much more time, money, and compute than their competitors.
They were doing it years before anyone else and have been spending the most on it throughout, so it's not surprising they'd be ahead. Even now they probably have several times more employees working on autonomy than Tesla does.
I don’t know, but it’s impressive. I’ve only ridden in one a few times, but the one at night tremendously impressed me. It successfully managed to navigate around various obstacles like people wandering into the street, a parked cop car blocking a lane and then some, etc.
At least in SF, the sensor suite on these must cost $$$. A Tesla has something like 6 cameras; these things have sensor and camera bumps everywhere. Tesla also struggled with route selection, especially at the end of a trip; Google has very strong mapping and Street View info.
Compared to people, who drive pretty poorly, most of these will probably do better.