This article misses a lot of things technology-wise. OK, I understand that.
It also misses a lot of car companies, like Daimler, who showed off self-driving capabilities publicly in the early 2000s and on the web three years ago. It misses the prototypes shown off by Audi and BMW, and the prototypes shown by Japanese car manufacturers.
From technology companies, it misses Bosch, who will show off their own self-developed car (and with it their expertise) at CES. It misses Delphi, who works with Mobileye on a self-driving platform to be used by OEMs, and it misses other Tier 1 suppliers like Continental who have also shown their work.
By the way, the Mitsubishi Outlander is currently the best-equipped production car with self-driving capabilities: it has a stereo camera system and LIDAR combined with other radar sensors. Next are the luxury cars from Audi, BMW, Daimler, and Volvo. Tesla comes last here, with only a simple radar and camera system (no stereo view), which also lacks the night-vision capabilities the others have.
Covering every detail of the technology wasn't really the point of the article. It's a New York Times article. It's meant to give a brief explanation, for non-engineers, of how this technology works. The target audience doesn't need to know all the different companies' achievements or the specifics of the technology used.
I just had a look at the website of the Outlander. It's not mentioned anywhere that it has self-driving capabilities. The sensors are used to warn the driver and intervene if a crash is imminent.
So can I conclude there is a difference in the (state of) implementation between the hardware (sensors) and the software (self-driving intelligence) ?
Humans only have one sensor: a rotatable stereo camera. So at least in theory the number of sensors seems not the most important element ;-)
Tesla seems number 1 in pushing the frontiers of marketing, so it may also be ahead in software, to compensate for what it is lacking on the hardware side?
> Humans only have one sensor: a rotatable stereo camera. So at least in theory the number of sensors seems not the most important element ;-)
I hear this comment a lot when defending Tesla's choices, and it's a red herring. The fact that humans only rely on two cameras means nothing. Repeating old comments of mine:
You also don't "need" megawatts of power to play top-level Go: humans do it with 100 watts of energy. Yet Google needed who knows how many megawatts of energy to train and run AlphaGo on their massive server farm.
Imagine two companies competing to win at Go: one company had the attitude that megawatts of energy were not necessary for training and prediction, and the other threw the biggest GPU farm they could at the problem. The second company just played top-level Go this year. The first company is ~10 years away from a low-energy elite Go computer.
Humans implicitly perform SLAM (simultaneous localization and mapping). What do I mean? Look around your room. Close your eyes. Visualize the room. As a human, you've built a rough 3D model of the room. And if you keep your eyes open and walk through the room, that map is pretty fine-grained/detailed too, and humans can keep track of where they are in the map.
Doing this accurately in moving environments (especially with lots of pure forward motion) with just two cameras is still a wide open research problem.
Doing this with LIDAR/GPS/IMU/recorded maps is solved. That's why people use LIDAR.
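To make the SLAM idea concrete, here's a minimal toy sketch (my own illustration, not how any real system is built): a Kalman filter jointly estimating a robot's position and a single landmark's position on a line, from noisy odometry and noisy range readings. Real visual SLAM does this in 3D with thousands of features, but the joint estimation of pose and map is the same idea.

```python
import numpy as np

x = np.array([0.0, 5.0])     # state: [robot, landmark] (initial guess)
P = np.diag([0.01, 4.0])     # uncertainty: landmark position poorly known
Q = np.diag([0.05, 0.0])     # motion noise affects only the robot
R = 0.1                      # range-measurement noise variance
H = np.array([[-1.0, 1.0]])  # measurement model: z = landmark - robot

def predict(x, P, u):
    """Robot moves by commanded distance u; the landmark stays put."""
    return x + np.array([u, 0.0]), P + Q

def update(x, P, z):
    """Fuse a noisy range measurement z (distance robot -> landmark)."""
    y = z - (x[1] - x[0])            # innovation
    S = H @ P @ H.T + R              # innovation covariance
    K = (P @ H.T) / S                # Kalman gain, shape (2, 1)
    x = x + K.flatten() * y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Drive toward the landmark, measuring as we go.
true_robot, true_landmark = 0.0, 6.0
rng = np.random.default_rng(0)
for _ in range(20):
    u = 0.1
    true_robot += u
    x, P = predict(x, P, u)
    z = (true_landmark - true_robot) + rng.normal(0, R ** 0.5)
    x, P = update(x, P, z)

print("estimated robot/landmark:", x, "truth:", true_robot, true_landmark)
```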
Matching the abilities of human perception is an insanely hard problem. Don't let cute problems like image classification fool you. Why make the problem even harder?
Even outside of the good points made by argonaut - our eyes are _amazing_ cameras. They are super high resolution, have many color bits, and work well in low light and bright situations.
Your point is generally valid, but it's worth noting that power =/= energy. It is conceivable that at 100 watts of power, a human could use on the order of megawatt-hours of energy while practicing to achieve top-level performance. 10,000 hours of practice is not unreasonable to achieve mastery, which at 100 watts would be 1 MWh.
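For concreteness, the arithmetic (taking the round numbers above as given):

```python
power_w = 100        # assumed human power draw while practicing, watts
hours = 10_000       # the proverbial hours of practice to mastery
energy_mwh = power_w * hours / 1e6   # watt-hours -> megawatt-hours
print(energy_mwh)    # 1.0 MWh
```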
It's also likely that AlphaGo is reasonably efficient, given that it used custom ASICs.
There is also a difference between learning and playing. During play, the human operated at ~20 watts of computation while the machine ran at anywhere from 26,000 to 260,000 watts, depending on how efficient the TPUs are (assuming 10x as the ideal case). The human is also learning new things about Go while playing, planning complex muscle-firing programs, filtering audio and managing attention, working on subconscious goals, and running complex vision tasks, all while running their autonomic subsystem.
Low power is also still important due to issues of heat and energy availability. Low power also implies high efficiency which is important for several reasons.
The human brain is estimated at 20 watts (when people talk about computing systems they tend to not include the power needed for all the auxiliary infrastructure needed to keep it networked and cooled); it's also estimated that beyond 4 hours a day, learning effectiveness drops precipitously.
If we take the case of Go, you can take a 4-year-old human and have a professional player by 13. This is about 950 megajoules spent by the brain while learning Go. For the machine, if you look at the learning part (self-play, value and policy networks on 50 GPUs for several weeks), the estimate of the energy spent is about 30,000 megajoules. The policy network alone is ~20,000 MJ, while the full AlphaGo system playing on a single GPU and 48 CPUs is just a strong amateur.
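Spelling out where the ~950 MJ comes from, using the 20 W brain and 4-hours-a-day figures from above:

```python
brain_power_w = 20     # estimated brain power, watts (see above)
hours_per_day = 4      # effective learning hours per day (see above)
years = 13 - 4         # from age 4 to professional at 13

seconds = years * 365 * hours_per_day * 3600
energy_mj = brain_power_w * seconds / 1e6
print(round(energy_mj), "MJ")   # ~946 MJ, i.e. roughly 950 MJ
```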
But this is not even an apples to apples comparison since the brain is not spending all of its energy on learning Go. In fact, learning how to play Go is very far from the most difficult thing the brain is learning how to do.
You're confusing professional level with world champion level. How many megajoules will it take to create a world champion Go player using the human brain? It would take multiple brains, each teaching each other. We can now train a professional-level Go player pretty cheaply — Zen Go plays at a professional level and runs on commodity hardware.
I did not, actually. I pointed out the approximate energy required to get to 1 dan professional, then pointed out that a system trained with orders of magnitude more energy was still far less capable. Getting to Lee Sedol's level is still < 3000 MJ (30 years of daily practice and study), which is still an order of magnitude less energy than training a single amateur-level policy network.
To say AlphaGo or any RL system is learning from self-play is not in the typical understanding of the phrase. It's more akin to evolving through competitions against previous versions of itself, which should count as different instances. As stated on page 38 of arXiv:1604.00289:
> Between the publication of Silver et al. (2016) and before facing world champion Lee Sedol, AlphaGo was iteratively retrained several times in this way; the basic system always learned from 30 million games, but it played against successively stronger versions of itself, effectively learning from 100 million or more games altogether (Silver, 2016).
In comparison, from that same paper, it was estimated that Sedol could not have played much more than 50,000 games. My own estimate is about 40,000 games.
As for the work required to learn, it's irrelevant to point out that one can learn from others. Learning, whether from play, books, or study, still requires energy spent and work by the learner. Most of the extra work is from study, occasional review with a tutor, and discussions with peers--the last more of a meta-step: learning to learn. Accounting for books and some time with tutors will not, I argue, shift the budget much. Especially if you include that any machine playing Go requires the overhead of power infrastructure, energy, cooling, networking equipment, and occasional maintenance staff. And for the machine, learning, i.e. improving the architecture, requires searching through and discarding many changes and playing through a cumulative hundreds of millions of games.
The fact that humans have other available highly efficient means of learning is a boon and not a downfall. That's the whole point of getting to AGI. Learning from books and others is akin to learning by Program Synthesis from specifications.
As I said in my previous post, you ignored that a machine gets to 1 dan professional far more cheaply.
The reason I noted the requirement of other trained professionals for training a human is that those other humans can distill what they have learned over years of play into simple rules. The machine can also use those rules, but the particular machine you are comparing to was specifically trained without any such rules and was required to synthesize them from scratch from historical games and self-play.
No, that is a ridiculous comparison. Then you should start counting the energy required to construct the Google server farm, the energy of all the computers used by all the engineers who built the farm while they were in university, on and on and on.
Now you're suggesting calculating the energy used by the human champion's ancestors, which I am not suggesting.
The computer can train itself with just records of past games and self-play. The world champion level human cannot. You must account for the difference in training.
It was programmed by humans who didn't program in any rules of thumb or patterns for playing Go. The computer synthesized all it needed to understand about how to win at the game from historical games and self-play. The human champion synthesized only some patterns himself and learned most of them from other professional human players.
Patience. It's early for Go. Remember when it took Deep Blue to play grandmaster-level chess? Now your laptop can do it. All of the better commercial chess programs now play above human level. Well above human level; the top programs are rated around 3300 on a good laptop, and the highest human rating is currently 2882.
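To put those ratings in perspective, the standard Elo expected-score formula says what a 3300-rated engine should score against a 2882-rated human:

```python
# Standard Elo expected-score formula, with the ratings quoted above.
def expected_score(r_a, r_b):
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(expected_score(3300, 2882))   # ~0.92: the engine scores ~92%
```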
I've been wondering why no self driving systems seem to use DTAM or similar methods. Realtime dense 3D reconstruction and camera localisation on commodity hardware seems perfect for the job.
There is a big open research problem with the state of the art in visual SLAM (i.e., SLAM from cameras): it doesn't work when the environment is moving (!!!).
Visual SLAM is still linear-algebra/geometric/keyframe-based traditional computer vision (including variants that incorporate GPS/accelerometer info). I think the state of the art is stereo LSD-SLAM, but I could be wrong.
A great way to explain this to people would be to ask them if they've ever seen a panoramic photo glitch, and tell them that's what the car would be seeing most of the time (well, assuming anything is moving, like a car or a person or a piece of trash).
On the component front, last March Continental acquired the automotive LIDAR product line from Advanced Scientific Concepts.[1] This may help. ASC, which is a bunch of physicists and engineers in Santa Barbara, has a very good flash LIDAR. They sell it to DoD and SpaceX for about $100K. There's no reason it needs to be that expensive in volume. Continental is the third largest maker of auto parts. If they can get the ASC technology down in price, all those silly rotating things will go away.
A disadvantage of the ASC is its far more limited field of view. I mean, the rotating thingie has 360° FOV (I personally think it's not silly at all); if you want to see that wide with the ASC units, you've got to put more of them on a vehicle.
As a side note, how do you guys find out things like who buys whom and who's got what interesting tech? Especially when they're selling only to suppliers, as is the case here? Do you just stumble on the information, know someone who knows someone, or what?
Once the price comes down, you can have more sensors to get full circle coverage. A typical configuration is long-range sensors aimed forward, with shorter-range sensors aimed to the sides and rear. Volvo uses an arrangement like that, and the sensors aren't very visible. One approach is to devote the top inch or so of the windshield to sensors, and design the car's headliner accordingly. Mobileye puts their camera between the rear view mirror and windshield.
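As a toy illustration of checking that such a layout has no blind spots (the numbers here are made up for the example, not any OEM's actual spec):

```python
# Hypothetical layout: long-range forward, shorter-range sides and rear.
sensors = [
    {"name": "front-long", "center_deg": 0,   "fov_deg": 60,  "range_m": 200},
    {"name": "left-side",  "center_deg": 90,  "fov_deg": 150, "range_m": 80},
    {"name": "rear",       "center_deg": 180, "fov_deg": 120, "range_m": 80},
    {"name": "right-side", "center_deg": 270, "fov_deg": 150, "range_m": 80},
]

def covered(bearing_deg):
    """Is a given bearing (degrees, 0 = straight ahead) seen by any sensor?"""
    for s in sensors:
        # angular distance from the sensor's boresight, wrapped to [0, 180]
        diff = abs((bearing_deg - s["center_deg"] + 180) % 360 - 180)
        if diff <= s["fov_deg"] / 2:
            return True
    return False

gaps = [b for b in range(360) if not covered(b)]
print("blind bearings:", gaps or "none - full 360 coverage")
```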
There are lots of companies working on low-end LIDAR. ASC's technology is known to work well; it just costs too much because it involves custom sensor ICs built with a nonstandard GaInAs process. Quanergy has been issuing a lot of press releases and getting funding, but they seem to have backed off from their < $100 solid-state sensor and are now selling $7500 rotating machinery like Velodyne's spinning top. There are also some companies touting continuous-wave systems, which usually don't work in sunlight or have much range, but are useful for indoor robots.
Fraunhofer is working on a technology for making flash LIDAR sensors using a regular CMOS process.[1] If that works, these things will come down to digital camera prices in volume.
Bloomberg has most acquisition information. I picked this up because I've been following ASC since the 2005 DARPA Grand Challenge days, when I went to Santa Barbara to see their technology. It was just on an optical bench then, not ready to deploy, but they had the technology with long-term promise. I dragged a VC down there to talk to them, but he didn't see near-term volume application. He was right; only now is there a market in sight for LIDAR units by the millions. High production volume is probably still 5-10 years away.
Re Quanergy: Although the Quanergy site claims all their units have no moving parts, a university in France which evaluated their M8, the only thing they're actually shipping, writes this: "The Quanergy M8 LIDAR system consists of 8 2D line scanners located on a spinning head which can spin at a rate from 5 Hz to 30 Hz. The 8 lasers are spread out over a 20° vertical field of view (FOV) and the entire unit rotates to give a full 360° by 20° FOV."[1]
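From that quoted geometry you can do a quick back-of-the-envelope on what such a spinning unit delivers (the target horizontal resolution here is my assumption, not a published spec):

```python
n_lasers = 8              # from the quoted M8 description
vertical_fov_deg = 20     # from the quoted M8 description
spin_hz = 10              # within the quoted 5-30 Hz range
horiz_res_deg = 0.1       # assumed horizontal resolution, not a spec

vertical_spacing = vertical_fov_deg / n_lasers        # ~2.5 deg avg spacing
fires_per_rev = 360 / horiz_res_deg                   # 3600 fires per laser
points_per_sec = n_lasers * fires_per_rev * spin_hz   # 288,000 points/s
print(vertical_spacing, fires_per_rev, points_per_sec)
```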
This year, a colleague gave a seminar for the Nvidia people in the UK. I was not personally present, but from what I've been told, they had no clue about automotive. The only thing they have is a high-performance processor designed for embedded use, with which they want to become an automotive supplier. AFAIK, Tesla is the first that is going to use them. But I believe that currently they have no automotive experience in terms of software. They are building it up at the moment.
Nvidia is a large distributed organization so it's possible if not likely that the relative understanding of this particular UK team is not reflective of the total expertise at the company.
the nyt is always off the mark on topics i know something about, and nothing leads me to believe they are any better about topics i know nothing about.
I agree with this so much. However, playing the devil's advocate, I guess it depends on the journalist's background.
For instance, a poli-sci major writing about a scientific/tech topic will likely get a lot of things wrong. However, the same journalist might likely have a lot of interesting things to write about international politics, because that's what his background is about.
that being said, I don't know how many journalists writing about politics have studied it in the past.
not that it really matters, i have an econ degree but wouldn't call myself an economist by any stretch.
the real issue, at least in the US, is that most news is really just direct or indirect PR. curiously, i think 'X writer' is the term you are looking for. i.e., someone with a background in technology or fashion that reports on things is more likely to call themselves a 'tech writer' or 'fashion writer'. it's almost like real experts don't even want to be called journalists. i certainly wouldn't.
i've found that you're much more likely to find accurate reporting in industry-specific media rather than general media, because then at least the sources of funding and agenda are pretty clear.
I'd pass on reading this if you clicked through to the comments first. There's very little detail about how self-driving cars work; this is a very basic survey of brands. Remember the Gell-Mann amnesia effect. Does anyone have real examples / links / videos of how self-driving cars work?
Firstly, there are different approaches. Google and Uber seem to have a similar LIDAR + map approach. Tesla and Mobileye have a camera-focused approach.
The CEO of Mobileye (who is an ex-machine learning professor) gave a very good, mildly technical talk on their approach at CVPR: https://www.youtube.com/watch?v=n8T7A3wqH3Q
> The CEO of Mobileye (who is an ex-machine learning professor)
The CEO of Mobileye is Ziv Aviram and he was never a professor. Amnon Shashua is the CTO, and he is still a machine learning professor at the Hebrew University.
I found this article very informative. Perhaps I knew most of these tidbits before, but I did like the summary and the layman's explanations, even if not detailed or feature-complete. This is the kind of info I relay to my father when he asks me how all this works.
Going by some of the recent antics of companies like comma.ai and Uber, it sounds like self-driving cars mostly work via hype & publicity stunts. Any actual technology that is involved is purely coincidental.