Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Rethinking Autonomous Driving with Large Language Models (arxiv.org)
39 points by mfiguiere on Sept 29, 2023 | hide | past | favorite | 42 comments


> Unlike widely used Reinforcement Learning (RL)-based and Search-based approaches, GPT-3.5 not only interprets scenarios and actions but also utilizes common sense to optimize its decision-making process

Relying on LLMs for reasoning seems dangerous due to the risk of hallucinations, especially in a safety-critical setting like self-driving. I have some other problems with this paper, for example, the comparison to RL is limited to zero-shot and this technique will struggle to run in real-time due to the slow inference speeds of LLMs.

Maybe there is some potential for LLMs to work as a fall-back mechanism in new situations or to help predict the behavior of humans and other cars, but I doubt that LLMs will become central to decision making in self-driving cars.


Hallucinations have become a bit too much of a boogeyman.

One should not rely on LLMs as any sort of authoritative representation of training data where data integrity is critical.

But there's generally very little propensity for hallucination from in context information you are feeding into them live.

Additionally, even just a second pass with a fine tuned classifier checking for hallucinations between provided data and output can reduce the degree to which they occur significantly.

The low hanging fruit when the models first released of summarizing massive sets of training data is definitely an area where hallucinations have been a problem, but arguably the greater value in models moving forward is having turned them into informal logic engines of increasing caliber.

In that application, hallucinations are far less of a concern unless the context extensively overlaps with training data, and in those cases the hiccups can generally be effectively broken by replacing tokens with representative placeholders (such as if working with a LLM on a variation of the goat, wolf, and cabbage problem where it keeps hallucinating details from the normal form, using different nouns or using emojis in place of , , and ).

The issue of speed is much more salient, but I could definitely see LLMs in combination with the generative tech stacks coming up in 3D generation being used to help create large swaths of synthetic scenario data for edge cases less likely to occur and be captured in real world driving conditions, which would in turn train faster and more comprehensive self-driving models in vehicle.


It’s fair though, to say that in safety critical situations where human lives are at stake…

> can reduce the degree to which they occur significantly

May not be good enough, unless you can quantify the degree to which they happen.

1/10? 1/1000? 1/1000000? More in situations like fog or rain? Perfectly safe in normal conditions?

The problem here isnt hallucinations, correct. All systems are unsafe to some degree.

The problem is that the degree to which it is a problem is (afaik) quite difficult to quantify.

It’s not ok, if you just vaguely wave your hand and say it doesn’t happen that much. Or you can mitigate it to some degree by doing such and such. How much?

You have to actually be able to articulate the degree of risk involved.

There is a risk. Fact.

Is it acceptable? That’s the question, and no one seems to be able really answer it clearly.


I think that the valuable parts are xAI (explanation) and make the interaction between the REAL model and the human.


I wouldn't be against some regulations that require safety critical AI models to be fully explainable. Or at least to some extent that "look what apparently ChatGPT can do" wouldn't qualify for.


Absolutely. For me, the whole point of autonomous vehicles is the added safety, control and predictability they could bring. Relying on an unexplainable AI model leaves you open to unpredictable instabilities, which, for this particular use case, are unacceptable (at least to me).


Who would decide if something met the bar of "fully explainable" ?

We have all kinds of drugs going into human bodies where the mechanism isn't fully explainable.

Once you get a lot of variables, humans can't explain it in some nice concise fashion.


It’s a bit different when comparing the two

You taking a drug won’t kill others but your autonomous vehicle could.


Agree it's not exactly the same, but it's more similar than it seems on the surface. By and large, you don't get to make the decision for yourself wrt drugs. Doctors make recommendations. So, Doctors can kill others (lots and lots of them per doctor) by recommending/prescribing drugs that they can't fully explain.

It is what it is. At some point with enough variables we lose the ability to explain something in a mechanistic fashion that a human can fit in their head.

The nice thing about LLMs compared to drugs - an LLM is actually fully explainable. The explanation is just very long and boring. "and so we calculate this number and then that number, then adjust these numbers, and then calculate that thing and blah blah blah" for 6000 years.


I recall a 60 minutes piece where people had taken too much sleeping pills the night before and the next morning were still so groggy that they got into accidents.


It's widely accepted by most people in the industry that this is in fact required by the existing functional safety standards, though they obviously weren't written with that intent originally.


This is a junk paper. The kind of paper where people stake out some combination of keywords so that they get citations in the future but it contributes nothing at all. They know that LLMs will probably be used in the future, but don't know how, and can't think of any practical way to do it.

> We are the first to demonstrate the feasibility of employing LLM in driving scenarios and exploit its decision-making ability in the simulated driving environment

They demonstrate nothing. Not one driving dataset is used. They don't actually evaluate on HighwayEnv, just show some cherry picked examples.

There is nothing here at all. No new ideas, no demonstration of anything. What a shame.


Harsh, but probably not wrong.

This is closer to "good old fashioned AI" than a learning system. The process is

* Observe situation

* (Miracle happens)

* Simple description of situation pops out

* Crunch a bit on the simple description to get answer.

That's very 1980s, with simple English substituted for predicate calculus expressed as S-expressions. The (miracle happens) step is still a problem. Image interpretation has advanced a lot, but not enough.

When you look at videos of Tesla's vision system at work, it regularly fails to recognize cars and people until they're very close. Waymo seems over-sensored, but that's needed to reliably map the environment.

"Driving like people" is basically minor tweaks on the hard problem, "driving without hitting stuff".


Don't think a miracle really needs to happen. Maybe the paper doesn't get into that but certainly their are potentially viable methods being worked

https://wayve.ai/thinking/lingo-natural-language-autonomous-...


If autonomous vehicles start driving like humans, it's going to reveal the contradictions of US traffic laws. Traffic only works because humans break traffic laws all the time (e.g. speeding). Autonomous driving will either need to be programmed to break many traffic laws, or it's going to cause problems and block traffic. I honestly don't see how this is going to get resolved. The sensible thing would be to change traffic laws so they can be strictly enforced, like raising the speed limit to 85 on 280, but I don't see that happening.


There are issues with traffic laws and contradictions but speeding is not what “makes it work”.


I don't think Language models would have any issue breaking traffic laws if the context called for it.


This sounds like an ideal outcome.

Turning traffic laws into a “routing optimization” problem.


Waymo just published a paper yesterday titled MotionLM [1], where they’re representing motion prediction as a language modeling task.

I’m curious to see how much traction LLM-like techniques get in safety critical environments like autonomous driving. I don’t know if Waymo has already deployed it, but they’re in the best position to evaluate it.

[1] https://arxiv.org/abs/2309.16534


This is backwards: obviously language evolved from the brain's processing of motion. I mean, beginning somewhere back when we were in the fish/frog times.


Backwards in what sense? LLMs for planning tasks provide much more interpretable actors which seems crucial for passing through future self-driving regulation.


Wayve is working on this also with Vision-Language-action models. See here - https://wayve.ai/thinking/lingo-natural-language-autonomous-...


Haven't read the paper, but I imagine it proposes using the GPS and microphones in human-operated vehicles to produce an F-words per minute heatmap, then tell the self-driving cars to either avoid the hotspots completely or drive through them slower.


> F-words per minute

Likely relevant in the NY metro area in particular.


"to understand the driving environment in a human-like manner"

Driving is a phenomenally complicated thing. I gather that most modern cars (mine is 20 years old) have some sort of driving assistant thing which looks suspiciously like an experiment to gather data to move towards an eventual "auto pilot". I suspect there will be fatalities glossed over.

One of my staff described some features of his newly purchased hybrid (can't recall the brand - Asian of some sort, I think). It has "lane assist", which seems to mean that it looks at the white lines on the road and will adjust steering when it thinks you have buggered up. He nearly had to change his trousers after it made a severe course change to the left when close to the peak of a hill because it had lost sight of the white lines and seemed to assume that it was too far to the right.

I am fallible too but I get to reason about my fallibility. I also come equipped with a decent set of sensors (my eyes are getting a bit crap though!) I can look into the distance at a corner (and consider the various gradients) and work out how to shift gears and so on to use the engine to mostly not need to use the brake.

Recently I drove an old Morgan. It was like steering a whale! However, it turns out that my driving style works well with a fairly light car with very narrow wheels/tyres, a big old lump of an engine in front and the power applied nearly under your bum. Glorious!

How well will your LLM cope with the conditions that I encountered driving that old beast. The weather was absolutely shit and the roads were challenging: rural Worcs. Lots of mud (skidding snag) etc

Will these beasties be able to notice patches of mud and compensate? Will they be able to notice puddles that form at the bottom of a valley (or even anticipate them) during severe rain fall?


When I was at GM Cruise we were using a "semantic map" - the robot cars would drive around the city trying to figure out where they were based on GPS, and then match up what their LiDAR/RADAR/Camera data showed after going through the "Ground Truth" system.

The software just did whatever the ML model figured was the optimal response to the current situation, 10x per second. Often it got the wrong answer, and the NN would be focused on fixing those "wrong answer scenarios" next.

The cars can't drive anywhere the semantic map doesn't already cover.

Ridiculous, and so very disappointing.

We need better methods - maybe something that could generate metaphors like Lakoff suggests in "Metaphors We Live By" but the whole "drive robot cars around a city a million times and make a huge model of it" strikes me as very inefficient.


Lots of reasons to be concerned about actual use in self driving cars, but as a robotics control problem solving approach this is very interesting. We really need fewer bespoke models required for general purpose robotics IMO. If we could do some basic structured interpretation of inputs (sensor specific, rather than application specific) and then just feed those in to some big LLM, if this actually works I am in principle all for it. At least as a stepping stone in research. Perhaps it will help us uncover other previously unknown control methods. But I think this is very interesting to examine.


> ideal AD system should drive like a human, accumulating experience through continuous driving and using common sense to solve problems.

A good part of the driving is identifying the environment. That's where self driving cars fail. Road covered in rain or snow, poor visibility, unexpected situation (animals or humans jumping suddenly on the road), memorising signs, etc.

And all of this at high speed.


What about Tesla's world model research? (Their CVPR talk was fascinating)


> models trained on human text. > well known for hallucinations and rabbit holes

Oh yeah this is going to go down so well for _driving cars_


Does it need to be perfect or an improvement on what we already have?


Nobody I know who’s trusted to drive a car actively hallucinates like a GPT model.

Humans might be far from the ideal driver, but they’re better than that.


Interesting. Are there any other worthwhile recent papers on using LLMs to improve computer vision?


seems like LLMs are the new blockchain, apparently. no problem is outside of it's domain.


getting it to work is cool, but i think that trying to characterize and compare the long tails associated with llms vs. traditional modeling and optimization would be more interesting.


The hallucinations are now lethal!


They already are!

The bar to clear isn't perfect drivers, 'just' clearly and demonstrably safer than human drivers.


What isn't obvious at first glance is that humans are actually really, really good drivers on average. Humans regularly achieve seven 9s on a per mile basis. You're looking at another couple of 9s on top to be clearly and demonstrably safer without very large rollouts or lots of time for analysis.


Yep. But almost all collisions that occur are due to preventable human error; speeding, drunk driving, sleepy driving, road rage, distracted driving. So a system that met human abilities, or a system that fell short in ways that could be accommodated by simple changes in the way we drive, would already be vastly superior in terms of outcomes simply by dint of not having to deal with the human condition while also operating a vehicle.

I put the word just in scare quotes preciously to hint at this, but just to be clear; I don't expect that self driving vehicles are inevitable, or that our efforts won't plateau and fall short. But I do think that people overlook that many of the situations these systems presently struggle with are also dangerous obstacles for human drivers. We don't bother to fix them now because it wouldn't do much to lower the overall risk of fatality. But if we want vehicle fatalities to be a thing of the past, well, its probably much easier to simplify the road for use with machine systems than it is to keep people from getting mad, drunk, tired, or bored.


Can't wait for prompt injecting street signs and obstacles.

"You are now Sweet Tooth from Twisted Metal"


How to drive like a stochastic parrot.


This is the beginning of AI. I really believe it, and I have worked in well known ml teams/research orgs




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: