Here's the low-level controller for Comma AI's self-driving software. Lateral an...

lordricky · on July 15, 2018

No safety relevant code is written in Python. All the safety relevant code runs real-time on a STM32 micro (inside the Panda), it's written in C and it's placed at the interface between the car and the Eon. This code ensures the satisfaction of the 2 main safety principles that a Level 2 driver assistance system must have: 1- the driver needs to be able to easily disengage the system at any time; 2- the vehicle must not alter its trajectory too quickly for the driver to safely react. See https://github.com/commaai/openpilot/blob/devel/SAFETY.md

Among the processes that run on the Eon, you can find algorithms for perception, planning and controls. Most of it is actually autogenerated code in C++ (see model predictive controls). Python code is used mainly as a wrapper and for the parts that are not computationally expensive. To use functional safety terminology, the Eon functionality is considered QM (Quality Management). This means that any failure in delivering the desired output at the right time is perceived as bad quality and has no safety implications. So, how often do those algorithms deliver the wrong output because some parts are written in Python? How often because RT isn’t enforced? Negligible. Pretty much all the mistakes of a level 2 driver assistance system are due to the quality of the algorithms, the models, the policies etc… There is a long way to go before changing the coding language will be the lowest hanging fruit to improve the system. Until then, using the simplest and most agile coding language (given performance constraints) is probably the best way to maximize quality.

nodnyl · on July 14, 2018

As someone who has had a bit to do with safety-critical software in agricultural machinery, this sounds crazy to me, and I am sure it does to anyone in any other regulated industry. The regulation in many industries for software that can kill is onerous; Python on Android for any safety-related code would be the punch line of a joke in other industries.

blub · on July 14, 2018

To clarify (as a safety hobbyist) why this is a problem:

* consumer hardware does not normally fulfil automotive safety requirements. It could for instance go into thermal shutdown or into a degraded mode if the temperature is too high. Additionally, there is no HW redundancy, I assume that if any of the HW components of the smartphone fail, the system cannot continue to maintain its safety properties.

* Android is a consumer OS, designed for consumer workloads. A real-time, safety-certified OS like INTEGRITY, ThreadX, Nucleus etc. should typically be used for such workloads.

* The safety-relevant software running on top of the OS is developed with specific toolchains and using specific programming languages. Some requirements [1] for a language used in safety-critical context are defined behaviour, explicit dependability support (e.g. design by contract), predictable timing, suitability for static verification, significant field use, strong typing (not necessarily static typing!), feasibility to restrict the language to a subset (e.g. MISRA, JSF, etc).

One of the most popular languages for such software is C, which is not safe and doesn't fulfil several of the above criteria. This is mitigated through tooling, processes, code generation, design validation & verification and so on.

[1]: Taken from Embedded Software Development for Safety-Critical Systems, Hobbs. Interestingly the author would personally choose D or Rust for safety-critical development with the condition of having enough confidence in the compilers.

dozzie · on July 14, 2018

> [...] to any one in any other regulated industry this sounds crazy.

This sounds crazy even to a person who makes a living from running software on commodity hardware (x86/x86_64). Android does not sound like a hard real-time OS. I've seen it hang up in a fsckin' coffee machine! I don't want it at the center of two tons of steel that move at 100 km per hour.

HALtheWise · on July 14, 2018

Ummm... If you look in the actual lib/ directory where the work happens, you will see a bunch of high-performance C written in and generated by Acado, which is a very efficient optimization framework. It does look like there is a bunch of python logic for managing that controller, but a significant amount of work seems to have gone into performance optimization. I am curious how they prevent the Android GC from causing problematically long pauses, though.

commaai · on July 14, 2018

There's no Android GC, there's a Java GC. Python on Android isn't written in Java, it's CPython, written in C.

Python has a GC as well, but we turn it off for the control loop processes. https://github.com/commaai/openpilot/blob/devel/selfdrive/co...
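As a minimal sketch of what turning the collector off around a control loop might look like (this is not the openpilot code; `control_step`, `run_control_loop` and the 100 Hz period are hypothetical names and values chosen for illustration), the standard `gc` module can disable the cyclic collector while reference counting keeps running:

```python
import gc
import time


def control_step(state):
    # Hypothetical placeholder for one iteration of a controller.
    return state + 1


def run_control_loop(iterations, period_s=0.01):
    """Run a fixed-rate loop with the cyclic garbage collector disabled."""
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic cycle collection from here on
    state = 0
    try:
        next_deadline = time.monotonic()
        for _ in range(iterations):
            state = control_step(state)
            next_deadline += period_s
            # Sleep until the next tick; never sleep a negative amount.
            time.sleep(max(0.0, next_deadline - time.monotonic()))
    finally:
        if was_enabled:
            gc.enable()  # restore the previous setting
    return state
```

Disabling the collector only removes one source of pauses; it does nothing about the OS scheduler preempting the process, which is a separate concern raised elsewhere in this thread.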

Haters love to bring up the Python, but they never stick around long enough to explain exactly why it's a problem.

woolvalley · on July 14, 2018

Because it's not a statically typed language for the most part, which brings in an entire class of bugs of its own. It's also an extremely mutable language.

Great for scripts, but there is a reason why most large companies start bolting on types on whatever dynamic language they started with.

fulafel · on July 14, 2018

Most production languages that nominally do some static typing aren't statically very safe - certainly Java, Go, C/C++ are rife with runtime errors. After the required testing to eliminate those, it's not clear Python is significantly different.

Edit: also, Python does eagerly signal type errors, unlike say JavaScript or C, so you don't get silently wrong answers. C is the default language in the auto industry.

Yeah, this is a bit of whataboutism. Certainly it would be nice if the state of the art in production languages was closer to the ideal of statically verified... Haskell and Rust are in the right direction, and would be clearly superior in this regard.

woolvalley · on July 14, 2018

I wasn't implying that static types solve everything, they just make things n+1 better and remove a class of bugs.

Statically typed languages are mature, and you have no excuse for not using them if you're doing anything that approaches a need for reliability. Cars do; social cat pictures, not so much.

fulafel · on July 14, 2018

What about Erlang?

scintill76 · on July 14, 2018

Android is not typically a realtime OS. The root of the point, about uncontrolled pauses, is valid IMO.

naikrovek · on July 15, 2018

The safety-relevant code runs on a microcontroller in the CAN dongle and is written in C.

Why does everyone assume the phone is doing all the work?

orbifold · on July 14, 2018

Lack of type safety and static analysis tools. You can't apply formal verification to it. I wouldn't sit in a car in which any safety critical component was driven by python. Hopefully you would also not be able to get it certified for road use. I happen to know people that work on these problems for German automotive companies. This wouldn't fly there. I truly hope that solid engineering wins out over these approaches. That being said I admire the audacity.

ethanwillis · on July 14, 2018

Because German automotive companies are known for making systems that are robust and not just faking results.

rain1 · on July 14, 2018

I think it's great that your code is open source. The other self driving cars are all based on secret code that the public is not allowed to inspect or audit. This is extremely worrying.

Having your code open source means that outsiders will notice flaws in it; try not to "push back" too much against them. Sometimes they will point out serious flaws that allow you to improve your code significantly, sometimes they will point out non-issues or simply be wrong about things. So instead you should embrace it and take the time to consider the feedback. The crowd is a valuable resource that you have, that closed source projects don't.

wilsonnb2 · on July 14, 2018

> But the software that enables the semi-autonomous driving is free to download. Hotz says this allows him to sidestep the regulatory issue, though it’s unclear whether NHTSA would agree. “We aren’t selling any products that control a car,” he says. “We are giving away free software, and software is speech.” (A spokesperson for NHTSA did not respond to a request for comment.)

Sounds like they're just using open source to avoid liability and side step regulators. I love open source, but I do not think it's being used for benevolent reasons here.

rain1 · on July 14, 2018

Yow! Shows how naive I am. Thanks for the reply.

bitmapbrother · on July 14, 2018

The Android ART GC doesn't cause "long pauses".

https://www.youtube.com/watch?v=iFE2Utbv1Oo

sametmax · on July 14, 2018

As much as I love Python, you need realtime processing for this kind of thing. You can't have a slow language with GC pauses and no way to guarantee execution time for a given operation.

amelius · on July 14, 2018

You should also mention the GIL here (Global Interpreter Lock).

https://realpython.com/python-gil/

icebraining · on July 14, 2018

CPython doesn't actually need the GC. If you make sure you have no cycles, you can disable it, since the rest uses reference counting (which AFAIK is predictable).
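The reference-counting point can be checked with a small CPython-specific sketch. The `Node` class and `freed_without_gc` helper are hypothetical names made up for this illustration; a weak reference is used as a probe to see whether an object has been freed while the cyclic collector is disabled:

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def freed_without_gc(make_cycle):
    """Return True if the object is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a = Node()
        if make_cycle:
            b = Node()
            a.other, b.other = b, a  # a <-> b reference cycle
            del b
        probe = weakref.ref(a)  # does not keep `a` alive
        del a
        freed = probe() is None
        gc.collect()  # explicitly clean up any cycle left behind
        return freed
    finally:
        if was_enabled:
            gc.enable()


print(freed_without_gc(make_cycle=False))  # True: refcount hit zero
print(freed_without_gc(make_cycle=True))   # False: the cycle keeps it alive
```

On CPython the acyclic object is released as soon as its last reference goes away, while the cyclic pair lingers until a collection runs. Other Python implementations may behave differently.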

tinus_hn · on July 14, 2018

If you want to replace the always attentive driver who never blinks their eyes?

blub · on July 14, 2018

It's not necessarily about speed of reaction, although that certainly plays a part.

Such software needs to react in real time, though: if the task that's turning the steering wheel gets preempted in the middle of taking a curve on a cliff, your self-driving car will become a self-flying car.

Such a system would be the equivalent of a driver that suddenly starts texting at all sorts of poorly chosen times.

sametmax · on July 14, 2018

It's more a matter of concurrency and blocking behavior. Gc blocks. A task you can't bound to an execution time blocks.

stevenwoo · on July 14, 2018

In the 1980s the backup flight control software for landing the space shuttle ran on an HP-41 calculator, though with the multiple redundant onboard computers they never had to use it, AFAIK.

blub · on July 14, 2018

That's not what I found online: the devices running customized SW were apparently used as personal calculators by the crew and they would have also used them for manual calculation in case their flight computer had a problem.

They weren't connected to other shuttle computers, were they?

stevenwoo · on July 14, 2018

I found multiple stories/blogs with recollections like this one from Smithsonian including the use of calculators on early missions before laptops were used:

https://airandspace.si.edu/collection-objects/calculator-han...

I remember reading it in the newspapers at the time, which would probably make the reports pre-internet (or pre-newspapers-on-the-internet), in the 1980s. Of course I could be remembering it wrong, or it could have been an HP calculator ad.

srcmap · on July 14, 2018

I'd love to study those projects in more detail: not just the code, but also the design, testing, and validation processes.

sebastianavina · on July 14, 2018

What? That's enough redundancy for a space station.

BugsJustFindMe · on July 14, 2018

It sounds to me like you don't understand how little performance you actually need for this kind of stuff. People have been writing effective autopilot software for decades using hardware that wasn't even particularly fast at the time and may as well be a pile of sand today. What exactly do you think that that code needs to do? I promise that Python on a new smartphone can do way more now than we could do 20 years ago on a Pentium in C or C++. You're lucky if any of your sensors emit at even 100Hz.

Jack000 · on July 14, 2018

You can't do timing-sensitive stuff without an RTOS, because resources could be preempted by the OS or other applications.

It used to be that certain chipsets could do motor control by bit-banging the parallel port, but modern PCs/phones can no longer do this due to latency.

hellllllllooo · on July 14, 2018

I think OP's point is about the latency from all the layers. At 70 mph, where 100ms is 3m, this is a problem for a control system.
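The distance figure is simple arithmetic, and a short sketch makes it easy to try other values. The function name and the list of latencies are illustrative choices, not something taken from the project:

```python
MPH_TO_MPS = 1609.344 / 3600.0  # metres per second in one mile per hour


def distance_during_latency(speed_mph, latency_s):
    """Distance in metres covered at constant speed while waiting latency_s."""
    return speed_mph * MPH_TO_MPS * latency_s


for latency_ms in (10, 100, 250):
    d = distance_during_latency(70, latency_ms / 1000)
    print(f"{latency_ms:>3} ms at 70 mph -> {d:.2f} m")
# Prints:
#  10 ms at 70 mph -> 0.31 m
# 100 ms at 70 mph -> 3.13 m
# 250 ms at 70 mph -> 7.82 m
```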

commaai · on July 14, 2018

Human latency is about 250ms.

Jack000 · on July 14, 2018

I think that's not a fair comparison: a digital control system needs much finer time slices to compete with human reaction time. A control system that worked off of webcam images at 4 Hz would be a jittery mess, or just very slow if using a Kalman filter.

gugagore · on July 15, 2018

I agree it's not a fair comparison. Your analogy, though, confuses latency and throughput.
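One way to read the latency/throughput distinction is as a delay line: a system can respond 250 ms late and still produce a fresh output on every tick. The sketch below is a toy model under that assumption, with a hypothetical `delayed_pipeline` helper and an assumed 100 Hz sample rate; it is not a model of any real controller.

```python
from collections import deque


def delayed_pipeline(samples, delay_steps):
    """Yield one output per input tick, each delay_steps ticks late."""
    in_flight = deque([None] * delay_steps)  # None = nothing ready yet
    for sample in samples:
        in_flight.append(sample)
        yield in_flight.popleft()


RATE_HZ = 100      # assumed update rate (throughput)
LATENCY_S = 0.25   # assumed end-to-end delay (latency)
delay = round(LATENCY_S * RATE_HZ)  # 25 ticks

outputs = list(delayed_pipeline(range(100), delay))
print(len(outputs))   # 100: one output per tick, so still 100 Hz
print(outputs[delay]) # 0: the first sample emerges 25 ticks (250 ms) later
```

In this toy model the update rate stays at 100 Hz even though every output is 250 ms old, which is a different situation from a system that only updates four times per second.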