My friend at SpaceX told me (rough memory from a chat) that they almost wrote the flight control software for Starship in Rust, but at the last minute someone got cold feet and they went with C or C++.
Yeah, MISRA C [1] is a set of rules for writing C in safety-critical environments, originally targeting the automotive industry. If you're used to vanilla C, it can feel very constraining!
In the Rust world, there's the Ferrocene project [2], which aims to qualify a Rust toolchain for similar safety-critical use.
In the sense they're talking about, they don't mean "formal proof," they mean "existence proof." It's not like it's a mature choice in the space, is what they mean.
> we did run Rust scheduling algorithms on the ISS
Actually on the ISS? It's been a while but if I remember right e.g. SACE (EUROPA) planning for the solar arrays happens down here. Transmission time is a fraction of a second, so there was no compelling reason to do the planning station-side. I'm only familiar with that use case, though.
The report gets it wrong. C and C++ can both be made memory safe with small changes. The cost of doing that is likely to be lower than the cost of either deploying CHERI or rewriting in Rust. And, the protections are likely to be stronger than what CHERI offers (CHERI tries really hard to just let existing C code do whatever the heck it does).
There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.
I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”
One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.
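To make the alias-analysis point concrete, here's a minimal C sketch (mine, for illustration): in plain C the compiler must assume the two pointers may alias, while `restrict` supplies the no-alias guarantee that Rust's `&mut` references carry implicitly:

    void scale(int *a, const int *b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += *b;  /* *b may alias a[i], so it must be reloaded every iteration */
    }

    void scale_noalias(int *restrict a, const int *restrict b, int n) {
        for (int i = 0; i < n; i++)
            a[i] += *b;  /* no aliasing possible, so *b can be hoisted out of the loop */
    }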
> I think what you’re doing with Fil-C is cool, but I wouldn’t call a 200x slowdown a “small change.”
If you're bringing up the 200x, then you don't get what's going on.
It's extremely useful right now to have a compiler that's substantially correct so I don't have to deal with miscompiles as I grow the corpus.
Once I have a large enough corpus of tests, then I'll start optimizing. Writing compiler optimizations incrementally on top of a totally reliable compiler is just sensible engineering practice.
So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).
> One of the interesting things that Rust has demonstrated is that you don’t have to choose between performance and safety and, in fact, that safety improvements in languages can actually result in faster programs (e.g. due to improved alias analysis). New technology/sexiness advantage aside, I think this is a significant driver of adoption.
You have to rewrite your code to use Rust. You don't have to rewrite your code to use Fil-C. So, Rust costs more, period. And it costs more in exactly the kind of way that cannot be fixed. Fil-C's perf can be fixed. The fact that Rust requires rewriting your code cannot be fixed.
We can worry about making Fil-C fast once there's a corpus of stuff that runs on it. Until then, saying speed is a shortcoming of Fil-C is an utterly disingenuous argument. I can't take you seriously if you're making that argument.
> So, if you think that 200x is meaningful, then it's because you don't know how language/compiler development works, you haven't read my manifesto, and you have no idea where the 200x is coming from (hint: almost all optimizations are turned off for now so I have a reliable compiler to grow a corpus with).
I actually did, the first day you made it public. A friend also sent it to me because you link my blog in it. Again, I think it's cool, and I'm going to keep following your progress, because I think Rust alone is not a panacea.
I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].
To be clear: I don't think performance is a sufficient reason to not do memory safety. I've previously advocated for people running sanitizer-instrumented binaries in production, because the performance hit is often acceptable. But again: Rust gets you both performance and safety, and is increasingly the choice for shops that are looking to migrate off of their legacy codebases anyways. It's also easier to justify training a junior engineer to write safe code that can be integrated into a pre-existing codebase.
> You don't have to rewrite your code to use Fil-C.
If I read correctly, you provide an example of a union below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.
> I've worked on and in LLVM for about 5 years now (and I've contributed to a handful of programming languages and runtimes over the past decade), so I feel comfortable saying that I know a bit about how compilers and language development work. Not enough to say that I'm an infallible expert, but enough to know that it's very hard to claw back performance when doing the kinds of things you're doing (isoheaps, caps). Isotyped heaps, in particular, are a huge pessimization on top of ordinary heap allocation, especially when you get into codebases with more than a few hundred unique types[1].
Isoheaps suck a lot more in the kernel than they do in userland. I don't think it's accurate to say that isoheaps are a "huge pessimization". It's a pessimization, sure, but it's not huge, that's for sure.
For sure, right now, memory usage of Fil-C is just not an issue. The cost of isoheaps is not an issue.
Also, Fil-C is engineered to allow GC, and I haven't made the switch because there are some good reasons not to do it. That's an example of something where I want to pick based on data. I'll pick GC or not depending on what performs better and is most ergonomic for folks, and that's the kind of choice best made after I have a massive corpus.
> If I read correctly, you provide an example of a union below that needs to be rewritten for Fil-C. That's probably an acceptable tradeoff in many codebases, but it sounds like there are well-formed C programs that Fil-C currently rejects.
Yeah but it's not a rewrite.
If you want to switch to Rust, it's not a matter of changing a union - it's changing everything.
If you want to switch to Fil-C, then yeah, some of your unions, and most of your mallocs, will change.
For example, it took about two to three weeks, working about two hours a day, to convert OpenSSH to the point where the client works. I don't think you'd be able to rewrite OpenSSH in Rust on that kind of schedule.
Hi pizlonator, I'm working on a solution with similar goals (I think), but a bit of a different approach. It's a tool that auto-translates[1] (reasonable) C code to a memory-safe subset of C++. The goal is to get it reliable enough that it can be simply inserted as an (optional) build step, so that the source code can be maintained in its original form.
I'm under the impression that you're more of a low-level/compiler person, but I suggest that a higher level language like (a memory-safe subset of) C++ actually makes for a more desirable "intermediate representation" language, as it's amenable to maintaining information about the "intent" of the code, which can be helpful for optimization. It also allows programmers to provide manually optimized memory-safe implementations for performance-critical parts of the code.
The memory-safe subset of C++ is somewhat analogous to Rust's in terms of performance and in that it depends on a non-trivial static checker, but it imposes less onerous restrictions than Rust on single-threaded code.
The auto-translation tool already does the non-trivial (optimization) task of determining whether any given (raw) pointer is being used as an array iterator or not, but further work is needed to make the resulting code perform closer to optimally. The task of optimizing a high-level "intermediate representation" language like (memory-safe) C++ is roughly analogous to optimizing lower-level IR languages, but the results should be more effective because you have more information about the original code, right?
I think this project could greatly benefit from the kind of effort you've displayed in yours.
My plan for Fil-C is to introduce stricter types as an optionally available thing while preserving the property that it's fast to convert C code to Fil-C.
C++ is easiest to describe, at the guts, in terms of C-style reasoning about pointers. So, the easiest path to convincingly make C++ safe is to convincingly make C safe first, and then implement the C++ stuff around that. It works out that way in the guts of clang/llvm, since my missing C++ support is largely about (a) some missing jank and glue in the frontend that isn't even that interesting and (b) missing llvm IR ops in the FilPizlonatorPass.
> the easiest path to convincingly make C++ safe is to convincingly make C safe first
Yeah, with all the static analysis, I did end up straying from the easy path. Ugh :) But actually, one thing C++ provides that I found made things easier is destructors. I mean, I provide a couple of raw-pointer replacement types that rely on the ("transparently wrapped") target objects checking, when they get destroyed, for any (replacement) pointers still targeting them.
As you indicated in another comment, you explicitly choose to expose/require zalloc() because you didn't want to make malloc() too "magical" (by hiding the indirect type deduction). In that vein, one maybe nice thing about the "safe C++ subset" solution is that it exposes the entirety of the run-time safety mechanisms, in the sense that it's all in the library code and you can even step through it in the debugger. (It also gives you the option to catch any exceptions thrown by said safety mechanisms. You know, if exceptions are your thing. Otherwise you can provide your own custom "fault handling" code (if you want to log the error, or dump the stack or whatever).)
> There's a ton of literature on ways to make C/C++ safe. I think that the only reason why that path isn't being explored more is that it's the "less fun" option - it doesn't involve blue sky thoughts about new hardware or new languages.
I can't think of any other reason that makes sense either. Anyway, the first thing is to dispel the notion that C and C++ cannot be safe, and it seems like your project is likely to be the first to demonstrate it on some staple C libraries. I'm looking forward to it.
What kind of small changes? It seems strange to me that other languages would bother implementing complicated garbage collectors and borrow checkers if all you need is a small change from C.
Most of the changes are just using zalloc and friends instead of malloc and friends. If I reaaaallly wanted to, I could have made it automatic (like, `malloc(sizeof(Foo))` could be interpreted by the compiler as being just `zalloc(Foo, 1)` ... I didn't do that because I sorta think it's too magical and C programmers don't like too much magic).
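Roughly, a conversion looks like this (sketching from the `zalloc(Foo, 1)` shape above, with the element count as the second argument):

    /* Vanilla C: */
    Foo *p = malloc(sizeof(Foo));
    Foo *q = malloc(16 * sizeof(Foo));

    /* Fil-C: the allocation site names the type, so the runtime
       knows which isoheap to serve it from: */
    Foo *p = zalloc(Foo, 1);
    Foo *q = zalloc(Foo, 16);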
I'm also having a hard time fully convincing myself of this:
> the allocator will return a pointer to memory that had always been exactly that type. Use-after-free does not lead to type confusion in Fil-C
In the worst case, this seems like you must simply never reallocate memory, or we're discarding parts of the type. If I successively allocate integer arrays of growing lengths, it seems to me it must either return memory that had previously been used with a different type (e.g., an int[5] and an int[3] occupying the same memory at disjoint times), or address space usage in such a program is quadratic, or we're not considering array length as "part of the type", i.e., we're discarding it. (I'm not sure if this is acceptable or not. I think that should be fine, but I'll have to think harder.)
It's fine because they're both pointers. This union is also fine:
    union {
        int x;
        float y;
    }
It's fine because Fil-C treats both members as "ints" in the underlying type system.
This union has to change:
    union {
        char* a;
        int b;
    }
You can turn it into a struct or you can move the `char*` member out of it.
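For example, a sketch of the struct version; the members no longer share storage, so each keeps its own type, at the cost of some wasted space:

    struct {
        char* a;  /* the pointer member gets its own storage */
        int b;    /* so does the integer member; no overlap, no type confusion */
    }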
> In the worst case, this seems like you must simply never reallocate memory, or we're discarding parts of the type. If I successively allocate integer arrays of growing lengths, it seems to me it must either return memory that had previously been used with a different type (e.g., an int[5] and an int[3] occupying the same memory at disjoint times), or address space usage in such a program is quadratic, or we're not considering array length as "part of the type", i.e., we're discarding it. (I'm not sure if this is acceptable or not. I think that should be fine, but I'll have to think harder.)
int[3] and int[5] are both integer-typed but have different sizes. The allocator also uses size segregation. It so happens that the virtual memory used for int[3] will never be reused for int[5], for that reason.
There's no problem with this; it's just a more aggressive version of segregated allocation.
The allocator still returns physical pages when they go free. It's just the virtual memory that only gets reused in a way that conforms to type.
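Concretely, a hypothetical allocation sequence under that scheme:

    int *a = zalloc(int, 3);  /* served from the (int, 3-element) size class */
    int *b = zalloc(int, 5);  /* different size class: even after a is freed,
                                 its virtual pages are never handed out as int[5] */
    int *c = zalloc(int, 3);  /* same type and size class: may legitimately
                                 reuse a's virtual memory once a is freed */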
And, that part of the system is the most well-tested. The isoheap allocator has been shipping in WebKit for years.
I can type pun int & float pointers. (I think this is the same sort of behavior that I'm going to note at the end.)
> You can turn it into a struct or you can move the `char*` member out of it.
A struct is a product type. A union (at least one combined with a tag) is a sum type. I suppose one could use a struct like a union, and the members not corresponding to the variant are just wasted memory …
This is also a breakage from vanilla C, but it's not the first in your language, so I'm assuming that's acceptable. (And honestly sum types beyond `enum` are pretty rare in C, as C really doesn't help you.)
> [isoheaps]
Yeah, so I think your isoheaps are "memory-safe". I would stress that Rust's guarantees are a good bit stricter, and remove behavior you'd still see in your language. Your language wouldn't crash … but it would still go on to do undefined-ish things. (E.g., you'd get an integer, but it might not be the one you expect, or you might write to memory that is in use elsewhere, assuming the write was in-bounds within a current allocation.)
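A contrived sketch of what I mean, borrowing the `zalloc` spelling from upthread (and assuming `free` keeps its usual name):

    int *p = zalloc(int, 1);
    *p = 42;
    free(p);                  /* p is dangling, but its memory stays int-typed */
    int *q = zalloc(int, 1);  /* may be handed p's old slot */
    *q = 7;
    int r = *p;               /* no trap: in-bounds and correctly typed ...
                                 but r may now be 7 instead of 42 */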
> I can type pun int & float pointers. (I think this is the same sort of behavior that I'm going to note at the end.)
Yes. And yes.
> A struct is a product type. A union (at least one combined with a tag) is a sum type. I suppose one could use a struct like a union, and the members not corresponding to the variant are just wasted memory …
Yup, wasted memory.
> This is also a breakage from vanilla C, but it's not the first in your language, so I'm assuming that's acceptable. (And honestly sum types beyond `enum` are pretty rare in C, as C really doesn't help you.)
Exactly. I'm fine with the kind of breakage where I don't have to rewrite someone's code to fix it. I'm fine with breakage that doesn't slow me down, basically. It's nuanced.
> Yeah, so I think your isoheaps are "memory-safe". I would stress that Rust's guarantees are a good bit stricter, and remove behavior you'd still see in your language. Your language wouldn't crash … but it would still go on to do undefined-ish things. (E.g., you'd get an integer, but it might not be the one you expect, or you might write to memory that is in use elsewhere, assuming the write was in-bounds within a current allocation.)
You're 100% right. It's a trade-off.
Here's the cool bit though: Fil-C has no `unsafe` and no other `unsafe`-like escape hatch. In other words, Fil-C has a lower bar than Rust, but that bar is much more straightforward to meet.
Attaching capabilities to pointers is sort of what CHERI does, isn't it? And presumably CHERI can have better performance thanks to the direct hardware support. (Your manifesto mentions a 200x performance impact currently.)
CHERI's capabilities are more permissive. For example, if you use-after-free in CHERI, then you can access the freed memory (or whatever ends up there after another malloc) without trapping, regardless of what type ends up there.
Fil-C never allows pointer memory to be reused as primitive memory, or vice-versa, and use-after-free means you're at least pointing at data of the same type. Also, Fil-C has a clear path to using concurrent GC and then not having the UaF problem at all, while CHERI has no path to concurrent GC (they can stop the world, kinda maybe).
It's not meaningful to conclude anything from Fil-C's current perf. In the limit, it's easier to make Fil-C fast than it is to make CHERI fast. For example, in CHERI, if you want to have a thin pointer then you have to throw safety out of the window. The Fil-C plan is to give you thin pointers provided that you opt into more static typing.
We don't have operating systems that limit possible side effects at the discretion of the user at run time. This is the root cause of computer insecurity in the US and the rest of the world.
--- Background ---
If I owe you $5.00, I can hand you a $5 bill, and that's the maximum I can lose. It's all down to my discretion at the time of the transaction.
If I owe you $5.00 and can't use cash, then I have to put a bank account at risk to make the transaction. There are rules about it, but the risks are far less constrained. The bank enforces my choices, most of the time.
If you run a DOS program on an IBM XT with only 2 floppy disks, the only thing it can alter is the contents of any non-write-protected disks that happen to be mounted. It's possible to completely protect the operating system, and run almost any software you want, because the side effects are limited at the discretion of the user.
“While formal methods have been studied for decades, their deployment remains limited; further innovation in approaches to make formal methods widely accessible is vital to accelerate their broad adoption.” — White House
Well gosh. I applied to US research programs with an interest in formal methods and was rejected absolutely across the board. Good luck to you.
This reminds me of a video I watched years ago of Uncle Bob giving a Clean Code talk, where he said clean code was more than just a method for writing code: it was also a strategy, a guideline if you will, for when policymakers wake up to the fact that the world runs on software and start making policy. It'd most certainly be better if those guidelines came from the industry itself, as opposed to being imposed by people with little notion of how the industry works.
America only seems to work on cash incentives, so they should put up a few hundred million dollars in grants for businesses to "invest" in rebuilding important infrastructure in memory-safe languages.
I'm not convinced by the memory-safe-language hype train, honestly. Every single application could be 100% bulletproof and we would still live in a world where the CFO downloads a random exe from a phishing email and runs it with admin privileges. That still seems to be the main way institutions and people are compromised. We make jokes about "you can't defend yourself from a nation-state actor", but half the time the primary infection point for everything from North Korea to 15-year-old Kevin playing with MyFirstRansomware is to just put it in an email and let people do the work for you, or copy a login page and let the user give you their credentials.
Edit: lol people very very very angry that I dared to say "hey maybe rust ain't gonna save our world".
Explain to me why you need a language that is pervasively memory unsafe.
And be sure you don't cite reasons why you need unsafe callouts to unsafe capabilities. Nobody denies that.
Explain why you need a language that is pervasively unsafe, unsafe by default. Explain to me what code you are writing that has to routinely access memory outside of arrays. Explain to me what code you are writing that simply must be able to use pointers to access memory in unsafe ways.
In a nutshell, explain what these amazing benefits are to memory-unsafe languages that somehow offset the many and manifold costs we've found over years of usage.
Memory safety in a modern language is not only virtually free, it's a positive benefit. I simply do not see an argument for memory-unsafe languages, especially given that every single one of them ships with "unsafe" capabilities available on demand and fully documented.
(Also recall "memory safe" does not imply Rust. Rust is a particularly strong language on that front, sure, but memory safety is basically every currently-used language except C and C++. People advocating for memory safety are not advocating for everything to be moved to Haskell or Rust.)
So why? Why should we keep using memory unsafe languages? It's all cost and no benefit. Who wants that?
(The only practical answer is "I've got a system here written in C/C++", and I concede the engineering pressures that can keep you there. But my personal opinion at this point is that greenfielding a project in C or C++ without adopting strong code analysis on day one is rapidly approaching, if not already at, professional negligence. And I've got my stink eye on C or C++ with strong code analysis.)
> I'm not convinced of the memory safe language hype train honestly. Every single application could be 100% bulletproof and we would still live in a world where the CFO downloads a random exe from a phishing email and runs it with admin privileges.
Well, if the CFO has admin, you’ve already failed several audits. But in general, yes, there are multiple categories of risk which you have to protect against. That doesn’t mean it’s not valuable to reduce one of them, however: in addition to the direct risk reduction, it also frees up time to work on the others.
You have to remember that most businesses don't perform any kind of security auditing whatsoever. Any non-tech business with 50 or fewer people has gaping security holes. I say that with absolute certainty.
Five years ago I saw someone running win98se, the box seething and writhing in pain with highly apparent malware, as a controller for industry-grade, niche-market machinery. It was network-connected for "the IT company to log in and fix things", had software that would only work on win98se, and because the machine was "in the room", its browser got used because it was "convenient".
No one cared. This is normalcy for most people.
Even if the "it only works on bare metal" thing was true, and its hardware key device may make that a reality, there's zero reason to let people browse on it. Or leave it online 24x7, one could bring a switch port up/down when needed.
Yep, especially in manufacturing. They still run Windows XP machines because you can't upgrade the OS due to vendor-specific hacks. It's either run XP or fork out a million dollars for an OS upgrade, because they would need to replace the whole machine. I've supported many businesses like this.
True, but if the CFO is running with admin privileges, how likely is it that they are otherwise enough on the ball security-wise to be writing or even making decisions about code in an unsafe language? The whole point of safety standards is to avoid everyone needing to be an expert in everything: I can look for a UL label and know that someone else has taken care of the basics, even though our CFO can still do something crazy like leaving his space heater running under a pile of newspaper. That doesn’t mean having the other safeguards wasn’t a good idea just because they don’t solve every problem.
So bored of these "we don't need to fix this because of something else" arguments. And if the article was about the other thing, then we'd hear the opposite argument.
Look - on the level of coding vulnerabilities, memory safety is by far the biggest problem. You want to talk about phishing attacks and privilege management, fine. There's a bunch of different controls for those.
We would never address anything if we followed the path of whataboutism.
I think a lot of people here forget that most businesses aren't huge corporations or tech-oriented. Most businesses probably have fewer than 50 people and non-professional "IT support", i.e., the owner's grandson who is "good with computers".
Something like 75% of the small to mid-size businesses I've worked with have their security doors wide open. The difficulty of prying local admin away from the owner of a concrete company is quite high. It's all very ephemeral and hard to explain to non-tech-savvy people.
Turning on MFA for everything possible, aggressive AV, social engineering/phishing training, and removing local admin from daily drivers probably solves 99.99% of malware.
Memory safe software would be great, yes, but it really isn't the problem.
If the owner’s grandson is doing the IT, they’re unlikely to be able to write or audit unsafe code either. That means they need to be able to trust that their suppliers are using safe primitives. Memory safety is like requiring that their electrician use circuit breakers: no, it won’t protect against every source of fire, but we have codes specifically to avoid needing everyone to have expertise.
Usually that CFO doesn't have admin privileges. However, the exe he ran could very easily make use of a privilege-escalation exploit against a service that does run with admin privileges: a buffer overflow, or some other exploit made possible by memory-safety issues.
Or that exe tries to connect to other services in the network to exploit a buffer overflow on another system. An example of such an exploit was EternalBlue.
So yes, you're probably right that from a purely external perspective, attackers are unlikely to gain initial access using exploits targeting memory safety. However, once they are in, there are all sorts of memory safety bugs that could be used.
Instead of a subsidy, why not just make companies ineligible for security-breach liability insurance if they continue to use a memory-unsafe language? It'd be quite a trivial change, and we'd see an immediate shift from the largest players, with tooling improvements in newer languages and momentum causing smaller companies to follow suit.
I'm very surprised to see that Ada was not mentioned in this article, despite the fact that they brought up Rust, noting that it hasn't been proven in space, while Ada actually has.
They've had an incredibly readable memory-safe language since the '80s.
My first job was in Ada, and by then our customer had put in a clause allowing C to be introduced wherever Ada was difficult to use. In our case, the issue with Ada was the poor support from the Ada compiler vendor. The granularity of type definitions in Ada is superb: a type can be declared with its number of bits and its range. But that made type checking, and hence compilation, slow.
This makes no sense, because the legislature sets and spends budgets, while this is a document from the ONCD. Why are you pretending "the government" has agency or manages anything?
Address the advice instead of posting low effort hate.
Do you apply the same reasoning when the government tells you something else that is obviously true and good advice, e.g. to wash your hands after you've used a restroom?
The target audience is not you. It seems to be aimed mostly at C-level execs with limited technical know-how, who have less than a CS undergrad's understanding of these issues.
Sure, but we did run Rust scheduling algorithms on the ISS when I was at JPL. It was a proof of concept, not an actual system, though.