This is awesome! Really great write-up, and solid work by Jessie :^)
The Ladybird codebase is generally very defensive, but like every browser, our JavaScript engine is slightly less so (in the pursuit of performance.)
There are architectural lessons to learn here beyond just fixing the bugs found. We've since replaced these allocations (+ related ones) with callee-specific stack memory instead of trying to be clever with heap allocation reuse.
We're also migrating more and more of our memory management to garbage collection, which sidesteps a lot of the traditional C++ memory issues.
As others have mentioned, sandboxing & site isolation will make renderer exploitation a lot less powerful than what's demonstrated here. Even so, we obviously want to avoid it as much as possible!
This particular memory vulnerability, as I understand it, was a result of a `ReadonlySpan<>` targeting a resizable vector. A simple technique used by the scpptool-enforced safe subset of C++ to address this situation is to temporarily move the contents of the resizable vector into a non-resizable vector [1] and target the span at the non-resizable vector instead.
Upon destruction, the non-resizable vector will automatically return the contents back to the original resizable vector. (It's somewhat analogous to borrowing a slice in Rust.)
While it wouldn't necessarily prevent you from doing the flawed/buggy thing you were trying to do, it would prevent it from resulting in a memory vulnerability.
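To make the idea concrete, here is a minimal C++ sketch of the "temporarily move into a non-resizable vector" trick. `BorrowedFixedVector` and the small `ReadonlySpan` struct are hypothetical names made up for illustration. They are not scpptool's or Ladybird's actual types, and this sketch simply drops anything appended to the original vector while the borrow is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a read-only span type (not Ladybird's AK type).
template <typename T>
struct ReadonlySpan {
    T const* data;
    std::size_t size;
    T const* begin() const { return data; }
    T const* end() const { return data + size; }
};

// Hypothetical helper: while it is alive, the source vector is empty and the
// elements live in private storage that exposes no resizing operations.
template <typename T>
class BorrowedFixedVector {
public:
    explicit BorrowedFixedVector(std::vector<T>& source)
        : m_source(source), m_storage(std::move(source))
    {
        m_source.clear(); // make the moved-from vector explicitly empty
    }

    ~BorrowedFixedVector()
    {
        // Hand the elements back. Anything appended to the source in the
        // meantime is discarded in this sketch.
        m_source = std::move(m_storage);
    }

    BorrowedFixedVector(BorrowedFixedVector const&) = delete;
    BorrowedFixedVector& operator=(BorrowedFixedVector const&) = delete;

    // The span targets the private storage, which cannot be resized from outside.
    ReadonlySpan<T> span() const { return { m_storage.data(), m_storage.size() }; }

private:
    std::vector<T>& m_source;
    std::vector<T> m_storage;
};

// Simulates the buggy pattern: the original vector is appended to while a
// span over its former contents is being iterated.
inline int sum_while_source_grows(std::vector<int>& values)
{
    BorrowedFixedVector<int> borrowed(values);
    int sum = 0;
    for (int v : borrowed.span()) {
        values.push_back(v); // hits the emptied original, not the borrowed buffer
        sum += v;
    }
    return sum;
}
```

The logic of `sum_while_source_grows` is still buggy in the sense that the appended elements are lost, but the span never dangles, which is the point being made above.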
Whatever happens, large parts of the codebase + dependencies will be C++ (or C) for the foreseeable future.
We're working on integrating with Swift, but despite the team's earnest efforts, Swift/C++ interop is still young and unstable.
On a personal note, I'm increasingly feeling like "C++ with a garbage collector" might actually be a reasonable tool for the task at hand. Watching the development of Fil-C in this space..
What would the effect of Swift be on the possibility of a Windows port? I know anything end-user friendly is ages away, but I don't live in Apple land, and neither does most of the world. Apple has a monopoly on iOS and huge market share on Mac, and is still at 20% or something.
The core Swift language is being made more independent of Apple, and can be compiled for an increasing number of platforms thanks to the LLVM-based compiler.
I'm honestly not at all familiar with browsers but I really do wonder if a custom language wouldn't be a reasonable tradeoff. It's not all that insane as that is a path that has been walked before. For instance FoundationDB has their own syntax to manage their actor system which just transpiles to C++: https://github.com/apple/foundationdb/blob/main/flow/README....
V8 also has torque which I think to some degree also fits into that type of mindset.
Out of curiosity, why not C# at this point? It's pretty hard to marry C++ with a high-performance garbage collector, since the underlying language semantics do not allow for e.g. compacting GCs.
“Ladybird started as a component of the SerenityOS hobby project, which only allows C++. The choice of language was not so much a technical decision, but more one of personal convenience. Andreas was most comfortable with C++ when creating SerenityOS, and now we have almost half a million lines of modern C++ to maintain.
However, now that Ladybird has forked and become its own independent project, all constraints previously imposed by SerenityOS are no longer in effect.
We have evaluated a number of alternatives, and will begin incremental adoption of Swift as a successor language, once Swift version 6 is released.”
I've only used it within Xcode so can't say which is to blame, but I did frequently get the message "The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions", which seems to be a problem with the language/compiler/type system, not Xcode per se.
I agree with this. I've been avoiding Xcode as much as possible for my little Swift projects, but now I do wonder if that still stands for large codebases. I guess you could try to find some big Swift codebases on GitHub and see how long they take to compile.
Reentrancy bugs like this one are surprisingly common. Having reviewed lots of unsafe Rust code, unnoticed calls into outside code (that can then reenter your own code or modify your data structures, blowing everything up) is one of the most common soundness issues I've found across different projects.
The main solutions seem to be either restricting how possibly-invalidated data can be held (e.g., safe references in Rust), or having some coloring scheme (e.g., "pure" annotations) to ensure that the functions you call are unable to affect your data. Immutable languages can mitigate it somewhat, but only if you have the discipline to maintain a single source of truth for everything, and avoid operating on stale copies.
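As a rough illustration of "restricting how possibly-invalidated data can be held", here is a small C++ sketch. `visit_all` is a hypothetical function, not code from Ladybird or any project mentioned here. It assumes the callback may reenter and mutate the vector, so it keeps only an index across the call, copies the element out first, and re-checks the size on every iteration instead of holding a pointer or span.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Visits each element and calls `visit`, which is allowed to mutate `values`
// (push_back, clear, ...). No pointer, reference, or span into the vector is
// held across the call. Returns how many elements were visited.
inline int visit_all(std::vector<int>& values, std::function<void(int)> const& visit)
{
    int visited = 0;
    for (std::size_t i = 0; i < values.size(); ++i) { // size re-read every iteration
        int element = values[i]; // copy out before calling outside code
        visit(element);          // may reallocate or shrink `values`
        ++visited;
    }
    return visited;
}
```

This avoids the dangling access, but as noted above it can still operate on stale copies: `element` reflects the vector as it was before the callback ran.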
Any reasonably sophisticated web browser is going to require a decent amount of unsafe {} if only just for performance reasons. Obviously would be much easier to audit though.
Eh. It will work with your code but at some point your dependencies will have to dive into unsafe (e.g. calling C libs/kernel, SIMD, ASM by hand, etc.).
Minimizing unsafe, auditing libs with Geiger, and limiting outside dependencies to a few reliable vendors is what is practically needed.
If this is all-new development, wouldn't it be good for the emphasis to be on correctness and security, as part of the design and coding itself?
That's something that you use fuzzing as one way to detect a failure of, not as the means of achieving correctness and security.
I'm not picking on Ladybird here specifically. Chrome and Firefox provide constant streams of security vulnerabilities. But it would be nice if Ladybird didn't start with the same problems that might be attributed to huge legacy code bases.
tbh i kinda love how they're just going for it and building from scratch, but i always wonder how much focus on security upfront actually changes things long-term. you think building with fun in mind ends up missing critical stuff, or does it keep devs more engaged?
Even in a modern browser, a renderer exploit (the most sandboxed portion of the browser) gives you access to a large attack surface - the browser process via IPC, the kernel via syscalls, and loads of data from other websites.
So no, an exploit like this is not just “of academic value” even in a sandboxed browser.
With decades and decades of memory safety lessons in the books, it's hard to imagine how C++ was the language of choice when starting a new browser from scratch in 2018.
The browser was not started with the idea of taking over the main focus of development; it was just another part of an already pretty large hobby OS project.
Fine. With decades and decades of memory safety lessons in the books, it's hard to imagine how C++ was the language of choice when starting a new operating system from scratch in 2018.
Their GitHub has 0.3% Swift code. They said they would start once Swift 6 is out. It has been out for months. So either they abandoned Swift, or haven't really started, or they are really, really slow to start using it. All three options argue against the article being outdated, wouldn't you agree?
Current blockers to swift usage are found here: https://github.com/LadybirdBrowser/ladybird/issues/933
A rising tide lifts all boats: by trying to use Swift seriously, they're finding and helping fix bugs in the compiler.
Because the article is from 2022 and says that they will use a custom language called Jakt which didn't pan out, it seems. Yes, I am also eager for the Swift rewrite to get off the ground.
When they started, the plan was mostly to have fun and see how far you can get when creating an OS from scratch. So picking a language in which they are experienced makes sense in that context.
One would think the same of C, where exploits trace all the way back to the Morris worm in 1988. That is 36 years of thinking the problem is the developers, not the language, with new projects still being started every day.
At least C++ has mechanisms to write safer code, provided one makes use of them, even if there are still issues.
To use a modern example, renaming a JavaScript file's extension to a TypeScript one only gets you so far.
Then one can make use of TypeScript's type system, or switch to Elm to take it to the next level.
Always good to start the discussion, but the article doesn't seem to link to an issue on the Ladybird GitHub repo, which I would expect in the case of academic disclosure etc.
Obviously nobody is really using Ladybird yet and there will be many more such issues to address, so now is a good time to evaluate how to avoid such mistakes up front.