There are different classes of memory unsafety: buffer overflow, use-after-free, and double free being the main ones. We hadn't seen a mainstream language capable of preventing use-after-free and double free without GC overhead until Rust. And that's because figuring out when an object is genuinely not in use anymore, at compile time, is a really hard problem. But a buffer overflow like the one from the article? That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert if your language had a native array type. Pascal and its descendants have been doing that for decades.
> That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert if your language had a native array type. Pascal and its descendants have been doing that for decades.
GCC has also had an optional bounds checking branch since 1995. [0]
GCC's and Clang's sanitiser switches also support bounds checking on the main branches today, unless the sanitiser can't trace the pointer's origin, e.g. because of multiple levels of pointer arithmetic or code far removed from the allocation site.
AddressSanitizer is also used by both Chrome and Firefox, yet it failed to catch the very simple buffer overflow from the article. It would have caught the bug if the objects created were actually used rather than just discarded by the test suite.
> It would have caught the bug, if the objects created were actually used and not just discarded by the testsuite.
So they were only testing with AddressSanitizer, not running the built binaries with it? Doing so is slow, to say the least, but you can run programs normally with these runtime assertions enabled.
It even has the added benefit of serving as a nice emulator for a much slower system.
> We haven't seen a mainstream language capable of preventing use and free and double free without GC overhead until Rust.
Sorry, that just isn’t the case. It is simple to design an allocator that can detect any double-free (by maintaining allocation metadata and checking it on free), and prevent any use-after-free (by just zeroing out the freed memory). (Doing so efficiently is another matter.) It’s not a language or GC issue at all.
> prevent any use-after-free (by just zeroing out the freed memory)
It's not quite that simple if you want to reuse that memory address.
Not reusing memory addresses is a definite option, but it won't work well on 32-bit (you can run out of address space). On 64-bit you may eventually hit limits as well (if you have many pages kept alive by small amounts of usage inside them).
It is, however, possible to make use-after-free type-safe at least; see e.g. Type-After-Type.
Type safety removes most of the risk of use-after-free: it becomes equivalent to the indexes-in-an-array pattern, where you can use the wrong index and look at "freed" data, but you can't view or corrupt a raw pointer. That comes at something like 10% overhead, so it is a tradeoff, of course.
Rust is a definite improvement on the state of the art in this area.
One of the things I like about Zig is that it takes the memory allocator as a kind of “you will supply the correct model for this for your needs/architecture” as a first principle, and then gives you tooling to provide guarantees downstream. You’re not stuck with any assumptions about malloc like you might be with C libs.
On the one hand, you might need to care more about what allocators you use for a given use case. On the other hand, you can make the allocator “conform” to a set of constraints, and as long as it conforms at `comptime`, you can make guarantees downstream to any libraries, with a sort of fundamental “dependency injection” effect that flows through your code at compile time.
It can be memory safe. It's up to you to choose memory safety or not. That's a piece of what I was getting at. Unless I misunderstand something. I've only dabbled with Zig.
I'm not aware of any memory safety that works in Zig, other than maybe Boehm GC. The GeneralPurposeAllocator quarantines memory forever, which is too wasteful to work in practice as one allocation is sufficient to keep an entire 4kB page alive.
I mean, if you don't care about efficiency, then you don't need any fancy mitigations: just use Boehm GC and call it a day. Performance is the reason why nobody does that.
Zeroing out freed memory in no way prevents UAFs. Consider what happens when the freed memory is recycled for a new allocation.
Maybe an example will help make it clearer? This is in pseudo-C++.
    struct AttackerChosenColor {
        size_t foreground_color;
        size_t background_color;
    };

    struct Array {
        size_t length;
        size_t *items;
    };

    int main() {
        // A program creates an array, uses it, frees it, but accidentally
        // keeps a dangling pointer and uses it anyway. Mistakes happen.
        // This sort of thing happens all the time in large programs.
        struct Array *array = (struct Array *)malloc(sizeof(struct Array));
        ...
        free(array); // Imagine the allocation is zeroed here like you said:
                     // the array length is 0 and the items pointer is 0.
        ...
        // The allocator can reuse the memory previously used for the array
        // and return it to the attacker. Getting this to happen reliably is
        // sometimes tricky, but it can be done.
        struct AttackerChosenColor *attacker =
            (struct AttackerChosenColor *)malloc(sizeof(struct AttackerChosenColor));
        // The foreground_color *overlaps* the array's length, so when the
        // attacker chooses SIZE_MAX as the foreground color, they also make
        // the stale array look enormous.
        attacker->foreground_color = SIZE_MAX;
        // The background_color overlaps the array's items pointer, so when
        // the attacker chooses 0 as the background color, they also point
        // the stale array at address 0.
        attacker->background_color = 0;
        // Now say the attacker is able to make the program use the
        // dangling/stale pointer. Like you suggested, the memory was zeroed
        // when it was freed, but now it's been recycled as a color pair and
        // filled in with values of the attacker's choosing.
        // The bounds check passes (the length is now SIZE_MAX), so the
        // attacker can write whatever value they want wherever they want in
        // memory. They can change return addresses, stack values, secret
        // cookies, whatever they need to change to take control of the
        // program. They win.
        if (attacker_chosen_index < array->length) {
            array->items[attacker_chosen_index] = attacker_chosen_value;
        }
    }
The trick, in my view, is not that the language supports a safe approach (C++ has smart pointers and "safe" code in various libraries) but that you CAN'T cause a problem even while being an idiot.
As far as I know, nothing in the C/C++ standard precludes fat pointers with bounds checking. Trying to access outside the bounds of an array is merely undefined behavior, so it would conform to the spec to simply throw an error in such situations.
There's AddressSanitizer, although for ABI compatibility the bounds are stored in shadow memory, not alongside the pointer. It is very expensive, though. A C implementation with fat pointers would probably have significantly lower overhead, but breaking compatibility is a non-starter. And you still need to deal with use-after-free.
> But a buffer overflow like from the article? That's just a matter of saving the length of the array alongside the pointer and doing a bounds check, which a compiler could easily insert
Both the array length and the index can be computed at runtime from arbitrary computation/input, in which case the bounds check cannot happen at compile time; it has to be inserted as a runtime check.