Checked integer arithmetic in the prospect of C23

raphlinus · on Dec 19, 2022

I posted overflow checking of signed integer arithmetic as a puzzle yesterday[1]. I got some good responses but none quite as minimal wrt number of instructions as my own solution:

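    #include <stdbool.h>
    #include <stdint.h>

    bool add_will_overflow(int32_t a, int32_t b) {
        uint32_t c = (uint32_t)a + (uint32_t)b;
        /* Signed overflow iff c's sign differs from the signs of both a and b. */
        return (((uint32_t)a ^ c) & ((uint32_t)b ^ c)) >> 31;
    }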

That produces the following assembly (see Godbolt[2]):

        lea     edx, [rdi+rsi]
        mov     eax, edi
        xor     eax, edx
        xor     esi, edx
        and     eax, esi
        shr     eax, 31
        ret

In Rust, you can write a.checked_add(b).is_none() which produces the following assembly[3]:

        add     edi, esi
        seto    al
        ret

A fun fact about this code: the overflow flag which is set by the add instruction and then harvested dates back at least to the 8080 (almost 50 years ago) and is not present in vanilla ARM. However, Apple Silicon has it as an extension, to make life easier for Rosetta 2 binary translation[4]. So when you do get to use this shorter code sequence, be thankful for the effort that chip designers put in to make it execute efficiently.

I expect the C23 built-in functions will perform as well as Rust here, which is a win both for ergonomics (you can't really consider the current state of "will a+b overflow" to be discoverable) and performance.

[1]: https://mastodon.online/@raph/109535617953722719

[2]: https://godbolt.org/z/17zMsWjYv

[3]: https://rust.godbolt.org/z/36Ta9oP1P

[4]: https://news.ycombinator.com/item?id=33635720

jcranmer · on Dec 19, 2022

Checked overflow operations are kind of the go-to example for "it's easy in assembly, hard in programming languages"--in hardware terms, it's usually check a flag, but since flag registers are not provided for in high-level languages, it becomes a game of try to write it in a pattern that the compiler can recognize, which is never a fun game to play. Even worse than addition is multiplication. Thankfully, C23 has finally added these operations.

Recently, though, I ran into a case where I wanted checked (u32 - u32) -> i32 and (u32 + i32) -> u32 operations, which even Rust's standard library doesn't provide. (The use case is keeping track of a running delta between two lists of u32 values--the delta can go positive or negative, so it has to be signed, but the values in the lists can never be negative.)
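Both mixed-sign operations can be sketched in C with the GNU builtins mentioned later in the thread (GCC and Clang). They are type-generic: the operands are treated as infinite-precision integers and only the destination type decides whether the result fits, so mixing u32 and i32 needs no special casing. The helper names below are hypothetical; in C23 the same calls should be spellable as `ckd_sub(out, a, b)` and `ckd_add(out, a, b)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers built on the type-generic GNU builtins (GCC, Clang).
 * Each returns true if the exact mathematical result does not fit in *out. */

/* (u32 - u32) -> i32: overflow if a - b is outside [INT32_MIN, INT32_MAX]. */
bool sub_u32_to_i32(uint32_t a, uint32_t b, int32_t *out) {
    return __builtin_sub_overflow(a, b, out);
}

/* (u32 + i32) -> u32: overflow if a + delta is negative or above UINT32_MAX. */
bool add_u32_i32(uint32_t a, int32_t delta, uint32_t *out) {
    return __builtin_add_overflow(a, delta, out);
}
```

With these, the running delta stays an `int32_t` and is only ever applied to a `uint32_t` through `add_u32_i32`, so both directions of the bookkeeping are checked.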

TazeTSchnitzel · on Dec 19, 2022

> it becomes a game of try to write it in a pattern that the compiler can recognize

Worse, in C or C++ you also need to find a way to do it without undefined behaviour. You can't just do the operation and see if the result matches expectations…
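The usual portable workaround is to test against the limits before doing the signed operation, so the addition is only evaluated when it cannot overflow. A minimal sketch (the name `safe_add_int` is hypothetical; it returns true on overflow to mirror the C23 convention):

```c
#include <limits.h>
#include <stdbool.h>

/* UB-free signed addition without builtins: check the preconditions first,
 * so `a + b` is only ever evaluated when the result is representable. */
bool safe_add_int(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* would go below INT_MIN */
        return true;                    /* overflow: *out left untouched */
    }
    *out = a + b;                       /* safe: result is representable */
    return false;
}
```

Note that `INT_MAX - b` and `INT_MIN - b` are themselves safe only because each is guarded by the sign of `b`; this is exactly the kind of pattern that compilers may or may not recognize as a single add-and-check-flag.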

raphlinus · on Dec 19, 2022

The addition operator just landed in Rust 1.66 (checked_add_signed[1]), but for the subtraction one it looks like you'd need to roll your own.

[1]: https://github.com/rust-lang/rust/issues/87840

tialaramex · on Dec 20, 2022

You are probably looking at it wrong. You can now write

  i32::checked_add_unsigned(some_u32)  and

  i32::checked_sub_unsigned(some_u32)

... which I think are exactly what your parent needs.

jcranmer · on Dec 20, 2022

u32::checked_add_signed solves one of the pairs (u32 + i32 -> u32), but there's nothing for the other one (u32 - u32 -> i32).

tialaramex · on Dec 20, 2022

Good point, I'm sure this made sense to me when I wrote it, but I can't ask past me why.

tinus_hn · on Dec 19, 2022

C is supposed to be reasonably portable so it doesn’t make too much sense to depend on flags that are not available on all platforms.

derefr · on Dec 20, 2022

Why can’t it be a flag on some platforms and more complex branching logic on other platforms? Like the division shim on platforms that don’t have a division instruction?
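At the source level, that split can be sketched with a preprocessor switch: use the compiler builtin where it exists and fall back to plain C elsewhere. Multiplication is shown because it is the harder case. The function name is hypothetical, and the fallback assumes a 64-bit integer type is available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checked 32-bit multiply. Returns true on overflow.
 * Callers should not rely on *out after an overflow: the builtin stores
 * the wrapped value, while the fallback leaves *out untouched. */
bool checked_mul_i32(int32_t a, int32_t b, int32_t *out) {
#if defined(__GNUC__) || defined(__clang__)
    /* Typically compiles to a multiply plus a flag test on x86-64. */
    return __builtin_mul_overflow(a, b, out);
#else
    /* Portable fallback: the product of two int32_t always fits in int64_t. */
    int64_t wide = (int64_t)a * (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return true;                         /* does not fit in int32_t */
    *out = (int32_t)wide;
    return false;
#endif
}
```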

int_19h · on Dec 20, 2022

It's easier to find a platform that doesn't support floating-point numbers than one that doesn't have some kind of overflow flag. Some kind of platform-independent abstraction is long (as in, decades) overdue.

noobermin · on Dec 20, 2022

Sure, but it kind of is insane that there are platforms (like ARM apparently, did not know) that do NOT have a carry flag.

dooglius · on Dec 19, 2022

This is already present as a builtin in GNU C (as indicated in TFA) and it already results in the optimal code: https://godbolt.org/z/qc4zvav7E

rightbyte · on Dec 19, 2022

"__builtin_add_overflow" in gcc produces the same output as "checked_add()".

I really hope stuff like this is added to the standard.
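For comparison with the bit trick at the top of the thread, here is a sketch of the same check written with the builtin (GCC or Clang assumed; the function name is hypothetical). The builtin computes the sum as if with infinite precision, stores it wrapped into the destination, and returns true if it did not fit; C23 standardizes the same operation as `ckd_add(&sum, a, b)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same contract as add_will_overflow above, but the compiler sees the
 * intent directly and can use the hardware overflow flag. */
bool add_will_overflow_builtin(int32_t a, int32_t b) {
    int32_t sum;   /* receives the wrapped sum; unused here */
    return __builtin_add_overflow(a, b, &sum);
}
```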

dezgeg · on Dec 19, 2022

Hmm, isn't the Apple-specific magic only for parity(PF) and aux carry (AF)? aarch64 does have a 'V' flag for signed overflow.

raphlinus · on Dec 19, 2022

Oops, you're right. Too late to edit, sorry about the confusion.

nullc · on Dec 20, 2022

How does it actually perform? By default I generally assume flags registers are kinda dicey for performance due to the dependency.

tinglymintyfrsh · on Dec 19, 2022

Also, Hacker's Delight and OpenBSD probably have clever solutions for these.

wahern · on Dec 20, 2022

Surprisingly, OpenBSD does not have a library (neither a public API nor even just routines which are copied project-to-project as is common with OpenBSD utilities and daemons) to handle arithmetic overflow. The closest might be malloc/realloc extensions, like reallocarray, that handle common scenarios where arithmetic overflow is seen.
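The scenario reallocarray guards against is an element count times an element size wrapping around `size_t`. A minimal sketch of that guard with a checked multiply (the helper `alloc_array` is hypothetical, assumes GCC or Clang, and unlike reallocarray it does not set `errno`; in C23 the check would be `ckd_mul(&bytes, n, size)`):

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate n elements of `size` bytes each, refusing the request instead of
 * silently allocating a too-small buffer when n * size overflows size_t. */
void *alloc_array(size_t n, size_t size) {
    size_t bytes;
    if (__builtin_mul_overflow(n, size, &bytes))
        return NULL;          /* product wrapped: refuse the request */
    return malloc(bytes);
}
```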

Someone · on Dec 19, 2022

FTA: “Their working is quite simple: the arithmetic is as if performed in the set of mathematical integers and then the value is written to result. If it fits, the return value is false. If it doesn’t fit, the return value is true”

They also give example code

  bool add_invalid = ckd_add(&result_add, a, b);

I can see that fits with “most of the time, anything positive means ‘no error’”, for example in malloc, write, read or printf, but these new functions return bool, not int, and the chosen method will require writing a double negation sometimes:

  if(!add_invalid) { … }

That’s not too bad, but if I were to see

  if(!ckd_add(&result_add, a, b)) { … }

I would expect that to test for failure, not success.

Because of that, I think I would have chosen to return true on success, false on failure. I’m curious as to what arguments led to the choice made.
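To make the polarity concrete: with the chosen convention, the condition reads "if it overflowed". A small sketch using the GNU builtin that `ckd_add` standardizes (the saturating helper is hypothetical and only illustrates where the branches go):

```c
#include <stdbool.h>
#include <stdint.h>

/* true means "did NOT fit" (overflow); false means sum is exact.
 * In C23 the condition would read ckd_add(&sum, a, b). */
int32_t add_or_saturate(int32_t a, int32_t b) {
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        /* Overflow implies both operands had the same sign; clamp toward it. */
        return a > 0 ? INT32_MAX : INT32_MIN;
    }
    return sum;   /* no overflow: sum is exact */
}
```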

chongli · on Dec 19, 2022

In C, the value 0 is equivalent to false and all nonzero values are equivalent to true. It’s a convention throughout the C standard library to return 0 on success and nonzero when some error occurred. The behaviour of the new checked arithmetic library is consistent with that convention.

Someone · on Dec 19, 2022

> It’s a convention throughout the C standard library to return 0 on success and nonzero when some error occurred.

If only it were so simple. read and write, for example, return a number less than zero on error and a non-negative number on success, and malloc returns zero on error, and nonzero on success.

The general rule for early C seems to be “whatever’s the best way to cram a return value or an error in an int” (probably the correct decision for the time)

Also, these new functions return a bool, which, in C23, gets integer-converted to zero for false and one for true (https://en.cppreference.com/w/c/language/bool_constant. C17 had macros for true and false, with false being zero)

And the reverse: converting to bool similarly maps zero to false (https://en.cppreference.com/w/c/language/conversion#Boolean_...):

“A value of any scalar type (including nullptr_t) (since C23) can be implicitly converted to _Bool. The values that compare equal to an integer constant expression of value zero are converted to 0 (false), all other values are converted to 1 (true).”

(https://en.cppreference.com/w/c/language/bool_constant)

dahfizz · on Dec 19, 2022

_when the return value is an error code_, zero means success and nonzero means failure. Functions like `read`, `recv`, etc etc don't just return an error code. They return an actual value.

Functions that only return an error code like `stat`, `connect`, and the proposed ckd_add, return 0 on success and nonzero on error.

gustedt · on Dec 19, 2022

Unfortunately there are several error conventions in the C standard.

Here, the committee just standardized existing practice, namely the gcc builtins. We just adjusted the call sequence in putting the pointer parameter for the result first.

SAI_Peregrinus · on Dec 19, 2022

And in shell it's convention for programs to return 0 for success and nonzero when an error occurred. The issue is that in shell, the `true` builtin returns `0` and `false` returns `1`, which is the opposite of C's `bool`. And almost every other language's Boolean type.

dahfizz · on Dec 19, 2022

The proposal fits my intuition. When the return value is an error code, truthy values are an error.

    int rc = func();
    if( rc ) { /*handle error*/ }

examples from the stdlib: connect(), stat(), etc. Hell, even main is defined to return 0 on success, nonzero on error.

RustyRussell · on Dec 19, 2022

Not for bool though.

dooglius · on Dec 19, 2022

I find the most readable, unambiguous thing is to define explicit macros or constants, e.g.

  if(ckd_add(&result_add, a, b) == CKD_SUCCESS) { ... }

or alternatively,

  if(CKD_SUCCESS(ckd_add(&result_add, a, b))) { ... }

kevin_thibedeau · on Dec 19, 2022

The return value is the overflow condition so just name it CKD_OVERFLOW.

RustyRussell · on Dec 19, 2022

Yes, it's backwards. And counter-intuitive use of bool :(

They felt fine changing the argument order, why stick with the reverse polarity?

ash_gti · on Dec 19, 2022

I know most of these are compiler intrinsics, but it’s good to have them standardized.

unwind · on Dec 19, 2022

Really fun, great introduction to the exciting new features.

Why is part of the example implemented using `nullptr` instead of `NULL`? Is that also from the future?

gustedt · on Dec 19, 2022

Yes, `nullptr` will also be in C23.

pwdisswordfish9 · on Dec 19, 2022

What for, if ((void *)0) is sufficient in C?

layer8 · on Dec 19, 2022

See the rationale here: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3039.htm#r...

saagarjha · on Dec 19, 2022

Apparently implementations don’t use this or something like that.

tinglymintyfrsh · on Dec 19, 2022

tl;dr

    #include <stdckdint.h>

    bool ckd_add(type1 *result, type2 a, type3 b);
    bool ckd_sub(type1 *result, type2 a, type3 b);
    bool ckd_mul(type1 *result, type2 a, type3 b);

    #include <stdckdint.h>
    #include <limits.h>

    /* ... */
    int x;
    int a = INT_MAX;
    int b = INT_MAX;

    if (!chk_add(&x, a, b)) {
       /* error! */
    }

Other stuff on the table for C23

- https://thephd.dev/c-the-improvements-june-september-virtual...

- (PDF) https://open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf

comex · on Dec 19, 2022

You have it backwards: it returns true on error.

heywhatupboys · on Dec 19, 2022

> if (!chk_add(&x, a, b)) { > /* error! */ > }

does it return non-zero on success???

tinglymintyfrsh · on Dec 19, 2022

They're using a bool, so it violates the old-school C paradigm used for library calls. This is more of a macro than a syscall or standard library function call.

comex · on Dec 19, 2022

It returns a bool, but true means error, not false.

ronsor · on Dec 20, 2022

It's not really an error. The function is just "check if this addition overflows" and tells you true, it does overflow, or false, it doesn't.

EdSchouten · on Dec 19, 2022

It returns a boolean.

RustyRussell · on Dec 19, 2022

Let's take this as evidence that the proposal is counter-intuitive?

jacquesm · on Dec 19, 2022

What was the rationale for not simply making it optional to throw an exception on overflow?

gustedt · on Dec 19, 2022

Besides C not having exceptions (that you could catch), the point was and is to have a way such that such a call has defined behaviour under any circumstances. The return value of the functions can even be ignored if the wrap around of the overflowing value is what your code expects.

So it is on the programmer to define what happens on error, they could ignore, try to back off by computing the high value bits, `exit` or `abort`.
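Both options gustedt lists rely on the destination receiving the wrapped value even when the check fails. A sketch of "backing off by computing the high bits" for addition, assuming GCC or Clang (the helper name is hypothetical; with C23 the first call would be `ckd_add(&low, a, b)`):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the exact sum of two int32_t as int64_t. On overflow the builtin
 * still stores a + b wrapped modulo 2^32, which is off from the true sum by
 * exactly 2^32, so the lost high bit can be restored. */
int64_t add_exact_i64(int32_t a, int32_t b) {
    int32_t low;
    bool overflow = __builtin_add_overflow(a, b, &low);
    if (!overflow)
        return low;
    /* Positive overflow wrapped to a negative value, and vice versa. */
    return a > 0 ? (int64_t)low + (INT64_C(1) << 32)
                 : (int64_t)low - (INT64_C(1) << 32);
}
```

Code that actually wants wraparound can simply ignore the return value and use `low` as-is.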

gpderetta · on Dec 19, 2022

These operations simply return the carry flag, they are supposed to map directly to the hardware which normally doesn't raise exceptions for integer overflow.

Also they can be useful to implement bigints.
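For bigints, the unsigned form of the check plays the role of the hardware carry flag. A sketch of multi-limb addition, least significant limb first (the function name and 32-bit limb layout are assumptions for illustration; GCC or Clang builtins assumed):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* out = a + b, each n limbs of 32 bits, least significant limb first.
 * Returns the final carry-out (0 or 1). out may alias a or b. */
uint32_t bignum_add(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
    bool carry = false;
    for (size_t i = 0; i < n; i++) {
        uint32_t s;
        bool c1 = __builtin_add_overflow(a[i], b[i], &s);               /* a + b */
        bool c2 = __builtin_add_overflow(s, (uint32_t)carry, &out[i]);  /* + carry-in */
        carry = c1 || c2;   /* at most one of the two can be set */
    }
    return carry;
}
```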

dahfizz · on Dec 19, 2022

C doesn't have exceptions...?

jacquesm · on Dec 19, 2022

The floating point implementation can signal SIGFPE, with a code FPE_INTOVF, it would seem to me that that is a suitable exception mechanism, it's just that the source isn't the floating point unit but the regular CPU.

Signals (kill), signal, raise, sigsetjmp, siglongjmp etc are C's exception handling mechanism. It's not as well integrated into the language as say a try-catch construct but it works well enough for situations like these. See: signal.h and setjmp.h

https://en.wikipedia.org/wiki/Signal_(IPC)

dahfizz · on Dec 19, 2022

Signals are not exceptions. Languages with exceptions still have to handle signals separately. Signals are an OS construct, and exceptions are a language construct.

To the question of "why not raise a signal on integer overflow in C?" - because signals are a terrible way of dealing with this. The signal handler doesn't know what code caused the overflow, and can't really do anything about it. Once the signal handler returns, the code itself has no idea it caused an overflow. Signals are a way for the OS to send signals to your program and not for control flow, after all. That's why `feenableexcept` is a niche extension that nobody uses.

The standard way of checking for fp errors is by calling `fetestexcept`. Personally I prefer this strategy (doing operation, then checking for errors) vs the new proposal for ints (checking for potential errors before doing the operation). But that is a matter of taste.

jacquesm · on Dec 19, 2022

Interesting, I've always considered signals to be an exception handling mechanism, as in the 'normal flow' of a program is interrupted and dealt with - or not - through some other mechanism. Learn a new thing every day, even after 40 years of programming in C :) Thanks!

https://en.wikipedia.org/wiki/Exception_handling

(which has this bit: "C does not have try-catch exception handling, but uses return codes for error checking. The setjmp and longjmp standard library functions can be used to implement try-catch handling via macros.")
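As a rough illustration of that last point, here is a sketch of setjmp/longjmp used as a poor man's try/catch around a checked add. All names are hypothetical, GCC or Clang builtins are assumed, and real code would also have to avoid leaking resources across the jump.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stdint.h>

/* The most recent "handler"; set by sum_all before any add can "throw". */
static jmp_buf *overflow_handler;

static int32_t add_or_throw(int32_t a, int32_t b) {
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum))
        longjmp(*overflow_handler, 1);   /* "throw" */
    return sum;
}

/* Sums an array; returns true and stores the total, or false on overflow. */
bool sum_all(const int32_t *xs, int n, int32_t *total) {
    jmp_buf env;
    overflow_handler = &env;
    if (setjmp(env) != 0)                /* "catch": arrived via longjmp */
        return false;
    int32_t acc = 0;   /* not read after a longjmp, so it need not be volatile */
    for (int i = 0; i < n; i++)
        acc = add_or_throw(acc, xs[i]);
    *total = acc;
    return true;
}
```

This is exactly the kind of non-local control flow dahfizz distinguishes from signals: the jump is explicit and synchronous, rather than delivered by the OS.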

dahfizz · on Dec 19, 2022

It's all semantics, I guess. You could argue that signals can be used to handle exceptional conditions (dereffing NULL). But signals are significantly different than what other programming languages call "exceptions".

It's the same thing as "run time". Pedantically, crt0 exists and therefore C has a "runtime". But it is nothing like what we refer to as a "runtime" today. The literal words are true but the meaning of the words doesn't match expectations.

jacquesm · on Dec 19, 2022

> signals are significantly different than what other programming languages call "exceptions"

C is likely considerably older than those 'other programming languages' and I'm still stuck in the past with my terminology.

kevin_thibedeau · on Dec 19, 2022

Ada had exceptions in 83 that are analogous to their modern incarnation. C was still just a baby then.

wahern · on Dec 19, 2022

If C was still just a baby in 1983, what does that make Ada?

Signals were a feature of Unix and C from the early 1970s; at least by 1973, if not earlier. Signals were a software-based abstraction over the concept of hardware interrupts, and even today we often say that hardware interrupts are triggered by (among other reasons) an "exception" or "hardware exception".

Similarly, the fact that the concept of interrupts and exceptions are related can be seen in the later (4.2 BSD, 1983) select(2) syscall, where the third fd_set argument is for capturing "exceptional condition(s)". (https://pubs.opengroup.org/onlinepubs/9699919799/functions/s...)

Ada enjoyed the benefit of at least another 10 years of reflection and evolution in computer science when choosing the semantics of their exception mechanism. But the success of Ada's (or Ada-like) semantics hasn't yet completely redefined the concept of exceptions.

jacquesm · on Dec 19, 2022

Ada was unobtanium for a long time for mere mortals like myself. In 1983 I was 18 and had access to a C compiler, an Ada compiler would have cost me an arm, a leg and my still to be first born, and information about the language was pretty much limited to what you could get from magazine articles of people that had maybe at some point known someone who had seen an Ada compiler in the wild.

The only other realistic options outside of government/enterprise were Pascal, BASIC and assembler, and within government/enterprise the bulk of the work was done in COBOL.

int_19h · on Dec 20, 2022

It's interesting just how far back the mechanism goes even beyond that. Some early structured PLs had non-local computed gotos, which turn out to map pretty close to exceptions, especially on implementation level - the need to unwind the stack etc. Some early papers on implementation even talk about using region maps for a zero-cost non-branching path, similar to the modern approach to C++ exceptions.

heywhatupboys · on Dec 19, 2022

kinda does though. floating point exception vectors are a thing and settable from most compiler impls

dahfizz · on Dec 19, 2022

fenv is available in standard C99. There is a GNU extension to trap and throw a signal when a fp error is hit (feenableexcept). I would definitely argue that signals != exceptions.

olliej · on Dec 19, 2022

This is simply standardizing the builtins that the major c++ compilers already have. It does not remove the absurd “overflow is UB” semantics that introduce security bugs.

gustedt · on Dec 19, 2022

Only that here we are talking about C. But yes, most C compilers already seem to have this as builtins.

olliej · on Dec 19, 2022

haha, I'm so used to reading C++2x I assumed C++ - however the problem exists in both :-/

RcouF1uZ4gsC · on Dec 19, 2022

One major concern with this type of safety function is that you have to explicitly opt in. If you are actually thinking about the need to call these functions, you are already thinking about overflow.

jacquesm · on Dec 19, 2022

> If you are actually thinking about the need to call these functions, you are already thinking about overflow.

As you should, for any datatype.

jdhdjdbdjdbd · on Dec 19, 2022

And that's why you use C in the first place?