Checked integer arithmetic in the prospect of C23 (gustedt.wordpress.com)
59 points by signa11 on Dec 19, 2022 | 64 comments


I posted overflow checking of signed integer arithmetic as a puzzle yesterday[1]. I got some good responses but none quite as minimal wrt number of instructions as my own solution:

    bool add_will_overflow(int32_t a, int32_t b) {
        uint32_t c = (uint32_t)a + (uint32_t)b;
        return (((uint32_t)a ^ c) & ((uint32_t)b ^ c)) >> 31;
    }
That produces the following assembly (see Godbolt[2]):

        lea     edx, [rdi+rsi]
        mov     eax, edi
        xor     eax, edx
        xor     esi, edx
        and     eax, esi
        shr     eax, 31
        ret
In Rust, you can write a.checked_add(b).is_none() which produces the following assembly[3]:

        add     edi, esi
        seto    al
        ret
A fun fact about this code: the overflow flag which is set by the add instruction and then harvested dates back at least to the 8080 (almost 50 years ago) and is not present in vanilla ARM. However, Apple Silicon has it as an extension, to make life easier for Rosetta 2 binary translation[4]. So when you do get to use this shorter code sequence, be thankful for the effort that chip designers put in to make it execute efficiently.

I expect the C23 built-in functions will perform as well as Rust here, which is a win both for ergonomics (you can't really consider the current state of "will a+b overflow" to be discoverable) and performance.

[1]: https://mastodon.online/@raph/109535617953722719

[2]: https://godbolt.org/z/17zMsWjYv

[3]: https://rust.godbolt.org/z/36Ta9oP1P

[4]: https://news.ycombinator.com/item?id=33635720
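
One way to convince yourself that the XOR trick is right is to compare it against a reference that does the sum in 64 bits. A minimal sketch: `add_overflows_ref` and `versions_agree` are hypothetical helper names, and the edge-case list is only a spot check, not a proof.

```c
#include <stdbool.h>
#include <stdint.h>

/* The bit trick from above: the add overflowed iff the sign of the
   result differs from the sign of both operands. */
bool add_will_overflow(int32_t a, int32_t b) {
    uint32_t c = (uint32_t)a + (uint32_t)b;   /* unsigned add wraps, no UB */
    return (((uint32_t)a ^ c) & ((uint32_t)b ^ c)) >> 31;
}

/* Hypothetical reference: do the sum in 64 bits and range-check it. */
bool add_overflows_ref(int32_t a, int32_t b) {
    int64_t wide = (int64_t)a + (int64_t)b;
    return wide < INT32_MIN || wide > INT32_MAX;
}

/* Returns true if both versions agree on every pair of edge cases. */
bool versions_agree(void) {
    static const int32_t edge[] = { INT32_MIN, INT32_MIN + 1, -1, 0, 1,
                                    INT32_MAX - 1, INT32_MAX };
    const int n = (int)(sizeof edge / sizeof edge[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (add_will_overflow(edge[i], edge[j]) !=
                add_overflows_ref(edge[i], edge[j]))
                return false;
    return true;
}
```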


Checked overflow operations are kind of the go-to example of "it's easy in assembly, hard in programming languages": in hardware terms, it's usually just a matter of checking a flag, but since flag registers are not exposed in high-level languages, it becomes a game of trying to write it in a pattern that the compiler can recognize, which is never a fun game to play. Even worse than addition is multiplication. Thankfully, C23 has finally added these operations.

Recently, though, I hit a case where I wanted checked (u32 - u32) -> i32 and (u32 + i32) -> u32 operations, which even Rust's standard library doesn't provide. (The use case is keeping track of a running delta between two lists of u32 values: the delta can go positive or negative, so it has to be signed, but the values in the lists can never be negative.)
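
For what it's worth, in C both mixed-sign operations can be sketched by widening to 64 bits and range-checking. The helper names below are made up for illustration, and they follow the ckd_* polarity (true means the result did not fit).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: *out = a - b as an int32_t.
   Returns true if the exact difference does not fit. */
bool sub_u32_to_i32(int32_t *out, uint32_t a, uint32_t b) {
    int64_t wide = (int64_t)a - (int64_t)b;   /* exact in 64 bits */
    if (wide < INT32_MIN || wide > INT32_MAX)
        return true;
    *out = (int32_t)wide;
    return false;
}

/* Hypothetical helper: *out = a + delta as a uint32_t.
   Returns true if the exact sum does not fit. */
bool add_i32_to_u32(uint32_t *out, uint32_t a, int32_t delta) {
    int64_t wide = (int64_t)a + (int64_t)delta;
    if (wide < 0 || wide > (int64_t)UINT32_MAX)
        return true;
    *out = (uint32_t)wide;
    return false;
}
```

The GCC/Clang overflow builtins are documented to accept operands and a result pointer of different integer types, so they may cover the same cases without the manual widening.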


> it becomes a game of trying to write it in a pattern that the compiler can recognize

Worse, in C or C++ you also need to find a way to do it without undefined behaviour. You can't just do the operation and see if the result matches expectations…
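
A minimal sketch of the usual UB-free alternative: test against the limits before adding, so that no signed overflow is ever evaluated. The function name is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. Only subtracts from the
   limits in directions that cannot themselves overflow. */
bool int_add_would_overflow(int a, int b) {
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b is safe when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b is safe when b <= 0 */
}
```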


The addition operator just landed in Rust 1.66 (checked_add_signed[1]), but for the subtraction one it looks like you'd need to roll your own.

[1]: https://github.com/rust-lang/rust/issues/87840


You are probably looking at it wrong. You can now write

  i32::checked_add_unsigned(some_u32)  and

  i32::checked_sub_unsigned(some_u32)
... which I think are exactly what your parent needs.


u32::checked_add_signed solves one of the pairs (u32 + i32 -> u32), but there's nothing for the other one (u32 - u32 -> i32).


Good point, I'm sure this made sense to me when I wrote it, but I can't ask past me why.


C is supposed to be reasonably portable so it doesn’t make too much sense to depend on flags that are not available on all platforms.


Why can’t it be a flag on some platforms and more complex branching logic on other platforms? Like the division shim on platforms that don’t have a division instruction?


It's easier to find a platform that doesn't support floating-point numbers than one that doesn't have some kind of overflow flag. Some kind of platform-independent abstraction is long (as in, decades) overdue.


Sure, but it kind of is insane that there are platforms (like ARM apparently, did not know) that do NOT have a carry flag.


This is already present as a builtin in GNU C (as indicated in TFA) and it already results in the optimal code: https://godbolt.org/z/qc4zvav7E


"__builtin_add_overflow" in gcc produces the same output as "checked_add()".

I really hope stuff like this is added to the standard.
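
For reference, a minimal sketch of calling the builtin directly. It is GCC/Clang-specific, and the wrapper name `checked_add_i32` is made up here.

```c
#include <stdbool.h>
#include <stdint.h>

/* GCC/Clang only: the builtin stores the wrapped sum in *sum and
   returns true if the mathematically exact result did not fit. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *sum) {
    return __builtin_add_overflow(a, b, sum);
}
```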


Hmm, isn't the Apple-specific magic only for parity(PF) and aux carry (AF)? aarch64 does have a 'V' flag for signed overflow.


Oops, you're right. Too late to edit, sorry about the confusion.


How does it actually perform? By default I generally assume flag registers are kinda dicey for performance due to the dependency.


Also, Hacker's Delight and OpenBSD probably have clever solutions for these.


Surprisingly, OpenBSD does not have a library (neither a public API nor even just routines which are copied project-to-project as is common with OpenBSD utilities and daemons) to handle arithmetic overflow. The closest might be malloc/realloc extensions, like reallocarray, that handle common scenarios where arithmetic overflow is seen.
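
A rough sketch of the reallocarray idea, written with the GCC/Clang multiplication builtin. This is not OpenBSD's implementation, and `alloc_array` is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper in the spirit of reallocarray: allocate room for
   nmemb elements of size bytes each, failing if nmemb * size overflows. */
void *alloc_array(size_t nmemb, size_t size) {
    size_t total;
    if (__builtin_mul_overflow(nmemb, size, &total)) {  /* GCC/Clang builtin */
        errno = ENOMEM;
        return NULL;
    }
    return malloc(total);
}
```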


FTA: “Their working is quite simple: the arithmetic is as if performed in the set of mathematical integers and then the value is written to result. If it fits, the return value is false. If it doesn’t fit, the return value is true”

They also give example code

  bool add_invalid = ckd_add(&result_add, a, b);
I can see that fits with “most of the time, anything positive means ‘no error’”, for example in malloc, write, read or printf, but these new functions return bool, not int, and the chosen method will require writing a double negation sometimes:

  if(!add_invalid) { … }
That’s not too bad, but if I were to see

  if(!ckd_add(&result_add, a, b)) { … }
I would expect that to test for failure, not success.

Because of that, I think I would have chosen to return true on success, false on failure. I’m curious as to what arguments led to the choice made.
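
If the polarity bothers you, a thin wrapper can flip it. A sketch only: `try_add` is a hypothetical name, and it is written against the GCC/Clang builtin rather than ckd_add so that it compiles on pre-C23 toolchains.

```c
#include <stdbool.h>

/* Hypothetical wrapper with the opposite polarity: true means the sum
   fit and *result holds it; false means it overflowed. */
bool try_add(int *result, int a, int b) {
    return !__builtin_add_overflow(a, b, result);   /* GCC/Clang builtin */
}
```

With that, `if (try_add(&sum, a, b)) { ... }` reads as the success path.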


In C, the value 0 is equivalent to false and all nonzero values are equivalent to true. It’s a convention throughout the C standard library to return 0 on success and nonzero when some error occurred. The behaviour of the new checked arithmetic library is consistent with that convention.


> It’s a convention throughout the C standard library to return 0 on success and nonzero when some error occurred.

If only it were so simple. read and write, for example, return a number less than zero on error and a non-negative number on success, and malloc returns zero on error, and nonzero on success.

The general rule for early C seems to be “whatever’s the best way to cram a return value or an error in an int” (probably the correct decision for the time)

Also, these new functions return a bool, which, in C23, gets integer-converted to zero for false and one for true (https://en.cppreference.com/w/c/language/bool_constant; C17 had macros for true and false, with false being zero).

And the reverse, converting to bool, similarly has zero for false (https://en.cppreference.com/w/c/language/conversion#Boolean_...):

“A value of any scalar type (including nullptr_t) (since C23) can be implicitly converted to _Bool. The values that compare equal to an integer constant expression of value zero are converted to 0 (false), all other values are converted to 1 (true).”

(https://en.cppreference.com/w/c/language/bool_constant)


_when the return value is an error code_, zero means success and nonzero means failure. Functions like `read`, `recv`, etc etc don't just return an error code. They return an actual value.

Functions that only return an error code like `stat`, `connect`, and the proposed ckd_add, return 0 on success and nonzero on error.


Unfortunately there are several error conventions in the C standard.

Here, the committee just standardized existing practice, namely the gcc builtins. We just adjusted the call sequence in putting the pointer parameter for the result first.


And in shell it's convention for programs to return 0 for success and nonzero when an error occurred. The issue is that in shell, the `true` builtin returns `0` and `false` returns `1`, which is the opposite of C's `bool`. And almost every other language's Boolean type.


The proposal fits my intuition. When the return value is an error code, truthy values are an error.

    int rc = func();
    if( rc ) { /*handle error*/ }

examples from the stdlib: connect(), stat(), etc. Hell, even main is defined to return 0 on success, nonzero on error.


Not for bool though.


I find the most readable, unambiguous thing is to define explicit macros or constants, e.g.

  if(ckd_add(&result_add, a, b) == CKD_SUCCESS) { ... }
or alternatively,

  if(CKD_SUCCESS(ckd_add(&result_add, a, b))) { ... }


The return value is the overflow condition so just name it CKD_OVERFLOW.


Yes, it's backwards. And counter-intuitive use of bool :(

They felt fine changing the argument order, why stick with the reverse polarity?


I know most of these are compiler intrinsics but it’s good to have them standardized.


Really fun, great introduction to the exciting new features.

Which part of the example implemented using `nullptr` instead of `NULL`, which is also from the future, though?


Yes, `nullptr` will also be in C23.


What for, if ((void *)0) is sufficient in C?



Apparently implementations don’t use this or something like that.


tl;dr

    #include <stdckdint.h>



    bool ckd_add(type1 *result, type2 a, type3 b);
    bool ckd_sub(type1 *result, type2 a, type3 b);
    bool ckd_mul(type1 *result, type2 a, type3 b);



    #include <stdckdint.h>
    #include <limits.h>

    /* ... */
    int x;
    int a = INT_MAX;
    int b = INT_MAX;

    if (!chk_add(&x, a, b)) {
       /* error! */
    }
Other stuff on the table for C23

- https://thephd.dev/c-the-improvements-june-september-virtual...

- (PDF) https://open-std.org/jtc1/sc22/wg14/www/docs/n3054.pdf


You have it backwards: it returns true on error.
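
A sketch of the check with the polarity the right way round. The preprocessor shim is my own assumption for toolchains that do not ship <stdckdint.h> yet (it falls back to the GCC/Clang builtin), and `add_fits` is a hypothetical name.

```c
#include <limits.h>
#include <stdbool.h>

/* Use <stdckdint.h> where the toolchain ships it; otherwise fall back
   to the GCC/Clang builtin (an assumption about the compiler in use). */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif
#ifndef ckd_add
#  define ckd_add(result, a, b) __builtin_add_overflow((a), (b), (result))
#endif

/* Returns true if a + b fits in an int, storing the sum in *x. */
bool add_fits(int *x, int a, int b) {
    if (ckd_add(x, a, b)) {
        /* true means the exact result did NOT fit: the error path */
        return false;
    }
    return true;
}
```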


> if (!chk_add(&x, a, b)) { /* error! */ }

does it return non-zero on success???


They're using a bool, so it violates the old-school C paradigm used for library calls. This is more of a macro rather than a syscall or standard library function call.


It returns a bool, but true means error, not false.


It's not really an error. The function is just "check if this addition overflows" and tells you true, it does overflow, or false, it doesn't.


It returns a boolean.


Let's take this as evidence that the proposal is counter-intuitive?


What was the rationale for not simply making it optional to throw an exception on overflow?


Besides C not having exceptions (that you could catch), the point was and is to have a way such that such a call has defined behaviour under any circumstances. The return value of the functions can even be ignored if the wrap around of the overflowing value is what your code expects.

So it is on the programmer to define what happens on error, they could ignore, try to back off by computing the high value bits, `exit` or `abort`.
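
As one example of such a policy, here is a sketch of a saturating add built on the GCC/Clang builtin. The name `saturating_add` is made up; clamping is just one of the choices mentioned above.

```c
#include <limits.h>

/* Hypothetical policy: clamp to the nearest limit instead of wrapping. */
int saturating_add(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum))   /* GCC/Clang builtin */
        /* Signed add only overflows when both operands share a sign,
           so the sign of a tells us which limit was crossed. */
        return (a < 0) ? INT_MIN : INT_MAX;
    return sum;
}
```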


These operations simply return the carry flag, they are supposed to map directly to the hardware which normally doesn't raise exceptions for integer overflow.

Also they can be useful to implement bigints.
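
A sketch of the bigint use: multi-limb addition where the overflow result of each limb becomes the carry into the next. It assumes 32-bit little-endian limbs and the GCC/Clang builtin; `bigint_add` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two n-limb little-endian numbers: out = a + b.
   Returns the carry out of the top limb. */
bool bigint_add(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
    bool carry = false;
    for (size_t i = 0; i < n; i++) {
        uint32_t partial;
        /* At most one of the two adds can wrap: if a[i] + b[i] wraps,
           the wrapped value is at most UINT32_MAX - 1, so adding the
           incoming carry cannot wrap again. */
        bool c1 = __builtin_add_overflow(a[i], b[i], &partial);
        bool c2 = __builtin_add_overflow(partial, (uint32_t)carry, &out[i]);
        carry = c1 || c2;
    }
    return carry;
}
```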


C doesn't have exceptions...?


The floating point implementation can signal SIGFPE, with a code FPE_INTOVF, it would seem to me that that is a suitable exception mechanism, it's just that the source isn't the floating point unit but the regular CPU.

Signals (kill), signal, raise, sigsetjmp, siglongjmp etc are C's exception handling mechanism. It's not as well integrated into the language as say a try-catch construct but it works well enough for situations like these. See: signal.h and setjmp.h

https://en.wikipedia.org/wiki/Signal_(IPC)


Signals are not exceptions. Languages with exceptions still have to handle signals separately. Signals are an OS construct, and exceptions are a language construct.

To the question of "why not raise a signal on integer overflow in C?" - because signals are a terrible way of dealing with this. The signal handler doesn't know what code caused the overflow, and can't really do anything about it. Once the signal handler returns, the code itself has no idea it caused an overflow. Signals are a way for the OS to send signals to your program and not for control flow, after all. That's why `feenableexcept` is a niche extension that nobody uses.

The standard way of checking for fp errors is by calling `fetestexcept`. Personally I prefer this strategy (doing operation, then checking for errors) vs the new proposal for ints (checking for potential errors before doing the operation). But that is a matter of taste.


Interesting, I've always considered signals to be an exception handling mechanism, as in the 'normal flow' of a program is interrupted and dealt with - or not - through some other mechanism. Learn a new thing every day, even after 40 years of programming in C :) Thanks!

https://en.wikipedia.org/wiki/Exception_handling

(which has this bit: "C does not have try-catch exception handling, but uses return codes for error checking. The setjmp and longjmp standard library functions can be used to implement try-catch handling via macros.")
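
To make that last bit concrete, here is a rough sketch of setjmp/longjmp used as a poor man's try/catch around a checked sum. All names are hypothetical, the overflow test uses the GCC/Clang builtin, and the single global jmp_buf means it is neither reentrant nor thread-safe.

```c
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

static jmp_buf on_overflow;   /* hypothetical "catch" target */

/* "Throws" by jumping back to the setjmp point when the sum does not fit. */
static int add_or_throw(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum))   /* GCC/Clang builtin */
        longjmp(on_overflow, 1);
    return sum;
}

/* Sums n values into *out; returns false if any partial sum overflowed. */
bool sum_checked(const int *values, size_t n, int *out) {
    if (setjmp(on_overflow) != 0)
        return false;                 /* the "catch" branch */
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total = add_or_throw(total, values[i]);
    *out = total;
    return true;
}
```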


It's all semantics, I guess. You could argue that signals can be used to handle exceptional conditions (dereffing NULL). But signals are significantly different than what other programming languages call "exceptions".

It's the same thing as "run time". Pedantically, crt0 exists and therefore C has a "runtime". But it is nothing like what we refer to as a "runtime" today. The literal words are true but the meaning of the words doesn't match expectations.


> signals are significantly different than what other programming languages call "exceptions"

C is likely considerably older than those 'other programming languages' and I'm still stuck in the past with my terminology.


Ada had exceptions in 83 that are analogous to their modern incarnation. C was still just a baby then.


If C was still just a baby in 1983, what does that make Ada?

Signals were a feature of Unix and C from the early 1970s; at least by 1973, if not earlier. Signals were a software-based abstraction over the concept of hardware interrupts, and even today we often say that hardware interrupts are triggered by (among other reasons) an "exception" or "hardware exception".

Similarly, the fact that the concept of interrupts and exceptions are related can be seen in the later (4.2 BSD, 1983) select(2) syscall, where the third fd_set argument is for capturing "exceptional condition(s)". (https://pubs.opengroup.org/onlinepubs/9699919799/functions/s...)

Ada enjoyed the benefit of at least another 10 years of reflection and evolution in computer science when choosing the semantics of their exception mechanism. But the success of Ada's (or Ada-like) semantics hasn't yet completely redefined the concept of exceptions.


Ada was unobtanium for a long time for mere mortals like myself. In 1983 I was 18 and had access to a C compiler, an Ada compiler would have cost me an arm, a leg and my still to be first born, and information about the language was pretty much limited to what you could get from magazine articles of people that had maybe at some point known someone who had seen an Ada compiler in the wild.

The only other realistic options outside of government/enterprise were Pascal, BASIC and assembler, and within the bulk of the work was done in COBOL.


It's interesting just how far back the mechanism goes even beyond that. Some early structured PLs had non-local computed gotos, which turn out to map pretty close to exceptions, especially on implementation level - the need to unwind the stack etc. Some early papers on implementation even talk about using region maps for a zero-cost non-branching path, similar to the modern approach to C++ exceptions.


kinda does though. floating point exception vectors are a thing and settable from most compiler impls


fenv is available in standard C99. There is a GNU extension to trap and throw a signal when a fp error is hit (feenableexcept). I would definitely argue that signals != exceptions.


This simply standardizes the builtins that the major C++ compilers already have. It does not remove the absurd “overflow is UB” semantics that introduce security bugs.


Only that here we are talking about C. But yes, most C compilers already seem to have this as builtins.


haha, I'm so used to reading C++2x I assumed C++ - however the problem exists in both :-/


One major concern with this type of safety function is that you have to explicitly opt in. If you are actually thinking about the need to call these functions, you are already thinking about overflow.


> If you are actually thinking about the need to call these functions, you are already thinking about overflow.

As you should, for any datatype.


And that's why you use C in the first place?



