> The first one that comes to mind is its arbitrary-sized integers. That sounds ...

pcwalton · 2025-02-05T06:05:40 1738735540

I'm normally not sympathetic to the "you don't need that" argument, but there is a much stronger argument for not having arbitrarily-sized integers in Rust: the fact that values of such types can't have an address. The reason why our types all have bit sizes measured in octets is that a byte is the minimum granularity for a pointer.

Miksel12 · 2025-02-05T10:42:20 1738752140

They could just be aligned and padded to the next power of 2, right? I think they work like that in Zig, and only when they are put in a (bit-)packed struct are they actually bit-unaligned and unpadded.
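A rough sketch of the "pad it to a whole byte" idea, written in Rust rather than Zig. `U3` and its methods are hypothetical names made up for illustration, not an existing type in either language, and this is not a claim about how Zig implements it:

```rust
/// Hypothetical 3-bit unsigned integer, stored padded in one full byte.
/// Because it occupies a whole byte, `&U3` is an ordinary pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
struct U3(u8);

impl U3 {
    /// Largest value representable in 3 bits.
    const MAX: u8 = 0b111;

    /// Returns `None` if `v` does not fit in 3 bits.
    fn new(v: u8) -> Option<U3> {
        if v <= Self::MAX { Some(U3(v)) } else { None }
    }

    /// Addition modulo 8: the upper 5 padding bits are masked off.
    fn wrapping_add(self, rhs: U3) -> U3 {
        U3((self.0 + rhs.0) & Self::MAX)
    }
}
```

The trade-off is that `[U3; 8]` takes 8 bytes instead of the 3 bytes that 24 bits of payload would need when packed.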

chrisco255 · 2025-02-05T06:29:48 1738736988

A byte isn't the minimum granularity for a pointer. The minimum is based on whatever target you're compiling for. If it's a 32-bit target platform, then the minimum granularity is 4 bytes. Why should pointer size determine value size though? It's super fast to shift bits around, too, when needed.

pcwalton · 2025-02-05T06:33:56 1738737236

> If it's a 32-bit target platform, then the minimum granularity is 4 bytes.

Huh? How do you think `const char *s = "Hello"; const char *t = &s[1];` works?

> Why should pointer size determine value size though?

Because you should be able to take the address of any value, and addresses have byte granularity.
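A Rust analogue of the C snippet above, as a minimal sketch (`byte_distance` and `pointer_size` are illustrative helper names, not standard functions). It separates the two things that get called "granularity" here: the step between adjacent addresses and the size of the pointer itself.

```rust
/// Distance in bytes between the start of `s` and its element at `index`.
/// For a byte slice, adjacent elements are exactly one address apart.
fn byte_distance(s: &[u8], index: usize) -> usize {
    let base = s.as_ptr() as usize;
    let elem = &s[index] as *const u8 as usize;
    elem - base
}

/// Size of a thin pointer in bytes (4 on a 32-bit target, 8 on a 64-bit one).
/// This is the width of the address, not the smallest unit it can address.
fn pointer_size() -> usize {
    std::mem::size_of::<*const u8>()
}
```

On any target, `byte_distance(b"Hello", 1)` is 1, while `pointer_size()` varies with the platform.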

zozbot234 · 2025-02-05T07:46:27 1738741587

> Because you should be able to take the address of any value

That's debatable, though. One could argue that languages should explicitly support "values that are never going to have their address taken, be passed by reference/pointer, etc." which would only become addressable, e.g. as part of a struct.

pcwalton · 2025-02-05T09:04:08 1738746248

C and C++ (the latter unofficially, IIRC) have this with bitfield types, and they aren't very well loved, precisely because they aren't addressable.

mmusson · 2025-02-05T12:08:07 1738757287

> Huh? How do you think `const char *s = "Hello"; const char *t = &s[1];` works?

I think you and the parent are using different definitions of granularity. The parent meant that sizeof(t) could be 32 or 64 bits. I think you just meant that the smallest thing the pointer references is the address of a single byte.

Rust already has fat pointers though. A reference to a sub-byte value could be a pointer plus a bit-mask.
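Roughly what such a reference might look like if you hand-rolled it today. `BitFieldRef` is a made-up type for illustration, not anything Rust provides, and it only handles fields that sit inside a single byte:

```rust
/// Hypothetical "fat" reference to a bit field within one byte:
/// an ordinary reference plus the shift and mask needed to extract the field.
#[derive(Clone, Copy)]
struct BitFieldRef<'a> {
    byte: &'a u8,
    shift: u8, // position of the field's lowest bit within the byte
    mask: u8,  // mask applied after shifting
}

impl<'a> BitFieldRef<'a> {
    /// `width` bits starting at bit `shift`; the field must fit in the byte.
    fn new(byte: &'a u8, shift: u8, width: u8) -> Self {
        assert!(width >= 1 && (shift as u32 + width as u32) <= 8);
        let mask = if width == 8 { 0xFF } else { (1u8 << width) - 1 };
        BitFieldRef { byte, shift, mask }
    }

    /// Reading is a load followed by a shift and a mask, not a bare load.
    fn get(self) -> u8 {
        (*self.byte >> self.shift) & self.mask
    }
}
```

Note that the struct is larger than a plain `&u8`, and it is a different type from `&u8`, so code that expects an ordinary reference cannot accept it.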

Gibbon1 · 2025-02-05T06:56:39 1738738599

With a 64-bit pointer you could make it bit-addressable.

pcwalton · 2025-02-05T09:03:24 1738746204

At the cost of having your loads no longer be hardware load instructions, which would be bad for performance.

Gibbon1 · 2025-02-05T11:23:06 1738754586

High performance memory systems aren't register based anyways.

chrisco255 · 2025-02-05T07:46:11 1738741571

> Huh? How do you think `const char *s = "Hello"; const char *t = &s[1];` works?

So this is just an example of a pointer offset by 1 slot. It's not conceptually all that different to take sub-values of a slot.

To work with values that consume fractions of a byte, such as a 3-bit integer, a pointer plus a bit offset is used to grab that value (that complexity is abstracted by the compiler). My argument is that the bit shift necessary to load the 3 bits you need from the 8-bit slot is one of the cheapest operations a CPU can do. Meanwhile, using 3 bits when all you need is 3 bits allows things like arrays and packed structs to use much less memory than padding everything to 8 bits.
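To make the shift-and-mask idea concrete, here is a minimal sketch of a packed array of 3-bit values. `PackedU3` is a hypothetical type written for this comment, with a little-endian bit layout picked arbitrarily; it is not how any particular compiler lays things out.

```rust
/// Hypothetical packed array of 3-bit unsigned integers.
/// Element `i` occupies bits `3*i .. 3*i + 3` of the backing bytes.
struct PackedU3 {
    bytes: Vec<u8>,
    len: usize,
}

impl PackedU3 {
    fn new(len: usize) -> Self {
        // Round the bit count up to a whole number of bytes.
        PackedU3 { bytes: vec![0; (len * 3 + 7) / 8], len }
    }

    fn get(&self, i: usize) -> u8 {
        assert!(i < self.len);
        let (byte, off) = (i * 3 / 8, i * 3 % 8);
        // An element may straddle two bytes, so read a 16-bit window.
        let lo = self.bytes[byte] as u16;
        let hi = *self.bytes.get(byte + 1).unwrap_or(&0) as u16;
        ((((hi << 8) | lo) >> off) & 0b111) as u8
    }

    fn set(&mut self, i: usize, v: u8) {
        assert!(i < self.len && v < 8);
        let (byte, off) = (i * 3 / 8, i * 3 % 8);
        let mask = 0b111u16 << off;
        let val = (v as u16) << off;
        // Update the low byte, keeping the neighbouring bits intact.
        self.bytes[byte] = (self.bytes[byte] & !(mask as u8)) | (val as u8);
        if off > 5 {
            // The field spills into the next byte.
            let next = &mut self.bytes[byte + 1];
            *next = (*next & !((mask >> 8) as u8)) | ((val >> 8) as u8);
        }
    }
}
```

Eight elements fit in 3 bytes instead of 8. Each access is a load plus a shift and a mask, and elements that cross a byte boundary touch two bytes.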

And why shouldn't a language support that?

pcwalton · 2025-02-05T09:01:47 1738746107

If your pointers have to have a bit offset, then now a pointer isn't just 1 hardware word, which would be odd to say the least for a systems language. (It would also be slow, because making a load anything other than a load is something you really don't want to do; you would lose the ability to fold into addressing modes on x86 in many cases for example.)

If some pointers have a bit offset but others don't, OK, I guess, but I'd argue that at that point it'd be a cleaner design to just have a "pointer plus bit offset" be a separate type from "regular pointer". And that would get back to the problem that you would have a type that you couldn't take a "regular" pointer to.

manwe150 · 2025-02-05T15:25:47 1738769147

Just to close the loop here, it looks like that is exactly what Zig does for pointers: there is a different pointer type for each combination of bit offset and byte alignment, and yes, each is distinct from a "regular" pointer (of which there isn't one "regular" type either, but many separate flavors, including C-compatible, single-item, and many-item): https://ziglang.org/documentation/master/#toc-packed-struct