Yeah, thanks to C pointer syntax, there's a big faction of programmers that think that understanding indirect addressing is a major intellectual accomplishment.
I don't have anything better to offer though - C pointers make a whole lot possible in a very terse way.
The easiest way to understand pointers is to learn them in assembler first, which is closer to what's actually going on. Then when learning the C syntax, it all fits into place neatly, rather than being mysterious asterisks and so on.
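For anyone who wants to see the "indirect addressing" view without leaving C, here is a minimal sketch. It is my own illustration rather than anything from the comment above, and the function names are made up. The idea is only that a pointer is an address, and subscripting is address arithmetic scaled by the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: store through a pointer. In assembler terms this is
   "write the value to the address held in a register". */
void store_indirect(int *addr, int value)
{
    *addr = value;
}

/* Hypothetical helper: base[i] means *(base + i), and the address it
   touches is the base address plus i * sizeof(int) bytes. */
int load_indexed(const int *base, size_t i)
{
    const unsigned char *raw = (const unsigned char *)base + i * sizeof(int);
    assert((const void *)raw == (const void *)(base + i));
    return *(const int *)raw;
}

/* Writes one cell through a pointer, then reads it back by index. */
int demo_indirect(void)
{
    int cells[4] = {10, 20, 30, 40};
    store_indirect(&cells[2], 99);   /* &cells[2] is just an address */
    return load_indexed(cells, 2);   /* reads back the value just stored */
}
```

Calling demo_indirect() should give back 99, the value written through the pointer.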
I can't imagine being a programmer without starting out by becoming proficient in assembler. It takes a lot of mystery out of programming languages because you know whatever the syntax is it all ends up as branches, calls, stacks, registers and memory accesses.
I would heavily recommend first learning assembly with an ISA that's not x86, though. The x86 ISA has just as much cruft as C, e.g. all the overlapped increasingly-sized registers, the now-irrelevant segment registers, etc. And x86 has some stuff that's not legacy, but which makes learning much harder — e.g. assembler-level instructions like LEA that compile to a set of entirely-different opcodes of different shapes depending on what you do with them. (IMHO, for best learning, you should be able to easily disassemble object code in the ISA "by hand" back into assembly, with just an opcode chart for reference. Doing this with arbitrary x86 assembly would be quite the challenge.)
I'm not sure what to recommend as an optimal learning ISA, though. Ideally something that's mostly RISC, with a small but unified memory model, and no virtual memory, and which has all the math ops that embedded RISC ISAs never bother to implement. (In the end, probably no real architecture ever made that exact set of compromises, and so it'd have to be something made up — some kind of bytecode abstract machine ISA designed specifically to be used in education, an ISA equivalent to languages like Logo or Scratch, or OSes like Minix.)
In practice, I think I would recommend learning assembler by programming for the original Gameboy's LR35902 ISA (which is almost, but not exactly, the Z80/8080 ISA). Not because it's the optimal learning ISA per se, but because it gives you a complete machine including graphical video output, with a compact unified address space (no separate IO-register read/write ops), modern tools that target it, and good emulators that almost always include built-in debuggers and memory viewers — where, as well, the memory is small enough to get it all on-screen and have room left over.
Oh, and unlike other ISAs, where people either never hand-write code for them, or where no hand-written code survives in a readable form, with LR35902 we have several complete projects with hand-written assembly codebases (or, well, reverse-engineered + annotated disassemblies of originally hand-written assembly — just as good!) around to be studied. And, of course, they're just the sorts of things kids interested in computers would be interested in studying the workings of: video games! (https://github.com/pret/pokered is a shining example here.)
If you want to do embedded, I would kind of agree (but C is low-level enough); otherwise... I'm not so sure.
Low level languages teach a lot of habits I don't like, and they don't teach as much about working in a modern high level environment on complex software.
Everyone should probably be vaguely competent with C, but I wouldn't call it a top priority.
If you're thinking about the underlying machine stuff when coding in Python, you're kind of just accepting leaky abstractions. If you're in the habit of first principles thinking, leaks seem normal.
C doesn't really seem to layer all that well. Whenever I code in C, I usually have to dig down to the SDK API level, and I wind up reading 10 other files to make a change in 1.
High level languages make it clear when something isn't DRY, or an abstraction leaks, or something isn't usable without modification. It's unusual, you notice right away, and you want to fix it because it obviously doesn't fit.
Low level languages make everything feel totally patched together. Encapsulation just isn't a big thing in most of it. It's almost like looking at an old point to point wired TV set compared to a rack of servers.
You might be able to write some amazing ASM code that feels as modern as some top notch Kotlin... but I'd call that an expert, not just a competent person.
And worst of all, the compiler can't catch many bugs. It's not like Rust. Or even Python. Low level languages don't teach the art of making sure that the computer has your back, the way Rust might. They normalize just entirely relying on your own skill, to the point where some C devs don't even seem to want the compiler to eliminate large classes of bugs for them.
High level languages teach a real appreciation of abstraction and safety, and standardization, that I don't feel like I really learned with C.
I agree with everything you said here. I'm not saying we should stick to bare metal programming even if we aren't doing embedded. There may be a problem that could be solved easily with inline assembly or there may be a bug that can't be understood if the programmer doesn't understand what is going on at the CPU level. Assembler is a tool in my kit, but not everybody needs to use it. I like to understand my computer from top to bottom, but sticking to higher level abstractions is where I spend most of my time.
> I don't have anything better to offer though - C pointers make a whole lot possible in a very terse way.
Terseness isn't a virtue in and of itself. If terseness helps readability that's great, but unreadable terse code isn't something to aim for.
One of the redeeming aspects of the C pointer declaration syntax is that hairy declarations can be decomposed using typedefs. Improving readability at the cost of reduced density is a good trade.
The biggest problem with C's pointer syntax is that it's prefix. If pointers were suffix, perhaps using a character that isn't doing primary duty as the multiplication operator, there wouldn't be any need for a Gibberish to English translator. Maybe think of @ as "address":
int i; // variable which is integer
int p@; // pointer that returns integer
int f(); // function returns integer
int f@(); // pointer to function
int f()@; // function that returns pointer
int a[10]; // array of ten integers
int a[10]@; // array of ten pointers
int a@[10]; // pointer to an array
int a[10]()@; // array of ten functions that return a pointer
Of course C is stuck with accidents/decisions made 50 years ago. But a language like Go turns all the declaration syntax on its head and still doesn't solve the problem. Rob Pike's blog post says as much (scroll to the bottom section on pointers):
> The biggest problem with C's pointer syntax is that it's prefix.
The biggest problem with C's pointer syntax is that it's prefix to the value.
int p*
is just as horrible as
int *p
because in both cases, it obscures that the type of p is "pointer to integer".
C's array and function syntax have exactly the same issue, the entire thing is back-asswards.
Let me demonstrate by converting your examples to a more regular (and logical) language[0]:
let i: i32 // integer
let i: Box<i32> // pointer to integer
let i: fn() -> i32 // function returning an integer
let i: Box<fn() -> i32> // pointer to a function returning an integer
let i: fn() -> Box<i32> // function which returns a pointer to an integer
let a: [i32;10] // array of 10 integers
let a: [Box<i32>;10] // array of ten pointers
let a: Box<[i32;10]> // pointer to an array
let a: [fn() -> Box<i32>;10] // array of ten functions that return a pointer
Even if you want to shorthand pointers (because you have only one and want it to be built-in), it remains significantly more readable than both C actual and your version.
[0] which doesn't mean perfect e.g. one could argue against the special syntax for arrays
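A small C sketch of the "pointer-ness attaches to the name, not the type" complaint, in case it helps. This is my own example (the function name is hypothetical) and it assumes a C11 compiler, since it uses _Generic to ask the compiler what type each name really has.

```c
/* Returns 1 if, in "int* a, b;", only a is a pointer and b is a plain int. */
int star_binds_to_name(void)
{
    int* a, b;          /* reads like "two int pointers", but is not */
    b = 7;
    a = &b;

    /* _Generic (C11) selects a branch based on the expression's type. */
    int a_is_pointer   = _Generic(a, int *: 1, default: 0);
    int b_is_plain_int = _Generic(b, int: 1, default: 0);

    return a_is_pointer && b_is_plain_int && *a == 7;
}
```

With a "name: type" syntax the same line could not be misread, because the pointer marker would sit in the type that both names share.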
You changed your post, so this is a reply to the changes.
Boxes are containers, not pointers or references. Let's look at all of those using the star syntax and operator inside of an unsafe block. You'll end up with stuff like:
fn main() {
    let mut a = [0i32; 10];
    let p: *mut [i32] = &mut a;
    // Rust did NOT solve this problem
    let x: i32 = unsafe { (*p)[1] };
    print!("{x}\n");
}
Note the annoying parentheses because they used prefix syntax for pointers, again.
> Boxes are containers, not pointers or references.
There is no actual difference between the two. Pointers and references could have been `Pointer<T>` and `Ref<T>` with the exact same ABI. They're not for UX reasons, one of similarity with C, and one of syntactic overhead.
> // Rust did NOT solve this problem
> let x: i32 = unsafe { (*p)[1] };
That is a debate of precedence, which is a very different concern than what I was replying about, and what TFA is about: the absurd mess that are C type declarations.
Rust has raw pointers, references, slices, and arrays, each with special syntax in type/variable declarations. You're conflating those with Boxes, and it isn't adding clarity to the discussion. It's not like C++ removed this problem now that people use `std::unique_ptr<T>`.
Rust's raw pointer syntax is exactly as bad as C's, because it's the same syntax. Yes, it could use some other syntax like `Ptr<T>` for declaration, but it doesn't. And that still says nothing about what operator syntax to use when dereferencing the pointer.
Perhaps it could be:
let p: Ptr<T> = whatever;
let x = p.deref(); // suffix syntax!
> That is a debate of precedence, which is a very different concern
It's not a different concern, and it's a problem no matter what the precedence. If prefix pointer syntax binds tightly, you'll need parens in one case. If it binds loosely, you'll need parens in another case:
*(binds_tightly.foo[10]); // subscript the array first because star binds tightly
(*binds_loosely).bar[10]; // dereference the pointer first because star binds loosely
However, if pointer syntax is suffix (like arrays, function calls, and field accesses), you simply don't need parens in either case. The operations are just in order.
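For reference, this is how the two cases look in real C today, where subscript binds tighter than unary star. It is a minimal sketch of my own with hypothetical function names.

```c
/* p points at a whole array: dereference first, then subscript. */
int via_pointer_to_array(void)
{
    int row[3] = {10, 20, 30};
    int (*p)[3] = &row;       /* pointer to array of 3 int */
    return (*p)[1];           /* parens required: *p[1] would parse as *(p[1]) */
}

/* q is an array of pointers: subscript first, then dereference. */
int via_array_of_pointers(void)
{
    int x = 1, y = 2, z = 3;
    int *q[3] = {&x, &y, &z}; /* array of 3 pointers to int */
    return *q[1];             /* no parens: [] binds tighter than unary * */
}
```

The first function should return 20 and the second 2. Whichever way the precedence had been chosen, one of the two would need the parentheses.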
> (the absurd mess that are C type declarations)
Hating on C is trendy, that's fine. But a lot of people are confused about what the problem really is. It could be fixed with one change, but since people can't see it clearly, they change everything. And in cases like Go or Rust, they change almost everything and still don't solve the problem.
A language could achieve C's goal of "declaration follows usage" if pointers had suffix syntax. The prefix pointer syntax is where crazy "winding rules" and "Gibberish to English" translators come in.
It's clear you prefer the final type as suffix. That just means you need another syntax or keyword (such as "let") to introduce a variable declaration.
However function calls, array subscripts, and field accesses are the same "kind" of operation as dereferencing a pointer. Put the type last if it makes you happy, but you'll have "horrible" syntax if you don't treat pointers the same as the others:
something.bar[10]() // access the field, subscript the array, call the function
something.foo@ // access the field, dereference the pointer
> It's clear you prefer the final type as suffix. That just means you need another syntax or keyword (such as "let") to introduce a variable declaration.
Not really? Assuming type and variable names are lexically distinguishable via the lexer hack or, say, capitalization. Here’s an LR(1) grammar (in Grammophone[1] syntax) with postfix types and dereferencing:
It lacks the comma operator or Go-style grouping of multiple variables before a single type, but adding either one is easy and both is possible, though in the latter case the “ambi” hack would need to be extended. I’m willing to believe you could even do this with LL(1) / recursive descent, but the grammar structure would probably end up rather horrible.
Whether this is a smart idea as far as ergonomics of syntax errors go, I don’t know.
> Assuming type and variable names are lexically distinguishable via the lexer hack or, say, capitalization.
Hah, I appreciate your point and careful reply, but that's quite an assumption :-)
I think using capitalization does qualify as "another syntax", which I was cautious to qualify. And (as you know) the "lexer hack" means you need to declare your types before you use them, which is a bummer for mutually recursive types.
Anyways, I'll concede that you're right in that you can do it, but I don't think I was very wrong :-)
I can honestly say I wasn’t motivated by how wrong I felt you were, but by how I have been fiddling with declaration syntaxes like this over the last several months :)
Which actually led me to make an error here. The starting point was C syntax so I thought assuming a lexer hack was fair game, but the grammar above doesn’t actually need it! It doesn’t need one even if you add parameterized types with the same syntax as function calls. It only needs it if you want to write function calls as juxtaposition without parens ML style, use infix syntax for type application, or (and that is what I have been fiddling with) drop the semicolons.
(D'oh, I knew I was forgetting something! It was parenthesized expressions. And apparently I also screwed up optional comma termination. And curried calls.)
So, for example, you can add these productions to the grammar above and it still stays LR(1):
# Remove lexer hack, add parameterized types
type -> name | type ( ) | type ( type ) | type ( targ ) | type ( targ type ) .
targ -> type , | targ type , .
# Fix the stupid
dfun -> ambi ( darg decl ) | dfun ( decl ) | dfun ( darg ) | dfun ( darg decl ) .
efun -> ambi ( earg expr ) | efun ( expr ) | efun ( earg ) | efun ( earg expr ) | ( expr ) .
I’m not really a fan of this variant of “declaration mirrors use”, mind you. Type ascription usually requires some other syntax (both C and Go suffer from this); custom types that don’t have special punctuation are awkward and even tuples will be kind of forced. Typing the occasional colon does not seem like a bad tradeoff in comparison. I’d love to see these problems solved, though.
By the way, I'm kind of surprised you didn't use 4-letter names for your stub section at the bottom. How can you suffer the inconsistency in such an otherwise consistent grammar? :-)
I console myself with the idea that they are terminals, and I can see if I forgot to define a nonterminal by checking that no examples contain four-letter words. Choosing a name for the sum nonterminal was much more painful :)
Don't think I've ever constructed a more complicated type ('directly' so to speak, i.e. without typedefs) than an array of function pointers (array of interrupt handlers, taking no arguments, returning void).
99.999(repeating)% of the time I use a typedef for function pointers and nothing looks particularly complicated.
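A rough sketch of that interrupt-handler case, spelled both ways. The handler names and the counter are made up for illustration; a real vector table would be placed and populated however the target platform requires.

```c
#include <stddef.h>

/* Without a typedef: array of 2 pointers to functions taking no
   arguments and returning void. */
void (*raw_table[2])(void);

/* With a typedef, the array declaration reads like any other array. */
typedef void (*irq_handler)(void);

static int tick_count;                       /* touched by the handlers below */
static void on_timer(void) { tick_count += 1; }
static void on_uart(void)  { tick_count += 10; }

irq_handler handler_table[2] = { on_timer, on_uart };

/* Calls every installed handler once and reports the counter. */
int dispatch_all(void)
{
    raw_table[0] = handler_table[0];         /* same type, so this assigns cleanly */
    tick_count = 0;
    for (size_t i = 0; i < sizeof handler_table / sizeof handler_table[0]; i++)
        handler_table[i]();
    return tick_count;
}
```

Here dispatch_all() should return 11, one contribution from each handler.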
Yeah, I'm not confused by C's syntax any more either. However, when I was learning it (about 30 years ago), things were very non-obvious.
And there are a lot of people who want to throw the baby out with the bathwater. Go is the one that really amuses me here: They broke the "declaration follows usage", which I think is a nice idea, and they still have backwards pointer syntax that needs parentheses to get it right.
> Don't think I've ever constructed a more complicated [...]
Heh, I'm not sure this is a great argument. It's kind of like saying you don't care about the complexity/details in floating point because you've never needed more than integers. :-)
I find that confusing in a bunch of other ways. I much prefer the non C style of name : type rather than having the name embedded in the middle of the type.
A bunch of ways? Maybe it's just confusing because you haven't thought about it before.
There's certainly a tiny trade-off about where to put the core type name. If you have a "let" keyword, your declaration is just a little more verbose:
int foo[10];
let foo: [10] int;
If you don't have type inferencing, you always need to specify the type, and the let keyword and colon are just more noise. I know it's a matter of taste, but I think the first line is cleaner.
But my point isn't about where to put the core type (int). It's that pointer dereferencing is the same kind of operation as array subscripting, accessing fields, and calling functions. So unless you put it in order with the rest, you need ugly parens in any general case:
// prefix pointer syntax is convoluted and needs parens:
(*(x.foo.bar))("arguments")[10];
// grab field foo, then field bar, then dereference pointer,
// then call function, then subscript array
x.foo.bar@("arguments")[10]
I do actually think about this sort of thing quite a bit, and work with a bunch of people who have thought about these kinds of issues far more than me. I’m firmly of the belief that C style type syntax is never going to not be confusing, and changing the pointer to postfix just changes the places where it’s confusing. Personally I think a core problem is that declaration and use are different, and while I agree that dereferencing is generally much clearer as a postfix operator it doesn’t solve the confusion on the declaration side.
When reading your examples I found myself having to constantly second guess the precedence of arrays and pointers, and I’m not sure how I would declare the type of a function that returns a pointer to an array, or an array of pointers. I’m happy to accept the extra noise of let because it reduces the overall time needed to understand what code means, and we end up reading code so much more than we write it.
> I do actually think about this sort of thing quite a bit
Forgive me for being abrasive above.
> I’m firmly of the belief that C style type syntax is never going to not be confusing
Yeah, C is going to stay C. No fixing that. The bummer is when new languages repeat the problem because they don't understand what causes it.
> Personally I think a core problem is that declaration and use are different
I claim that declaration and use can be the same. In all cases, you read the affixes for arrays, pointers, function calls, and struct/union fields from left to right.
// declarations
int i
int a[10]
int f(int arg)[10] { ... }
int p@
// usage
print i
print a[1]
print f(1)[1]
print p@
You could change the declarations to use a "let" keyword, but I don't think it'll make them more readable while maintaining the goal of "declaration follows usage".
> I’m not sure how I would declare the type of a function that returns a pointer to an array
I agree here. I really like what Zig has done with pointer types. I wrote about it here [0]. What I like most is that all the types consistently read from left to right, and there are types that encode the cardinality of a pointer, unlike in C. I wonder what you or others think about Zig’s pointer syntax.
This looks like you’re assigning the value 0 to i32, a type. It’s nonsensical. Also what is the purpose of the var keyword? Why not put the type there?
The array syntax makes even less sense!
I think what you’re missing in your examples is how the values are used after declaration. C syntax was designed so that declaration resembles use. The reason array subscript brackets come after the variable name is because that’s how you access elements of the array:
int arr[3] = { 12, 76, -42};
arr[1] += 4;
if (arr[2] < 0) { printf("it's negative!\n"); }
Go solves 99% of the problem quite nicely. You don't often deref pointers in Go, because of auto-addressing. E.g. there's no arrow operator as in C, the dot derefs when necessary.
I would say `v := (p.)` could've been the deref operator, but what do I know.
That you don't deref pointers in Go is not what fixes this issue.
What fixes this issue is that, like most other languages which are not C, Go understands that "is a pointer" is a property of the type, not the name / value.
So even if it used C-style declaration, Go would say
When you arrive at the sort of gibberish that you need a tool to make sense of it, it's time to split the declaration into several much simpler typedef building blocks (especially recommended for function types).
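As a sketch of what that splitting can look like, take a declaration that cdecl reads as "pointer to function (pointer to const void) returning pointer to array 3 of int". The typedef names and the helper function below are my own, picked only for illustration.

```c
#include <stddef.h>

/* The one-line form: pointer to function (pointer to const void)
   returning pointer to array 3 of int. */
int (*(*foo)(const void *))[3];

/* The same type built from named pieces. */
typedef int triple[3];                     /* array 3 of int */
typedef triple *triple_fn(const void *);   /* function returning pointer to triple */
typedef triple_fn *triple_fn_ptr;          /* pointer to such a function */

static triple shared = {1, 2, 3};

/* Hypothetical function with the matching signature. */
static triple *get_shared(const void *unused)
{
    (void)unused;
    return &shared;
}

/* Assigns through both spellings and reads elements back. */
int call_both_spellings(void)
{
    triple_fn_ptr bar = get_shared;
    foo = bar;                             /* accepted because the types match */
    return (*foo(NULL))[2] + (*bar(NULL))[0];
}
```

call_both_spellings() should return 4 (the elements 3 and 1). The assignment from bar to foo is the part that shows the typedef chain names the same type as the one-liner.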
Also for what it's worth, this isn't 'human readable' either, or is it?:
"declare foo as pointer to function (pointer to const void) returning pointer to array 3 of int"
Every time I read the chapter in K&R, I first get a headache, then I think I understand it, then I think I don't understand it, but it is something very easy (just follow the circle rule). Then next year the cycle repeats.
Except the circle rule doesn't really help unless you know when to skip stuff because you're not done on the other side. The right-left rule instead is precise and correct.
I'm actually surprised I had to read this far down the comments to see a reference to the right-left rule. Is that not common knowledge anymore for C programmers?
As part of Derw, I've added an English output generator[0], with the idea that the output could be used to improve error messages or help developers understand unfamiliar syntax, particularly for those more used to JavaScript than ML. It's a similar concept to cdecl.
It would be good if there was something similar for the other constructs, like statements, loops, calls... That would become a "C" to natural language translator. I wonder how hard it is to turn:
for(int i = 0; i < x; i++)
into "declare 'i' as int and assign '0' to it; while 'i' is smaller than 'x' do ... and increment 'i'."
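The English sentence maps fairly directly onto a while loop, which may be a useful halfway step for such a translator. A minimal sketch, with hypothetical function names and a loop body I picked (summing i) just so there is something to compare:

```c
/* Sums 0..x-1 using the compact for-loop form. */
int sum_with_for(int x)
{
    int total = 0;
    for (int i = 0; i < x; i++)
        total += i;
    return total;
}

/* The same loop, spelled out the way the English sentence reads. */
int sum_with_while(int x)
{
    int total = 0;
    {
        int i = 0;            /* declare 'i' as int and assign '0' to it */
        while (i < x) {       /* while 'i' is smaller than 'x' do ...    */
            total += i;
            i++;              /* ... and increment 'i'                   */
        }
    }
    return total;
}
```

Both should give 10 for x = 5. One caveat: the rewrite is only equivalent when the body has no continue statement, because in a for loop continue still runs the increment.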
You can add new syntax without destroying backward compatibility, as long as it doesn't conflict with existing syntax. C++ took C and added the std::unique_ptr<int> syntax, which is much more readable.