Cdecl: C Gibberish ↔ English (cdecl.org)
95 points by mkl95 on April 23, 2022 | 62 comments


Yeah, thanks to C pointer syntax, there's a big faction of programmers that think that understanding indirect addressing is a major intellectual accomplishment.

I don't have anything better to offer though - C pointers make a whole lot possible in a very terse way.


Easiest way to understand pointers is to learn it in assembler first, which is closer to what's actually going on. Then when learning the C syntax, it all fits into place neatly, rather than being mysterious asterisks and so on.


I can't imagine being a programmer without starting out by becoming proficient in assembler. It takes a lot of mystery out of programming languages because you know whatever the syntax is it all ends up as branches, calls, stacks, registers and memory accesses.


I would heavily recommend first learning assembly with an ISA that's not x86, though. The x86 ISA has just as much cruft as C, e.g. all the overlapped increasingly-sized registers, the now-irrelevant segment registers, etc. And x86 has some stuff that's not legacy, but which makes learning much harder — e.g. assembler-level instructions like LEA that compile to a set of entirely-different opcodes of different shapes depending on what you do with them. (IMHO, for best learning, you should be able to easily disassemble object code in the ISA "by hand" back into assembly, with just an opcode chart for reference. Doing this with arbitrary x86 assembly would be quite the challenge.)

I'm not sure what to recommend as an optimal learning ISA, though. Ideally something that's mostly RISC, with a small but unified memory model, and no virtual memory, and which has all the math ops that embedded RISC ISAs never bother to implement. (In the end, probably no real architecture ever made that exact set of compromises, and so it'd have to be something made up — some kind of bytecode abstract machine ISA designed specifically to be used in education, an ISA equivalent to languages like Logo or Scratch, or OSes like Minix.)

In practice, I think I would recommend learning assembler by programming for the original Gameboy's LR35902 ISA (which is almost, but not exactly, the Z80/8080 ISA). Not because it's the optimal learning ISA per se, but because it gives you a complete machine including graphical video output, with a compact unified address space (no separate IO-register read/write ops), modern tools that target it, and good emulators that almost always include built-in debuggers and memory viewers — where, as well, the memory is small enough to get it all on-screen and have room left over.

Oh, and unlike other ISAs, where people either never hand-write code for them, or where no hand-written code survives in a readable form, with LR35902 we have several complete projects with hand-written assembly codebases (or, well, reverse-engineered + annotated disassemblies of originally hand-written assembly — just as good!) around to be studied. And, of course, they're just the sorts of things kids interested in computers would be interested in studying the workings of: video games! (https://github.com/pret/pokered is a shining example here.)


I feel very lucky I learned on the 68000 in the original Mac. If a CPU could ever be considered a work of art, it was the 68000


If you want to do embedded I would kind of agree (but C is low level enough), otherwise... I'm not so sure.

Low level languages teach a lot of habits I don't like, and they don't teach as much about working in a modern high level environment on complex software.

Everyone should probably be vaguely competent with C, but I wouldn't call it a top priority.

If you're thinking about the underlying machine stuff when coding in Python, you're kind of just accepting leaky abstractions. If you're in the habit of first principles thinking, leaks seem normal.

C doesn't really seem to layer all that well. Whenever I code in C, I usually have to dig down to the SDK API level, and I wind up reading 10 other files to make a change in 1.

High level languages make it clear when something isn't DRY, or an abstraction leaks, or something isn't usable without modification. It's unusual, you notice right away, and you want to fix it because it obviously doesn't fit.

Low level languages make everything feel totally patched together. Encapsulation just isn't a big thing in most of it. It's almost like looking at an old point to point wired TV set compared to a rack of servers.

You might be able to write some amazing ASM code that feels as modern as some top notch Kotlin... but I'd call that an expert, not just a competent person.

And worst of all, the compiler can't catch many bugs. It's not like Rust. Or even Python. Low level languages don't teach the art of making sure that the computer has your back, the way Rust might. They normalize just entirely relying on your own skill, to the point where some C devs don't even seem to want the compiler to eliminate large classes of bugs for them.

High level languages teach a real appreciation of abstraction and safety, and standardization, that I don't feel like I really learned with C.


I agree with everything you said here. I'm not saying we should stick to bare metal programming even if we aren't doing embedded. There may be a problem that could be solved easily with inline assembly or there may be a bug that can't be understood if the programmer doesn't understand what is going on at the CPU level. Assembler is a tool in my kit, but not everybody needs to use it. I like to understand my computer from top to bottom, but sticking to higher level abstractions is where I spend most of my time.


Yes, but knowing assembly language won't help you through complicated C declarations of the sort used to demonstrate cdecl.


C feels like it was built against the proverbial managerial clock to prevent Rob Pike from rewriting yet another text editor in PDP11 assembly


> I don't have anything better to offer though - C pointers make a whole lot possible in a very terse way.

Terseness isn't a virtue in and of itself. If terseness helps readability that's great, but unreadable terse code isn't something to aim for.

One of the redeeming aspects of the C pointer declaration syntax is that hairy declarations can be decomposed using typedefs. Improving readability at the cost of reduced density is a good trade.


The fact that people write whole books about them probably doesn't help either.


The biggest problem with C's pointer syntax is that it's prefix. If pointers were suffix, perhaps using a character that isn't doing primary duty as the multiplication operator, there wouldn't be any need for a Gibberish to English translator. Maybe think of @ as "address":

    int i;          // variable which is integer
    int p@;         // pointer that returns integer
    int f();        // function returns integer
    int f@();       // pointer to function
    int f()@;       // function that returns pointer
    int a[10];      // array of ten integers
    int a[10]@;     // array of ten pointers
    int a@[10];     // pointer to an array
    int a[10]()@;   // array of ten functions that return a pointer
Of course C is stuck with accidents/decisions made 50 years ago. But a language like Go turns all the declaration syntax on its head and still doesn't solve the problem. Rob Pike's blog post says as much (scroll to the bottom section on pointers):

https://go.dev/blog/declaration-syntax

He suggests Pascal's caret syntax (^), which wouldn't work if it was also used as the binary xor operator.


> The biggest problem with C's pointer syntax is that it's prefix.

The biggest problem with C's pointer syntax is that it's prefix to the value.

    int p*
is just as horrible as

    int *p
because in both cases, it obscures that the type of p is "pointer to integer".

C's array and function syntax have exactly the same issue, the entire thing is back-asswards.

Let me demonstrate by converting your examples to a more regular (and logical) language[0]:

    let i: i32 // integer
    let i: Box<i32> // pointer to integer
    let i: fn() -> i32 // function returning an integer
    let i: Box<fn() -> i32> // pointer to a function returning an integer
    let i: fn() -> Box<i32> // function which returns a pointer to an integer
    let a: [i32;10] // array of 10 integers
    let a: [Box<i32>;10] // array of ten pointers
    let a: Box<[i32;10]> // pointer to an array
    let a: [fn() -> Box<i32>;10] // array of ten functions that return a pointer
Even if you want to shorthand pointers (because you have only one and want it to be built-in), it remains significantly more readable than both C actual and your version.

[0] which doesn't mean perfect e.g. one could argue against the special syntax for arrays


You changed your post, so this is a reply to the changes.

Boxes are containers, not pointers or references. Let's look at all of those using the star syntax and operator inside of an unsafe block. You'll end up with stuff like:

    fn main() {
        let mut a = [0i32; 10];
        let p: *mut [i32] = &mut a;

        // Rust did NOT solve this problem
        let x: i32 = unsafe { (*p)[1] };

        print!("{x}\n");
    }
Note the annoying parentheses because they used prefix syntax for pointers, again.


> Boxes are containers, not pointers or references.

There is no actual difference between the two. Pointers and references could have been `Pointer<T>` and `Ref<T>` with the exact same ABI. They're not for UX reasons, one of similarity with C, and one of syntactic overhead.

> // Rust did NOT solve this problem

> let x: i32 = unsafe { (*p)[1] };

That is a debate of precedence, which is a very different concern than what I was replying about, and what TFA is about: the absurd mess that are C type declarations.


> There is no actual difference between the two.

Rust has raw pointers, references, slices, and arrays, each with special syntax in type/variable declarations. You're conflating those with Boxes, and it isn't adding clarity to the discussion. It's not like C++ removed this problem now that people use `std::unique_ptr<T>`.

Rust's raw pointer syntax is exactly as bad as C's, because it's the same syntax. Yes, it could use some other syntax like `Ptr<T>` for declaration, but it doesn't. And that still says nothing about what operator syntax to use when dereferencing the pointer.

Perhaps it could be:

    let p: Ptr<T> = whatever;
    let x = p.deref(); // suffix syntax!
> That is a debate of precedence, which is a very different concern

It's not a different concern, and it's a problem no matter what the precedence. If prefix pointer syntax binds tightly, you'll need parens in one case. If it binds loosely, you'll need parens in another case:

    *(binds_tightly.foo[10]);  // subscript the array first because star binds tightly
    (*binds_loosely).bar[10];  // dereference the pointer first because star binds loosely
However, if pointer syntax is suffix (like arrays, function calls, and field accesses), you simply don't need parens in either case. The operations are just in order.

> (the absurd mess that are C type declarations)

Hating on C is trendy, that's fine. But a lot of people are confused about what the problem really is. It could be fixed with one change, but since people can't see it clearly, they change everything. And in cases like Go or Rust, they change almost everything and still don't solve the problem.

A language could achieve C's goal of "declaration follows usage" if pointers had suffix syntax. The prefix pointer syntax is where crazy "winding rules" and "Gibberish to English" translators come in.


“declaration follows usage” is not a worthy goal. It's unreadable and doesn't allow for creating new parametrized types.


It's clear you prefer the final type as suffix. That just means you need another syntax or keyword (such as "let") to introduce a variable declaration.

However function calls, array subscripts, and field accesses are the same "kind" of operation as dereferencing a pointer. Put the type last if it makes you happy, but you'll have "horrible" syntax if you don't treat pointers the same as the others:

    something.bar[10]()  // access the field, subscript the array, call the function
    something.foo@       // access the field, dereference the pointer


> It's clear you prefer the final type as suffix. That just means you need another syntax or keyword (such as "let") to introduce a variable declaration.

Not really? Assuming type and variable names are lexically distinguishable via the lexer hack or, say, capitalization. Here’s an LR(1) grammar (in Grammophone[1] syntax) with postfix types and dereferencing:

  stmt -> init ; | expr ; .
  
  init -> decl | decl = expr .
  decl -> dtor type .
  dtor -> ambi | dfun .
  dfun -> ambi ( decl ) | ambi ( darg ) | dfun ^ | dfun ( ) .
  darg -> decl , | darg decl , .
  
  expr -> summ | summ = expr .
  summ -> prod | summ + prod .
  prod -> post | prod * post .
  post -> ambi | efun .
  efun -> ambi ( expr ) | ambi ( earg ) | efun ^ | efun ( ) .
  earg -> expr , | earg expr , .

  ambi -> name | ambi ^ | ambi ( ) .

  # Stub
  type -> int .
  name -> x .
It lacks the comma operator or Go-style grouping of multiple variables before a single type, but adding either one is easy and both is possible, though in the latter case the “ambi” hack would need to be extended. I’m willing to believe you could even do this with LL(1) / recursive descent, but the grammar structure would probably end up rather horrible.

Whether this is a smart idea as far as ergonomics of syntax errors go, I don’t know.

[1] https://mdaines.github.io/grammophone/


> Assuming type and variable names are lexically distinguishable via the lexer hack or, say, capitalization.

Hah, I appreciate your point and careful reply, but that's quite an assumption :-)

I think using capitalization does qualify as "another syntax", which I was cautious to qualify. And (as you know) the "lexer hack" means you need to declare your types before you use them, which is a bummer for mutually recursive types.

Anyways, I'll concede that you're right in that you can do it, but I don't think I was very wrong :-)


I can honestly say I wasn’t motivated by how wrong I felt you were, but by how I have been fiddling with declaration syntaxes like this over the last several months :)

Which actually led me to make an error here. The starting point was C syntax so I thought assuming a lexer hack was fair game, but the grammar above doesn’t actually need it! It doesn’t need one even if you add parameterized types with the same syntax as function calls. It only needs it if you want to write function calls as juxtaposition without parens ML style, use infix syntax for type application, or (and that is what I have been fiddling with) drop the semicolons.

(D'oh, I knew I was forgetting something! It was parenthesized expressions. And apparently I also screwed up optional comma termination. And curried calls.)

So, for example, you can add these productions to the grammar above and it still stays LR(1):

  # Remove lexer hack, add parameterized types
  type -> name | type ( ) | type ( type ) | type ( targ ) | type ( targ type ) .
  targ -> type , | targ type , .
  
  # Fix the stupid
  dfun -> ambi ( darg decl ) | dfun ( decl ) | dfun ( darg ) | dfun ( darg decl ) .
  efun -> ambi ( earg expr ) | efun ( expr ) | efun ( earg ) | efun ( earg expr ) | ( expr ) .
I’m not really a fan of this variant of “declaration mirrors use”, mind you. Type ascription usually requires some other syntax (both C and Go suffer from this); custom types that don’t have special punctuation are awkward and even tuples will be kind of forced. Typing the occasional colon does not seem like a bad tradeoff in comparison. I’d love to see these problems solved, though.


By the way, I'm kind of surprised you didn't use 4-letter names for your stub section at the bottom. How can you suffer the inconsistency in such an otherwise consistent grammar? :-)


I console myself with the idea that they are terminals, and I can see if I forgot to define a nonterminal by checking that no examples contain four-letter words. Choosing a name for the sum nonterminal was much more painful :)


My friend and I keep a link to the following because of our similarly neurotic tendencies :-)

https://github.com/timvieira/justified-variables


Meh, I find C syntax a total non-problem.

Don't think I've ever constructed a more complicated type ('directly' so to speak, i.e without typedefs) than an array of function pointers (array of interrupt handlers, taking no arguments, returning void).

99.999(repeating)% of the time I use a typedef for function pointers and nothing looks particularly complicated.


Yeah, I'm not confused by C's syntax any more either. However, when I was learning it (about 30 years ago), things were very non-obvious.

And there are a lot of people who want to throw the baby out with the bathwater. Go is the one that really amuses me here: They broke the "declaration follows usage", which I think is a nice idea, and they still have backwards pointer syntax that needs parentheses to get it right.

> Don't think I've ever constructed a more complicated [...]

Heh, I'm not sure this is a great argument. It's kind of like saying you don't care about the complexity/details in floating point because you've never needed more than integers. :-)


I find that confusing in a bunch of other ways. I much prefer the non C style of name : type rather than having the name embedded in the middle of the type.


A bunch of ways? Maybe it's just confusing because you haven't thought about it before.

There's certainly a tiny trade-off about where to put the core type name. If you have a "let" keyword, your declaration is just a little more verbose:

    int foo[10];
    let foo: [10] int;
If you don't have type inferencing, you always need to specify the type, and the let keyword and colon are just more noise. I know it's a matter of taste, but I think the first line is cleaner.

But my point isn't about where to put the core type (int). It's that pointer dereferencing is the same kind of operation as array subscripting, accessing fields, and calling functions. So unless you put it in order with the rest, you need ugly parens in any general case:

    // prefix pointer syntax is convoluted and needs parens:
    (*(x.foo.bar))("arguments")[10];

    // grab field foo, then field bar, then dereference pointer,
    // then call function, then subscript array
    x.foo.bar@("arguments")[10]


I do actually think about this sort of thing quite a bit, and work with a bunch of people who have thought about these kinds of issues far more than me. I’m firmly of the belief that C style type syntax is never going to not be confusing, and changing the pointer to postfix just changes the places where it’s confusing. Personally I think a core problem is that declaration and use are different, and while I agree that dereferencing is generally much clearer as a postfix operator it doesn’t solve the confusion on the declaration side.

When reading your examples I found myself having to constantly second guess the precedence of arrays and pointers, and I’m not sure how I would declare the type of a function that returns a pointer to an array, or an array of pointers. I’m happy to accept the extra noise of let because it reduces the overall time needed to understand what code means, and we end up reading code so much more than we write it.


> I do actually think about this sort of thing quite a bit

Forgive me for being abrasive above.

> I’m firmly of the belief that C style type syntax is never going to not be confusing

Yeah, C is going to stay C. No fixing that. The bummer is when new languages repeat the problem because they don't understand what causes it.

> Personally I think a core problem is that declaration and use are different

I claim that declaration and use can be the same. In all cases, you read the affixes for arrays, pointers, function calls, and struct/union fields from left to right.

    // declarations
    int i
    int a[10]
    int f(int arg)[10] { ... }
    int p@    

    // usage
    print i
    print a[1]
    print f(1)[1]
    print p@
You could change the declarations to use a "let" keyword, but I don't think it'll make them more readable while maintaining the goal of "declaration follows usage".

> I’m not sure how I would declare the type of a function that returns a pointer to an array

    // function, pointer, array
    int f()@[10] ...


I agree here. I really like what Zig has done with pointer types. I wrote about it here [0]. What I like most is that all the types consistently read from left to right, and there are types that encode the cardinality of a pointer unlike C. I wonder what you or others think about Zig’s pointers syntax.

[0]: https://nathancraddock.com/blog/consistency-in-zigs-type-sys...


I can’t even get past the simple values syntax:

    var value: i32 = 0;
This looks like you’re assigning the value 0 to i32, a type. It’s nonsensical. Also what is the purpose of the var keyword? Why not put the type there?

The array syntax makes even less sense!

I think what you’re missing in your examples is how the values are used after declaration. C syntax was designed so that declaration resembles use. The reason array subscript brackets come after the variable name is because that’s how you access elements of the array:

    int arr[3] = { 12, 76, -42 };
    arr[1] += 4;
    if (arr[2] < 0) { printf("it's negative!\n"); }


Same goes for pointers.

    int *foo = &bar;
    *foo = 1;
As far as syntax goes, tying the pointer to the identifier doesn't make sense to me, at least. Pointers associated with types makes much more sense:

    int* foo = &bar;
C++ thinks so too.


IMHO, this makes things even more confusing. Now part of the type comes before the id and part comes after.


Where the core type goes is orthogonal to the notation for affixes (array, function, pointer). Put the type last if you like :-)


Go solves 99% of the problem quite nicely. You don't often deref pointers in Go, because of auto-addressing. E.g. there's no arrow operator as in C, the dot derefs when necessary.

I would say `v := (p.)` could've been the deref operator, but what do I know.


That you don't deref pointers in Go is not what fixes this issue.

What fixes this issue is that, like most other languages which are not C, Go understands that "is a pointer" is a property of the type, not the name / value.

So even if it used C-style declaration, Go would say

    *int p
or, using a more verbose syntax

    Pointer[int] p


Even C isn't consistent about this:

  int * a, b; // "Property" of the variable

  typedef int * pint;
  pint x, y; // Property of the type
This has created so much bad legacy code where pointer typedefs are created for no good reason.


Why not solve 100% of the problem? And since pointer syntax is really so rare, you wouldn't even see the suffix operator.


If you are using C++, you can make this a lot easier using this template alias

   template<typename T>
   using ptr = T*;

   ptr<const char> str;
   const ptr<char> pchar;
   ptr<void(ptr<const char>)> fun_ptr;
   
Basically this simple trick removes most of the confusion, and you can read the types left to right.


It also makes function pointer types much clearer:

    using func_ptr = float(int);

    // Given
    float some_func(int);

    // Then
    func_ptr* a = &some_func;

    // Also
    func_ptr* b = [] (int x) { return 0.f; };


When you arrive at the sort of gibberish that you need a tool to make sense of it, it's time to split the declaration into several much simpler typedef building blocks (especially recommended for function types).

Also for what it's worth, this isn't 'human readable' either, or is it?:

"declare foo as pointer to function (pointer to const void) returning pointer to array 3 of int"


That seems perfectly readable to me, sans the part in the parentheses.


You know, you could just

  apt install cdecl
And then do it on the command line. But hey, let's turn everything into a web service, with monitoring as a bonus feature.

(Repost of https://news.ycombinator.com/item?id=11164685)


Every time I read the chapter in K&R, I first get a headache, then I think I understand it, then I think I don't understand it, but it is something very easy (just follow the circle rule). Then next year the cycle repeats.


Except the circle rule doesn't really help unless you know when to skip stuff because you're not done on the other side. The right-left rule instead is precise and correct.

See https://news.ycombinator.com/item?id=5080081 and https://web.archive.org/web/20190714064656/http://cseweb.ucs...


I'm actually surprised I had to read this far down the comments to see a reference to the right-left rule. Is that not common knowledge anymore for C programmers?


> demo 1 > declare bar as volatile pointer to array 64 of const int

Maybe "declare bar as volatile pointer to array of 64 const int" will sound more smooth? :)


As part of Derw, I've added an English output generator[0], with the idea that the output could be used to improve error messages or help developers understand unfamiliar syntax, particularly for those more used to Javascript rather than ML. Similar concept to cdecl

0 - https://mobile.twitter.com/derwlang/status/15057099820246712...


It would be good if there was something similar for the other constructs, like statements, loops, calls... That would become a "C" to natural language translator. I wonder how hard it is to turn:

for(int i = 0; i < x; i++)

into "declare 'i' as int and assign '0' to it; while 'i' is smaller than 'x' do ... and increment 'i'."


what's stopping a new version of C from defining a better declaration syntax? (better = easier to read, write and understand)


1) 50 years of backwards compatibility.

2) It is not necessary.

C's terse syntax is not hard to understand for anyone who actually uses the language. Most of the comments here are from people who do not.


You can add new syntax without destroying backward compatibility, as long as it doesn't conflict with existing syntax. C++ took C and added the std::unique_ptr<int> syntax, which is much more readable.


Surprised to see this is the second result when googling “cdecl” despite its naming collision with the calling convention.


The unix utility probably predates the Microsoft C compiler.


Site is very sparse, I "think" I know what this is for, but I'm probably wrong, can anyone explain?


Type in some syntax in the C programming language, it will translate it into English to help you get a better mental picture of what is going on.


That's literally it? Thought might be more to it, hmm still interesting, thanks!


Well, it goes the other way too: English to arcane C declaration. And that's what many of us have used it for, for decades (and decades...). Try:

    declare xyzzyx as pointer to pointer to array 153 of const double


I think of it as a programmer's aid, much like regex101.com or crontab.guru



the c pointer syntax is perfectly fine after you realize it's just the operator precedence rules but in reverse.

  int foo, *bar, baz(), (*zyzzy())();
i do not understand why blowhard rob pike types dislike it. it's exactly the kind of elegance you would think they would like



