> Actually, no. There is no portable "incbin" in C/C++ I kinda hate this kind of...

iainmerrick · on May 30, 2023

> But it leads ridiculous design decisions, like "I'm going to write 8MB of source code instead of doing the portability work on my own to turn that data into a linkable symbol".

Why is that ridiculous? It strikes me as not necessarily the best, but the most obvious approach, the most portable, and possibly the quickest to implement.

Your toolchain might have a special way to import binary blobs, but a) you’ll have to dig through the docs to find it, b) you’ll probably need to solve the problem again when porting to a different platform, and c) who knows if it actually works, or if there are hidden gotchas?

Sure, if there’s a known tool or option that does the job, you should go ahead and use it. But in general, writing a little script to generate a bunch of boilerplate code is perfectly workable.
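For illustration, here is a minimal sketch of the kind of "little script" meant here, written in C to match the rest of the thread. The helper name `emit_c_array`, the 12-bytes-per-row layout and the `<name>_len` constant are assumptions made for this example, not the output format of any particular tool.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * emit_c_array is a hypothetical helper, not part of any standard tool.
 * It writes `len` bytes from `data` to `out` as a C array definition named
 * `name`, followed by a `<name>_len` constant. Returns 0 on success, -1 on
 * a write error.
 */
static int emit_c_array(FILE *out, const char *name,
                        const unsigned char *data, size_t len)
{
    if (fprintf(out, "const unsigned char %s[] = {", name) < 0)
        return -1;

    if (len == 0) {
        /* Empty initializer lists are not valid before C23, so emit one
           placeholder byte; <name>_len still reports 0. */
        if (fprintf(out, "\n    0x00,") < 0)
            return -1;
    }

    for (size_t i = 0; i < len; i++) {
        /* Start a new indented row every 12 bytes. */
        const char *sep = (i % 12 == 0) ? "\n    " : " ";
        if (fprintf(out, "%s0x%02x,", sep, (unsigned)data[i]) < 0)
            return -1;
    }

    if (fprintf(out, "\n};\nconst unsigned long %s_len = %luUL;\n",
                name, (unsigned long)len) < 0)
        return -1;
    return 0;
}
```

A small driver that reads the input file and calls `emit_c_array(stdout, "blob", bytes, n)` would then produce a source file that any C compiler can build, which is the portability argument being made above.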

ajross · on May 30, 2023

> Your toolchain might have a special way to import binary blobs, but a) you’ll have to dig through the docs to find it

This is a corollary: "I don't want to learn my tools, so I'll learn the language standard instead" is fundamentally exactly the problem I'm talking about.

Straight up: C linkage is a 1970s paradigm full of tools that had to run on a PDP-11, and it's vastly simpler than learning C++ or Rust. It's just not "modern" and no one taught it to you, so it looks weird and mysterious. That's the problem!

josephg · on May 30, 2023

I go back and forth on this argument when it comes to codegen. Like, you could make the same argument that protobuf shouldn't output C code. It should output an object file that you can link into whatever compiled language you want. C, Fortran, C++, Rust, who cares. As you say, the linking model is simple and works well.

Why do we generate big C/C++ strings instead, and compile those? Because object files have lots of compiler/platform/architecture specific stuff in them. Outputting C (or C++) then compiling it is a much more convenient way to generate those object files, because it works on every OS, compiler and architecture. Even systems that you don't know about, or that don't exist yet.

I hear what you're saying and I'm torn about it. I mean, aren't binary blobs just a simpler version of the problem protobuf faces? C code is already the most portable way to make object files. Why wouldn't we use it?

ajross · on May 30, 2023

> Like, you could make the same argument that protobuf shouldn't output C code.

The case at hand is an 8MB static array of integers. Obviously yes, of course, absolutely: you choose the correct/simple/obvious/trivialest implementation strategy for the problem. That's exactly what I'm saying!

In the case of protobuf (static generation of an otherwise arbitrarily complicated data structure with reasonably bounded size), code generation makes a ton of sense.

Joker_vD · on May 30, 2023

And the simple/obvious/trivialest solution is to write an array literal with 4 million integers in it, while fighting with Microsoft's link.exe is anything but. Even using rc.exe and loading that data from the resource section is a non-trivial amount of additional work.

ajross · on May 30, 2023

Come on. Both binutils and nasm can generate perfectly working PE object files. I don't know the answer off the top of my head, but I bet anything even pure MSVC has a simple answer here. Dealing with the nonsense in the linked article is something you do for a quick hack or to test compiler performance, but (as demonstrated!) it scales poorly. It's terrible engineering, period. Use the right tools, even if they aren't ISO-specified. And if you can't or won't, please don't get into fights on the internet justifying the resulting hackery.

Joker_vD · on May 30, 2023

> I bet anything even pure MSVC has a simple answer here

You lose your bet, because it doesn't: neither its inline assembler nor the actual MASM shipped with Visual Studio supports any "incbin"-like directive. I guess you can generate an .asm file with a huge db/dd if you don't like a large literal array in .c files, but that's it.

wtetzner · on May 30, 2023

> Come on. Both binutils and nasm can generate perfectly working PE object files. I don't know the answer off the top of my head, but I bet anything even pure MSVC has a simple answer here.

Right, meaning you have to implement N solutions instead of just one. It's a common enough and useful enough feature for the language to support it. I think it would be a different story if linkers were covered by the language specification.

ajross · on May 30, 2023

I remain shocked at how controversial this is. Yes. Yes, implementing N trivial and easily maintained solutions is clearly better than one portable hack.

wtetzner · on May 31, 2023

Clearly many people disagree. And given that C now has #embed, I don't even think I'd consider it to be a hack.

> I remain shocked at how controversial this is.

I am a bit shocked that you think the right solution to making data statically available to the rest of your program is somehow outside the scope of the programming language.

ajross · on May 31, 2023

The comparison wasn't to #embed[1], but to an 8MB static array. You're winning an argument against a strawman, not me. For the record, I think #embed (given tooling that supports it) would be an excellent choice! That's not a defense of the technique under discussion though.

[1] Which FWIW is much less portable as an issue of practical engineering than assembler or binutils tooling!
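For reference, a hedged sketch of what C23 `#embed` looks like in use. Compiler support still varies, so the example checks `__has_embed` and falls back to a one-byte placeholder when the directive is unavailable. The names `HAVE_EMBED`, `self_bytes` and `self_bytes_size` are made up for this example, and embedding `__FILE__` (the source file itself) is only a trick to avoid needing a separate input file.

```c
#include <stddef.h>

/* Use #embed only if the preprocessor advertises it and can find the file. */
#if defined(__has_embed)
#  if __has_embed(__FILE__)
#    define HAVE_EMBED 1
#  endif
#endif
#ifndef HAVE_EMBED
#  define HAVE_EMBED 0
#endif

#if HAVE_EMBED
/* Embed this source file's own bytes as the array initializer. */
static const unsigned char self_bytes[] = {
#  embed __FILE__
};
#else
/* Fallback for toolchains without #embed: a one-byte placeholder. */
static const unsigned char self_bytes[] = { 0 };
#endif

/* The size is an ordinary compile-time sizeof, as with any array. */
static size_t self_bytes_size(void)
{
    return sizeof self_bytes;
}
```

In a real project the operand would be a quoted path to the data file rather than `__FILE__`, and the fallback branch would typically include a generated array instead of a placeholder.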

maskros · on May 30, 2023

It's not "I don't want to learn my tools."

It's "I don't want to learn and debug _everybody else who may possibly want to build this otherwise portable C program_'s tools."

When those tools change how they do this unportable thing every couple of years in subtle and incompatible ways, which requires #ifdefs to handle the different ways those linked-against symbols can be accessed, multiplied by dozens of different platforms, then yes, I'm going to compile an 8MB literal.

Joker_vD · on May 30, 2023

I am pretty certain I've seen linkers routinely writing 0 instead of symbols' actual sizes, so getting the actual size of the embedded binary blob is not very pretty.

Also, your argument sounds exactly like those that the author of https://thephd.dev/finally-embed-in-c23 has been fighting against for 5 years straight.

iainmerrick · on May 30, 2023

Yes! That person is a hero.

iainmerrick · on May 30, 2023

> It's just not "modern" and no one taught it to you, so it looks weird and mysterious.

I don’t know what to say except that I’ve worked with C linkers for a long time, since before I learned C++ and before Rust even existed, and I still don’t like ‘em.

duped · on May 30, 2023

> But it leads ridiculous design decisions, like "I'm going to write 8MB of source code instead of doing the portability work on my own to turn that data into a linkable symbol".

People use scripts to do this kind of thing and never think about it again, and it's still more portable than writing a custom build step.

DSMan195276 · on May 30, 2023

> FWIW: binutils/llvm objcopy is a better mechanism still for this sort of thing in most contexts, as it doesn't involve source compilation of any kind

I used to agree with that, but honestly a simple `xxd` to get a C array avoids so many issues with `objcopy` that I'd rather just use that now. With `objcopy` even just getting the names of the produced symbols to be consistent is a pain, and you have to specify the output target and architecture which is just another thing you have to update for more platforms (and if someone's using a cross-compiler, they have to override that setting too).

In contrast if you just produce a C array then it compiles like normal C code and links like normal C code, problem solved and all the complexity is gone.
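As a rough sketch of what that looks like in practice: `xxd -i hello.bin` emits an `unsigned char hello_bin[]` array plus an `unsigned int hello_bin_len` count, with names derived from the file name. The array below is a hand-written stand-in for such output (the file name and its "hello\n" contents are assumptions for this example), and `blob_checksum` is a hypothetical consumer.

```c
#include <stddef.h>

/* Stand-in for a generated file in the style of `xxd -i hello.bin`.
   The contents ("hello\n") are made up for illustration. */
unsigned char hello_bin[] = {
  0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x0a
};
unsigned int hello_bin_len = 6;

/* Hypothetical consumer: sums the bytes, just to show ordinary C code
   reading the blob with no linker-specific symbols involved. */
static unsigned long blob_checksum(const unsigned char *data, size_t len)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```

Because the blob is a normal array, `sizeof hello_bin` is also available at compile time in the translation unit that defines it, so nothing depends on what size a linker records for the symbol.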