Good persistence in finding a good solution, but I wish the author found a solut...

regularfry · on Jan 23, 2024

If you're happy with a compile and link step, you can embed arbitrary data at link time with an ordinary linker by making yourself a linkable object with objcopy(1). I've played with it in the past as a way to embed an sqlite database into a ruby interpreter, which lets you do funky things like reimplementing `require` to read from the embedded database.

electroly · on Jan 23, 2024

It's a possibility. It's not unreasonable for me to ship builds of objcopy and lld that I can run at embed time. An immediate difficulty is that I support Windows and macOS, so I need a solution for PE and Mach-O executables too. I think a solution probably exists, but I may need a separate one for each platform. Embedding resources into binaries is pretty easy; it's just a matter of how to do it without shipping an entire C toolchain to users of my language.

regularfry · on Jan 23, 2024

I suspect the easier thing to do today is actually just embed tcc, and use its linker. Generating a C source file that embeds whatever binary you want in a string literal is straightforward templating. I couldn't really do that at the time.
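
A minimal sketch of that templating step, for illustration only. The helper name `emit_c_blob` and the generated `<name>` / `<name>_len` symbols are hypothetical, not something from tcc or from the language discussed here. It writes the bytes as an octal-escaped string literal; fixed three-digit octal escapes are used because hex escapes have no length limit and could swallow the following character. The explicit array size keeps the compiler from appending a trailing NUL, which is valid C but not C++.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write `data` to `out` as C source defining
 *   const unsigned long <name>_len;
 *   const unsigned char <name>[len];
 * Returns 0 on success, -1 on error or when len is 0. */
int emit_c_blob(FILE *out, const char *name,
                const unsigned char *data, size_t len)
{
    size_t i;

    if (len == 0)
        return -1; /* a zero-length array is not valid standard C */

    fprintf(out, "const unsigned long %s_len = %luUL;\n",
            name, (unsigned long)len);
    fprintf(out, "const unsigned char %s[%lu] =",
            name, (unsigned long)len);

    for (i = 0; i < len; i++) {
        if (i % 16 == 0)
            fputs("\n    \"", out);        /* open a new literal every 16 bytes */
        fprintf(out, "\\%03o", (unsigned)data[i]);
        if (i % 16 == 15 || i + 1 == len)
            fputc('"', out);               /* close it; adjacent literals concatenate */
    }
    fputs(";\n", out);

    return ferror(out) ? -1 : 0;
}
```

The generated file can then be compiled and linked next to the interpreter, which reads the payload through the two symbols. Very large payloads may run into compiler limits on string literal length, so a brace-enclosed byte array is a reasonable alternative.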

electroly · on Jan 23, 2024

That's a great idea. I'll have to check and see if some fork of tcc today supports all my target platforms, but I bet it does.

saagarjha · on Jan 26, 2024

The linker that ships on macOS supports embedding data from a file with -sectcreate.

zokier · on Jan 23, 2024

Would using libbfd or LLVM's objcopy library be of use here? https://llvm.org/doxygen/namespacellvm_1_1objcopy.html

pitherpather · on Jan 23, 2024

I don't know how clean or simple or portable you wish your build environment to be, but would it be worth embedding at compile-time?

Thinking of the general need in these situations to produce a custom-named interpreter/executable, could it be worth using the program name itself to find a paired source file? E.g., invoking ./foo2.o would make it look for ./foo2.code: a two-file distribution that still lets the user double-click the executable.

Could there be a non-unicode flag at the end of a special elf file which allows arbitrary unicode data to be concatenated after that? I.e., an agreed loader-ignore-hereafter convention or similar? (Asking with no knowledge of ELF internals, besides hints given in the OP.)

electroly · on Jan 23, 2024

Using two files would definitely work and, honestly, be a lot simpler. But it's a neat trick to make it a single file. For my toy language, it mostly serves to hide the fact that I'm not really compiling to native. People won't ask questions if it's a single executable that file(1) says is a statically linked binary.

Appending to the end of the ELF file does work. It won't mess anything up because your bytes will be outside of any ELF section. You can insert a known sentinel string and then search for it at runtime. The main problem is that you have to open your own executable file up for reading so you can locate the data at the end, and on Linux that requires having /proc, AFAIK. The nice thing about these other techniques is we're not assuming anything about the filesystem we're in. In a chroot environment you might not have /proc.
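
A rough sketch of the sentinel lookup, assuming the whole executable image has already been read into memory. The function name `find_payload` and the sentinel text are made up for illustration. It scans backwards and takes the last occurrence, because the sentinel string usually also appears inside the interpreter's own read-only data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: locate the bytes that follow the LAST occurrence of
 * `sentinel` in `image`. On success returns a pointer into `image` and
 * stores the payload length in *payload_len; returns NULL if not found. */
const unsigned char *find_payload(const unsigned char *image, size_t image_len,
                                  const char *sentinel, size_t *payload_len)
{
    size_t slen = strlen(sentinel);
    size_t i;

    if (slen == 0 || image_len < slen)
        return NULL;

    /* Walk candidate start offsets from image_len - slen down to 0. */
    for (i = image_len - slen + 1; i-- > 0; ) {
        if (memcmp(image + i, sentinel, slen) == 0) {
            *payload_len = image_len - i - slen;
            return image + i + slen;
        }
    }
    return NULL;
}
```

This breaks if the appended payload itself contains the sentinel. A common way around that is to write a fixed-size trailer (payload length plus a magic value) as the very last bytes of the file and read it from the end instead of searching.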

pitherpather · on Jan 23, 2024

Given your pursuit of elegance, I imagine you could ultimately have a --clone or --cloner command-line switch which would allow any executable instance based upon your interpreter to create a new executable instance, but encapsulating newly-supplied source code. In this sense your interpreter could go viral. (In tcl/tk context, freeWrap might be an example for study.)

Relatedly, I don't know whether, given argv[0] and your targets, one can at least copy the named file, even if one cannot open it directly for reading.

eptcyka · on Jan 23, 2024

The first arg to your program is a path to the binary that's being executed. No /proc required.

electroly · on Jan 23, 2024

It's usually the path to the binary being executed, but you can pass anything you want when you exec, e.g. execl("/bin/ls", "definitely not /bin/ls", (char *)NULL);
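
Putting the two approaches from this subthread together, a hedged sketch: prefer /proc/self/exe where it exists, and treat argv[0] only as a best-effort fallback. The helper name `open_self` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: open the running executable for reading.
 * Tries the Linux /proc link first, then falls back to argv0.
 * The fallback is only a guess: argv0 is whatever the parent passed to
 * exec, and a bare name like "foo" would also need a PATH search, which
 * is not done here. Returns NULL if neither attempt works. */
FILE *open_self(const char *argv0)
{
    FILE *f = fopen("/proc/self/exe", "rb");

    if (f == NULL && argv0 != NULL)
        f = fopen(argv0, "rb");
    return f;
}
```

Called as `open_self(argv[0])` from `main`, this covers the chroot-without-/proc case only when the parent passed an honest, resolvable path.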