
Good persistence in finding a good solution, but I wish the author had found a solution that didn't involve mold. I'm in the same situation; I have a programming language that builds self-contained executables by bundling bytecode with a prebuilt interpreter. But I'm not nearly as smart as the author, so I link a fixed multi-megabyte chunk of sentinel bytes into the interpreter (using xxd to produce a literal array in a .c file) and at embed time I search for the bytes and overwrite them in-place. This works for executables on any platform and doesn't require a special linker, but the hardcoded limit (and wasted space when you come under the limit) is undesirable.
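
For readers who want to see the shape of that embed step, here is a minimal sketch in Python. It is not the commenter's actual tool: the marker value, the slot size, and the 8-byte length prefix are all assumptions made for illustration, and the functions work on bytes in memory rather than on a real executable file.

```python
import struct

# Hypothetical marker and slot size; the comment does not give the real values.
SENTINEL = b"EMBED-SLOT-0123456789abcdef"
SLOT_SIZE = 4096  # the comment describes a multi-megabyte slot; kept small here


def make_slot() -> bytes:
    """Bytes the interpreter build would link in: the marker plus zero padding."""
    return SENTINEL + b"\0" * (SLOT_SIZE - len(SENTINEL))


def embed(image: bytes, payload: bytes) -> bytes:
    """Overwrite the reserved slot in `image` with a length-prefixed payload."""
    # Assumed layout: 8-byte little-endian length, then the payload itself.
    record = struct.pack("<Q", len(payload)) + payload
    if len(record) > SLOT_SIZE:
        raise ValueError("payload exceeds the hardcoded slot size")
    offset = image.find(SENTINEL)
    if offset < 0:
        raise ValueError("sentinel not found")
    if image.find(SENTINEL, offset + 1) >= 0:
        raise ValueError("sentinel is not unique")
    patched = bytearray(image)
    # Replace the whole slot with same-length data so no file offsets shift.
    patched[offset:offset + SLOT_SIZE] = record.ljust(SLOT_SIZE, b"\0")
    return bytes(patched)
```

Because the slot is replaced by data of exactly the same length, nothing else in the file moves, which is why the approach does not care about the executable format. The `ValueError` for an oversized payload is the hardcoded limit mentioned above.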



If you're happy with a compile and link step, you can embed arbitrary data at link time with an ordinary linker by making yourself a linkable object with objcopy(1). I've played with it in the past as a way to embed an sqlite database into a ruby interpreter, which lets you do funky things like reimplementing `require` to read from the embedded database.


It's a possibility. It's not unreasonable for me to ship builds of objcopy and lld that I can run at embed time. An immediate difficulty is that I support Windows and macOS, so I need a solution for PE and Mach-O executables too. I think a solution probably exists but I may need a separate solution for each platform. Embedding resources into binaries is pretty easy; it's just a matter of how to do it without shipping an entire C toolchain to users of my language.


I suspect the easier thing to do today is actually just embed tcc, and use its linker. Generating a C source file that embeds whatever binary you want in a string literal is straightforward templating. I couldn't really do that at the time.
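
The templating step could look roughly like the sketch below. It emits a byte array initializer (in the spirit of `xxd -i`) rather than a string literal, and the symbol name `embedded_payload` is a made-up placeholder. Feeding the result to tcc is left out.

```python
def c_source_for(data: bytes, symbol: str = "embedded_payload") -> str:
    """Render `data` as a C array definition plus a length constant."""
    if not data:
        # An empty initializer list is not valid in older C standards.
        raise ValueError("refusing to emit an empty array")
    lines = []
    for i in range(0, len(data), 12):  # 12 bytes per source line
        chunk = data[i:i + 12]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(lines)
    return (
        f"const unsigned char {symbol}[] = {{\n"
        f"{body}\n"
        f"}};\n"
        f"const unsigned long {symbol}_len = {len(data)}UL;\n"
    )
```

Writing the returned string to a `.c` file and compiling it alongside the interpreter gives the program a symbol it can read directly, with no search step at runtime.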


That's a great idea. I'll have to check and see if some fork of tcc today supports all my target platforms, but I bet it does.


The linker that ships on macOS supports embedding data from a file with -sectcreate.


Would using libbfd or LLVM's objcopy library be of use here? https://llvm.org/doxygen/namespacellvm_1_1objcopy.html


I don't know how clean or simple or portable you wish your build environment to be, but would it be worth embedding at compile-time?

Thinking of the general need in these situations to produce a custom-named interpreter/executable, could it be worth accessing the program name itself to find a paired source file? E.g., on invoking ./foo2.o it would look for ./foo2.code -- a two-file distribution that still lets the user double-click the executable?
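
A tiny sketch of that lookup, assuming the program path is already known and that the `.code` suffix is the agreed convention (both are assumptions taken from the example in this comment):

```python
from pathlib import Path


def paired_code_path(program_path: str) -> Path:
    """Map an executable's path to its sibling .code file."""
    # with_suffix swaps an existing extension, or adds one if there is none.
    return Path(program_path).with_suffix(".code")
```

So `paired_code_path("./foo2.o")` points at `foo2.code` in the same directory. A program name that itself contains a dot would have its last dotted part replaced, which a real implementation would need to decide how to handle.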

Could there be a non-unicode flag at the end of a special elf file which allows arbitrary unicode data to be concatenated after that? I.e., an agreed loader-ignore-hereafter convention or similar? (Asking with no knowledge of ELF internals, besides hints given in the OP.)


Using two files would definitely work and, honestly, be a lot simpler. But it's a neat trick to make it a single file. For my toy language, it mostly serves to hide the fact that I'm not really compiling to native. People won't ask questions if it's a single executable that file(1) says is a statically linked binary.

Appending to the end of the ELF file does work. It won't mess anything up because your bytes will be outside of any ELF section. You can insert a known sentinel string and then search for it at runtime. The main problem is that you have to open your own executable file up for reading so you can locate the data at the end, and on Linux that requires having /proc, AFAIK. The nice thing about these other techniques is we're not assuming anything about the filesystem we're in. In a chroot environment you might not have /proc.
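
As a rough illustration of the append approach, the sketch below uses a variation on the sentinel search: instead of scanning for a marker, it places a fixed-size footer (payload length plus marker) at the very end of the file, so the reader only has to look at the last 16 bytes. The marker value and footer layout are invented for the example, and it works on bytes in memory; a real runtime would still need to open its own executable, which is the hard part described above.

```python
import struct
from typing import Optional

MAGIC = b"PAYLOAD1"             # hypothetical 8-byte marker
FOOTER = struct.Struct("<Q8s")  # payload length, then the marker


def append_payload(exe: bytes, payload: bytes) -> bytes:
    """Return the executable image with payload and footer appended."""
    return exe + payload + FOOTER.pack(len(payload), MAGIC)


def read_payload(image: bytes) -> Optional[bytes]:
    """Recover the payload, or None if the image carries no valid footer."""
    end = len(image) - FOOTER.size
    if end < 0:
        return None
    length, magic = FOOTER.unpack_from(image, end)
    if magic != MAGIC or length > end:
        return None
    return image[end - length:end]
```

The original bytes are left untouched at the front of the image, and the payload has no size limit other than what the length field can hold.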


Given your pursuit of elegance, I imagine you could ultimately have a --clone or --cloner command-line switch which would allow any executable instance based upon your interpreter to create a new executable instance, but encapsulating newly-supplied source code. In this sense your interpreter could go viral. (In tcl/tk context, freeWrap might be an example for study.)

Relatedly, I don't know whether, given argv[0] and your targets, one can at least copy the named file, even if one cannot open it directly for reading.


The first arg to your program is a path to the binary that's being executed. No /proc required.


It's usually the path to the binary being executed, but you can pass anything you want when you exec. e.g. execl("/bin/ls", "definitely not /bin/ls", NULL);



