Scale and complexity. Floppies were dying, and it may be that some of the equipment for making them was too. Tape has steady-state demand with a likely small but clear margin, and that demand looks set to continue. So the two products faced quite different futures and production scenarios.
I imagine the difference is that Scotch tape is much better aligned with the R&D that goes into their other products in the adhesives and films categories compared to magnetic tape. If they didn't intend to go further into developing products in the computing space then selling it while it still had value was probably a wise call.
There are two main reasons, one bad, one good(-ish):
(1) Traditional autoconf assumed you only had a shell, cc and make, and so the ./configure script, which is a huge ball of obscure shell commands, is shipped in the tarball. I think most distros will now delete and recreate these files, which probably should have happened a lot earlier. (Debian has been mostly doing this right for a long time already.)
(2) Some programs generate a lot of code (e.g. in libnbd we generate thousands of lines of boilerplate C from API descriptions*). To avoid people needing to install the specific tools that we use to generate that code, we distribute the generated files in the tarball, but they are not present in git. You can still build from git directly, and you can also verify that the generated code exactly matches the tarball, but both cases mean extra build dependencies for end users and packagers (rough sketch of both below).
* Generating boilerplate code is a good thing in general as it reduces systematic errors, which are a vastly more common source of bugs than highly targeted supply chain attacks.
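To expand on (2), both checks are straightforward to script. A rough sketch, where the repo URL, tag, tarball name and the "make dist" target are assumptions that vary per project:

    # Regenerate ./configure and friends from source rather than trusting
    # the copies shipped in the tarball (what distros increasingly do):
    autoreconf -fiv

    # Verify that a downloaded release tarball matches what the git tag
    # produces (hypothetical project and tag names):
    git clone --branch v1.2.3 https://example.org/project.git
    cd project
    autoreconf -fiv && ./configure && make dist
    mkdir from-git from-release
    tar -xzf project-1.2.3.tar.gz -C from-git
    tar -xzf ~/Downloads/project-1.2.3.tar.gz -C from-release
    diff -r from-git from-release
    # Differences may just reflect a different autotools version on your
    # machine, so read the diff rather than expecting byte-identical output.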
I advocate for checking in the auto-generated code. You can see the differences between tool runs, see how changes in tooling affect the generated code, and see what might have caused a regression (hey, it happens).
Sometimes tooling can generate unstable files; I recall there was a time when Eclipse was notorious for this, for example when saving XML files it liked to reorder all the attributes. But these are bugs that need to be fixed. Tooling should generate perfectly reproducible files.
Probably depends on the project as to whether this is feasible, but for us we intentionally want to generate everything we can in order to reduce systematic errors.
They might try - that's why, if you're generating and committing generated code, it's important to also have a CI step that runs before anything is merged, which ensures the generated code is up to date and rejects any change request where it isn't.
Mostly this helps with people simply forgetting to re-run the generator in their PR but it's a useful defence against people trying to smuggle things into the generated files, too!
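A minimal version of that CI step is a couple of lines of shell; the generator script and directory names here are placeholders:

    # Regenerate, then fail the job if the committed files no longer match.
    ./scripts/generate-code.sh            # placeholder for the project's generator
    git diff --exit-code -- generated/    # modified tracked files fail the job
    test -z "$(git status --porcelain generated/)"  # untracked leftovers fail it too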
Yeah, I guess my general thought is that anything which encourages hiding files is actively risky unless you have some kind of robust validation process. As an example, I was wondering how many people would notice an extra property in a typically gigantic NPM lock file as long as it didn’t break any of the NPM functions.
I disagree - you should ensure your dependencies are clearly listed. Docker excels at this - it's a host platform independent way of giving you a text based representation of an environment.
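For example, a hypothetical Dockerfile like this (image tag and package names are purely illustrative) spells out the whole build environment in a dozen lines of text:

    # Hypothetical build environment, described entirely as text.
    # The base image could also be pinned by digest for stronger guarantees.
    FROM debian:12-slim
    RUN apt-get update && apt-get install -y --no-install-recommends \
            build-essential autoconf automake libtool pkg-config \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /src
    COPY . .
    RUN autoreconf -fi && ./configure && make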
Docker is a Linux thing, and very much not host-platform independent. It's just "chroot on steroids", and you're essentially just shipping a bunch of Linux binaries in a .tar.gz.
It works on other systems because they emulate or virtualize enough of a Linux system to make it work. That's all fine, but comes with serious trade-offs in terms of performance, system integration, and things like that. A fair trade-off, but absolutely not host-platform independent.
A minor note: technically there is only a weak guarantee that checksums won't break, because a server update that recompresses the tarballs with a different version or an alternative implementation of gzip will change the bytes.
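One way to see that in practice (tarball name is a placeholder): the gzip layer is usually what changes, so checksumming the decompressed tar stream is more stable than checksumming the .tar.gz bytes - though, as noted below, it doesn't help when git itself changes the tar output:

    # The .tar.gz checksum depends on which gzip implementation produced it...
    sha256sum project-1.0.tar.gz
    # ...while the checksum of the decompressed tar stream depends only on
    # the archived contents and metadata:
    gzip -dc project-1.0.tar.gz | sha256sum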
This was not GitHub's fault, but Git itself combined with cache pruning. Specifically, GitHub updated to Git 2.38, which changed how the archives are compressed. Non-cached tarballs were regenerated on demand, and all hell broke loose: https://github.blog/2023-02-21-update-on-the-future-stabilit...
GitHub has (for the moment) backed down, and currently the auto-generated tarballs are stable, but they have changed this in the past and may do so again in the future.
Thank you for highlighting this. I've started a new discussion https://github.com/orgs/community/discussions/116557 to provide strong guarantees for checksum stability for autogenerated tarballs attached to releases.
For many projects, the release tarballs only contain the files necessary to build the software, and not the following items that may be present in the same repository:
- scripts used by project developers to import translations from another system and commit those translation changes to the repository
- scripts to build release tarballs and sign them
- continuous integration scripts
- configuration and scripts used to set up a developer IDE environment
You can use .gitattributes export-ignore to influence what gets into the tarballs and what stays only in the repository! It's super powerful but not often used.
And export-subst to insert the current tag or git revision into the archive too.
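For anyone who hasn't used them, the whole mechanism is a few lines of .gitattributes plus git archive; the paths, file names and tag here are just examples:

    # .gitattributes
    # keep CI config and developer-only bits out of `git archive` output:
    /.github        export-ignore
    /ci             export-ignore
    .gitattributes  export-ignore
    # expand $Format:%H$ inside VERSION into the commit hash at archive time:
    VERSION         export-subst

    # a release tarball is then just:
    git archive --format=tar.gz --prefix=project-1.0/ -o project-1.0.tar.gz v1.0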
In fact export-subst is powerful enough that there is probably some way to create an exploit triggered by a particular payload inside a commit or tag message? :)
Maybe not triggered, but it could be part of the chain.
I smell a new backdooring opportunity: modifying .gitattributes to surreptitiously sneak some binary files into the GitHub release tarballs. Few people would take a look at .gitattributes.
As this backdoor has shown, extra unnecessary files in the source tarball can make it easier to hide malicious code. If you take Gentoo as an example, when a software package is built, Gentoo creates a sandboxed environment first, disallowing the build process from impacting the rest of the operating system.[1] Removing superfluous files from the source tarballs minimises an attacker's ability to get malicious code inside the sandboxed build environment.
Sandboxes for building software are commonly used throughout Linux distributions, but I am unsure how strict those sandboxes are in general e.g. whether they use seccomp and really tighten what a build script can get up to. At least on Gentoo, there is a subset of packages (such as GNU coreutils) that are always just assumed to be needed to build software and they're always present in the sandbox. Build dependencies aren't as granular as "this build needs to use awk but not sed".
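For reference, on Gentoo these knobs live in FEATURES in /etc/portage/make.conf; roughly something like the following, from memory (the handbook has the authoritative list and descriptions):

    # /etc/portage/make.conf (excerpt)
    # sandbox / usersandbox : filesystem write restrictions during build phases
    # userpriv              : build as an unprivileged user instead of root
    # network-sandbox       : cut the build off from the network
    FEATURES="sandbox usersandbox userpriv network-sandbox"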
Inserting a change like this as a one-off would attract lots of scrutiny, which would probably get it detected. Instead, the bad actor spent years contributing to the project before dropping this.
So, while writing the exploit might be a couple of hours work, actually pulling it off is quite a bit more difficult.
I don't think they were using complexity as the reason for that assumption, but instead goals. Adding security doesn't require a nation state's level of resources, but it is a more attractive feature for a nation state that wants to preserve it over time and prevent adversaries from making use of it.
This makes sense in closed source products where you'll never get to audit the source for such exploits, but little sense in open source projects where anyone can audit it.
That is to say, an enterprise router or switch would likely have secured exploits put there by corporate and national security agencies, whereas open source exploits would benefit from the plausible deniability.
And on the contrary, creating a vulnerability that isn't identifiable with a particular attack group provides a bit more deniability and anonymity. It's hard to say which is more favorable to a nation-state actor.
An interesting angle: if this had somehow been observed in use, instead of being discovered before any use was observed, there would be a glaringly bright cui bono attached to whatever it was being used for.
Theoretically, I could even imagine the public-key security being motivated by some "mostly benign" actor who fell in love with the question "could I pull this off?" and dreaded the scenario of their future hacker superpower falling into the wrong hands. It's not entirely unthinkable that an alternative timeline where this was never discovered would only ever see this superbackdoor getting used for harmless pranks, winning bets and the like. In this unlikely scenario, the discovery, through the state actor and crime syndicate activities it undoubtedly inspired, would paradoxically lead to a less safe computing world than the alternative timeline where the person with the key had free RCE almost everywhere.
It'll be kind of tragic if this backdoor turns out to be the developer's pet "enable remote debugging" code, and they didn't mean for it to get out into a release. ;)