I like the theory that it actually wasn’t luck, but was picked up by the detection tools of a large entity (Google / Microsoft / NSA / whatever), and they’re just presenting the story like this to keep their detection methods secret. It’s what I would do.
I doubt that if Google detected it with some internal tool, they'd reach out to Microsoft to hide their contribution.
It was reported by an MS engineer who happens to be involved in another OSS project. MS does business with the US intelligence community; the Skype story is one example: first there were rumors that the NSA was offering a lot of money to anyone who could break Skype's E2E encryption, then MS bought Skype, then MS changed Skype's client to drop E2E encryption and route calls through MS servers instead of peer to peer, allowing undetectable wiretapping of arbitrary connections.
But it's also quite a credible story that it was just a random discovery. Even if it was the NSA, why would they hide that capability? It doesn't take much to run a script that compares the git state with the uploaded source tarballs in distros like Debian (Debian ships the upstream source tarball and the Debian patches as separate tarballs).
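That comparison really is small-script territory. A minimal sketch of the idea in Python (the tarball name and paths are placeholders, and it assumes the release tarball and a checkout of the matching git tag sit side by side; a distro-scale check would loop this over every source package it ships):

```python
#!/usr/bin/env python3
# Minimal sketch: list files that exist only in a release tarball, or that
# differ from the corresponding git checkout. Paths/versions are placeholders.
import hashlib
import tarfile
from pathlib import Path

TARBALL = Path("xz-5.6.1.tar.gz")   # uploaded release tarball (placeholder)
GIT_TREE = Path("xz-git")           # checkout of the matching git tag (placeholder)

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

with tarfile.open(TARBALL) as tar:
    for member in tar.getmembers():
        if not member.isfile():
            continue
        # Drop the leading "xz-5.6.1/" directory from the archive path.
        rel = Path(*Path(member.name).parts[1:])
        from_tar = tar.extractfile(member).read()
        in_git = GIT_TREE / rel
        if not in_git.is_file():
            # Release-only files: generated autotools output is expected here,
            # a modified build-to-host.m4 is not.
            print(f"only in tarball: {rel}")
        elif digest(in_git.read_bytes()) != digest(from_tar):
            print(f"differs:         {rel}")
```

The catch is that release tarballs legitimately contain generated build files that aren't in git, so a human still has to triage the diff; that's exactly the noise the xz attacker hid in.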
> It was reported by an MS engineer who happens to be involved in another OSS project.
I view it as being an OSS or postgresql dev that happens to work at Microsoft. I've been doing the former for much longer (starting somewhere between 2005 and 2008, depending on how you count) than the latter (since 2019-12).
Thanks for the explanation. Also thanks for catching this and protecting us all; I think in the end it's way more believable that you indeed found it on your own, and the above was just brainless blathering into the ether =). Lastly, thanks for your Postgres contributions.
Intelligence agencies are very careful about sharing their findings, even with "friends", because the findings will disclose some information about their capabilities and possibly methods.
Let's say agency A has some scanning capability for open source software that detected this backdoor attempt by agency B. If A had gone public, agency B would now know A has this ability, so agency B would adjust its methods next time and the scanning capability would become less useful. Whereas if agency A had told Microsoft to "find" this by accident, nobody would know about the scanning capability, and the next attempt by agency B would only try to avoid the performance impact this first attempt had, probably leaving it visible to agency A.
Googling for "site:nsa.gov filetype:pdf" gives tens of thousands of results with documents produced by the NSA about various things (the google counter is known to lie but that's not my point). They do publish things.
He is also one of the sharpest developers I have had the fortune to speak to (and do some minor work with on the mailing list), and he knows a lot about performance and profiling. I 100% think that he found it on his own. He would also be an odd choice for Microsoft to pick, since I doubt he works anywhere close to any of their security teams (unless they went to serious lengths to find just the guy for whom it would be 100% believable that he just stumbled on it).
Yeah definitely, good point. I got that wrong above: he is in fact a postgres contributor who happens to be an MS employee, and not an MS employee who happens to be a postgres contributor.
The attacker changed the project's contact details at OSS-Fuzz (an automated fuzzing service). There's an interesting discussion as to whether it would have picked up the vulnerability: https://github.com/google/oss-fuzz/issues/11760
I don't think it's plausible OSS-Fuzz could have found this. The backdoor required a build configuration that was not used in OSS-Fuzz.
I'm guessing "Jia Tan" knew this and made changes to XZ's use of OSS-Fuzz for the purposes of cementing their position as the new maintainer of XZ, rather than out of worry OSS-Fuzz would find the backdoor as people have speculated.
How many oss-fuzz packages have a Dockerfile that runs apt-get install liblzma-dev first?
Had this not been discovered, the backdoored version of xz could have eventually ended up in the Ubuntu version OSS-Fuzz uses for its Docker image, and been linked into all those packages being tested as well.
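The Dockerfile question above is easy to answer mechanically. A rough sketch, assuming a local clone of https://github.com/google/oss-fuzz (which lays projects out as projects/&lt;name&gt;/Dockerfile); the clone path is a placeholder:

```python
#!/usr/bin/env python3
# Rough count of oss-fuzz projects whose Dockerfile mentions liblzma-dev.
# Assumes a local clone of https://github.com/google/oss-fuzz in ./oss-fuzz.
from pathlib import Path

OSS_FUZZ = Path("oss-fuzz")  # path to the local clone (placeholder)

hits = [
    dockerfile.parent.name
    for dockerfile in sorted(OSS_FUZZ.glob("projects/*/Dockerfile"))
    if "liblzma-dev" in dockerfile.read_text(errors="replace")
]

print(f"{len(hits)} projects install liblzma-dev:")
print("\n".join(hits))
```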
Except now there's an explanation ready if fuzzing starts to fail: honggfuzz uses -fsanitize options that are incompatible with xz's use of ifunc, so any package that depends on xz should rebuild it from source with --disable-ifunc instead of using the binary package.
This is interesting, but do you think this would have aroused enough suspicion to find the backdoor (after every Ubuntu user was owned by it)? I don't see why it would. It wasn't a secret that ifuncs were being used in XZ.
And if that's the case, it was sloppy of "Jia" to disable it in OSS-Fuzz rather than apply a fix to the XZ source code itself to resolve the false positive and turn off the compilation warning; that would have drawn no attention at all, since no one would have had to change their build script.
With or without this PR, it's very unlikely OSS-Fuzz would have found the bug. OSS-Fuzz also happens to be on Ubuntu 20. I'm not very familiar with Ubuntu release cycles, but I think it would have been a very long time before backdoored packages made their way into Ubuntu 20.
I mean, yes. I’ve read all the commentary I can get my hands on about this incident because it is fascinating, and this is the first instance I’ve seen of a parallel-construction theory of how it was found.
Second, maybe a routine dependency review is how they _actually_ found it, but they don’t want future attackers like this one focusing too much on that (otherwise they might try to mitigate it), whereas now they may focus on something inane like a 0.5 second increase in sshd’s startup time or whatever.
Sorry for the ugly comparison, but that explanation reminds me of the theories when COVID started, that it was created by a secret organization that actually rules the world.
People love it when there's an explanation that doesn't involve randomness, because with randomness it looks like we don't have a grasp on things.
Google actually had tooling that could have detected it, but he disabled the check that would have shown it.
Google/Microsoft/NSA could just say they detected it with internal tooling and not disclose exactly how. Google and Microsoft would love to take the credit.
While we are speculating: there was the case of university students trying to introduce malicious commits into the kernel in order to test open source's resistance to such attack vectors (the University of Minnesota "hypocrite commits" incident). Perhaps this was similar "research" by some students.
It's really interesting to think about what might have happened if they had implemented this with much less performance overhead. How long might it have lasted? Years?