This is a great feature, but its name is going to confuse people.
Before I read the article, I thought it was going to be about some kind of additional verification process to check that publishers are not malicious and have adequate security practices, resulting in two tiers of publishers, "trusted" and "untrusted". You might then configure pip to only install packages from "trusted publishers" by default, and you have to go through some scary confirmation prompts to install packages from untrusted ones.
Then I read the article, and realised it wasn't what I was expecting at all. If you'd called it "OIDC Authenticated Publishers", I would have known what it was about from the start, and wouldn't have walked into the topic with the wrong expectation.
Indeed, it's not the publishers that are trusted, but rather their authorisation of a delegated publishing action taking, or having taken, place in their name.
The name is inaccurate and even misleading, in a way that actively undermines the purpose of the entire initiative. That's a mistake which I believe only a name change can, and should, fix.
That name also rings false because GitHub Actions isn't a publisher anyway. To me, a package "publisher" is basically just the author. Here it's a CI system or a builder. "Trusted builders" or "trusted pushers" would've made a lot more sense, since the trust applies to the build system, not the publisher.
> To me, a package "publisher" is basically just the author.
At work, we keep on calling the CI stage which uploads the build artifacts to Artifactory the "publish" stage. Maybe that's the wrong terminology but I've got used to it. And if "publish" is what the stage is doing, it kind of makes sense to call it a "publisher". Maybe that's wrong, especially in the context of PyPI, but it doesn't sound wrong to me.
I think there is some context here: PyPI got some blowback 9 months ago over its vetting process (https://news.ycombinator.com/item?id=32037562), so in a sense this is a less arbitrary version of that trusted publisher process.
(burner account just cuz I forgot the pw to my real account)
Have to say this is a REALLY misleading name. "Trusted publishers" makes it seem like PyPI added (or improved) a manual curation & vetting process to its packages.
This is NOT that. This is more like "SSO for PyPI", a totally different thing.
You've added a different authentication mechanism for publishers, which is totally different from decreasing risk for end-users through a better vetting process (which is what most people would assume given the title). It's really unfortunate that the work you put into this will both 1) mislead readers into thinking it's something different and 2) be under-appreciated because they were confused.
I wonder if this is a case where the HN mods SHOULD edit the article title...?
I interpreted the headline as protection against malicious packages. Having read the article, and being clearly not familiar enough with publishing on PyPI, I still don't have much idea what it is: maybe a different way to authenticate when publishing your package?
Would you mind summarising in layman's terms please? And does this have any relevance to package manager trust and security?
Trusted publishers are a mechanism for automatically publishing packages to PyPI, without manually maintaining or configuring any credentials on your local system or CI. They work by building on top of OpenID Connect[1], as mentioned in the post: supported ecosystems (like GitHub Actions) present an identity token to PyPI that can then be exchanged for a short-lived publishing token.
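For anyone curious what that exchange looks like mechanically, here's a rough Python sketch of what the official publish action does under the hood (the mint-token endpoint and response shapes here are simplified assumptions, not a stable public API):

```python
# Runs inside a GitHub Actions job with `permissions: id-token: write`.
import os
import requests

# 1. Ask GitHub's OIDC provider for an identity token with PyPI as the audience.
resp = requests.get(
    os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
    params={"audience": "pypi"},
    headers={"Authorization": f"bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"},
)
resp.raise_for_status()
id_token = resp.json()["value"]

# 2. Exchange the identity token for a short-lived, project-scoped PyPI API token.
mint = requests.post(
    "https://pypi.org/_/oidc/mint-token",  # assumed endpoint, subject to change
    json={"token": id_token},
)
mint.raise_for_status()
api_token = mint.json()["token"]  # then e.g.: twine upload -u __token__ -p $api_token dist/*
```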
The relevance for package trust: trusted publishing creates a strong relationship between a machine identity (the OIDC identity token) and a package published to PyPI, with the former in turn containing a strong binding to a source code repository's state (slug, `git` ref, etc.). When using trusted publishing, you have proof that the only machine, repository state, CI configuration, etc. being used to produce the package is the one you intended.
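Concretely, that repository state travels inside the identity token's claim set. An abridged example (the claim names are GitHub's; the values are invented):

```python
# Abridged claims from a GitHub Actions OIDC identity token (values invented).
claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "aud": "pypi",
    "repository": "octo-org/sampleproject",  # repository slug
    "ref": "refs/tags/v1.2.3",               # git ref being built
    "sha": "6dcb09b5b57875f334f61aebed695e2e4193db5e",
    "workflow": "release.yml",
    "job_workflow_ref": "octo-org/sampleproject/.github/workflows/release.yml@refs/tags/v1.2.3",
    "environment": "release",                # optional GitHub environment
}
# PyPI verifies the token's signature against GitHub's published keys, then
# checks these claims against the project's trusted publisher configuration.
```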
The relevance for security: trusted publishing eliminates the need to configure and manage long-lived project- or user-scoped PyPI tokens by replacing them with short-lived tokens. This reduces the "blast radius" of CI or developer machine compromise. Trusted publishers also allow for finer-grained publishing controls: they can be restricted to individual GitHub Actions environments, which in turn can be limited to specific subsets of users on a GitHub repo (e.g. one set for beta publishing, and another set for release publishing).
It’s not directly related to SLSA, although SLSA is an adjacent effort to improve package security!
I think provenance would be misleading in this context, since it’s mostly a side effect of the intended behavior (i.e., publishing without needing to manually configure a shared credential).
When I first read the title I was hoping for something like "we have added a layer of curation and verification to pypi in response to malicious packages being published". Oh well, one can dream.
PyPI is a "dumb" index, in the sense that it doesn't really offer opinions on which library is best for a particular task. IMO (and my opinion doesn't actually matter here) this is the right approach for the overall OSS community: curation is a hard (and mostly manual) task, in contrast to most development on PyPI (which is aimed at decreasing (or at least sustaining) maintainer burden).
That being said, there are commercial offerings for these sorts of things[1].
+1 for this approach (and thanks for all your work on PyPI William!).
FWIW, I think it's worth clarifying that PyPI is already involved in malware detection and takedowns (as are almost all the package registries). The curation that commercial vendors offer is a little more nuanced than excluding known malware (for example, allowing users to restrict their downloads to a "known good" set of packages, rather than "only" excluding "known bad" ones).
The PyPI admins (including Dustin, who wrote this post) do way more work than me, much of which is on a volunteer basis. They deserve way more credit than I do for PyPI; I'm just the lowly contractor on a few security features :-)
And yes, that's an important distinction to make! PyPI does indeed "curate" in the sense that its policies include spam and malware removal, and a great deal of automated and manual triage work goes into that.
> That being said, there are commercial offerings for these sorts of things[1].
I had no idea this was a thing, it looks super useful. Anybody familiar with any similar offerings from non-google companies? Or is anybody doing it for npm packages?
Then outsource the curation to someone else by letting people create a feed of trusted packages that pip can be configured to use. Then we could point pip at pip.name.com/packages.json instead of having to host an Artifactory for simple use cases.
This is something you can already do: you can host whatever curated package view you'd like using the Simple Repository API defined in PEP 503[1]. That PEP doesn't require that files be hosted on the same origin as the index (which isn't the case for PyPI either), so you could perform the curation there by re-using PyPI's CDN URLs.
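As a toy illustration (not production-ready; the file URLs are placeholders, and a real index should serve hashes too), a curated PEP 503 index can be as small as this, after which you'd point pip at it with `pip install --index-url http://localhost:8000/simple/ sampleproject`:

```python
# Minimal sketch of a curated PEP 503 "simple" index (hypothetical package set).
from flask import Flask, abort

app = Flask(__name__)

# Curated view: project name -> list of (filename, PyPI CDN URL) pairs.
# URLs here are illustrative placeholders.
CURATED = {
    "sampleproject": [
        ("sampleproject-3.0.0-py3-none-any.whl",
         "https://files.pythonhosted.org/packages/example/sampleproject-3.0.0-py3-none-any.whl"),
    ],
}

@app.route("/simple/")
def index():
    # Root index: one anchor per curated project, per PEP 503.
    return "".join(f'<a href="/simple/{name}/">{name}</a><br>' for name in CURATED)

@app.route("/simple/<project>/")
def project(project):
    # Per-project page: one anchor per distribution file.
    if project not in CURATED:
        abort(404)
    return "".join(f'<a href="{url}">{fname}</a><br>' for fname, url in CURATED[project])
```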
Fantastic work, kudos! OIDC auth is so much nicer compared to any ad-hoc secrets management. Thank you for dealing with JWT for us. :)
As a small suggestion, it may make sense to move the "Create a token for ..." button to the new publishing page on PyPI? This way both options would be next to each other. I went straight to the settings page after reading your blog post, and was initially confused to only find the old token option there. Having both at the same place would maybe be more straightforward.
Thank you for the kind words, and thanks for pointing this out -- I agree that we should improve the buttons and forms here!
There's a little bit of complexity around the underlying data model (since publishers correspond to projects, while even project-scoped tokens are fundamentally bound to users), but at minimum we could certainly add some language or a link nudging users towards "trusted publishers" next to the current token creation button. I'll file an issue for that tonight.
Yeah, but the secret in question is possessed by GitHub, not you or your source.
PyPI will be able to verify that the ID token was signed with GitHub's secret, and therefore trust that the identity described in the token is what it claims to be.
I know I’m kind of late to the post, but maybe you or someone else will still see this.
Would a scheme like this only work for well-known providers (GitHub is done, others could be in the pipeline according to the article)? Does a new hosting provider need to approach PyPI/PyPA to integrate, or is it possible for any hosting provider to implement a set of APIs and more or less transparently become a Trusted Publisher?
I think it’s the former, since it seems strange that anyone could be a trusted publisher, but that does make it much harder for smaller CI providers to onboard. A self-hosted CI platform doesn’t seem like it could be “trusted”. Is that accurate?
In case it’s not obvious - I’m not super well versed in the details of OIDC/OAuth.
Your understanding is accurate: the security advantages of using an OIDC provider stem primarily from them being big and from being closely tied to the ecosystem that’s hosting the associated code (like GitHub). For personal source code hosts, the threat model for a self-hosted IdP wouldn’t be much better than just using API tokens directly.
And yes, re: additional providers: each needs to be individually implemented by PyPI, since each has slightly different claim sets and trust conditions. It would be nice if there was something standard we could do here, but the underlying data model between different CI providers is too different.
The current PyPI implementation requires custom onboarding of each new OIDC provider on the PyPI server, so you are completely right. However, the OIDC protocol (and the JWT tokens it uses) is pretty flexible, so it should not be challenging to add new publishers (it's mostly about mapping token fields to a new configuration form).
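To illustrate (a sketch only; the issuer, JWKS URL, and claim mapping here are hypothetical), the provider-agnostic part of the verification is standard JWT machinery, e.g. with PyJWT (`pip install "pyjwt[crypto]"`):

```python
# Sketch of server-side verification for a hypothetical new OIDC provider.
import jwt

ISSUER = "https://ci.example-host.com"                       # hypothetical provider
jwks_client = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks")  # hypothetical JWKS URL

def verify(token: str) -> dict:
    # Fetch the provider's signing key and validate signature, audience, issuer.
    signing_key = jwks_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="pypi",
        issuer=ISSUER,
    )

# The provider-specific work is mapping the returned claims onto a publisher
# configuration form, e.g. which claim names the repository and workflow.
```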
On a general level: PGP is a very difficult ecosystem to integrate correctly and safely[1]. It's also only one of several signing schemes supported by `git`, and arguably the worst of them (in terms of excessive complexity, poor defaults, and format/cipher agility).
More specifically to trusted publishing: the idea is to bind a PyPI project to its source repository (or repositories) for CI-based releases. PGP signatures wouldn't really accomplish this; it'd be closer to using PGP as an authentication system. And at that point you're just doing API tokens but with more steps and a shakier foundation (per above), which PyPI already supports.
An identity provider is a fundamental assumption in trusted publishing; API tokens will continue to work as a "decentralized" alternative if using an IdP is unacceptable for your particular use case.
When you sign commits, the signature covers the parent commit. So it includes the version, and you can use it to verify the release before publishing. This way API keys could be included in CI much more safely.
You would need to make it so that there's an account setting under which an API key is useless for publishing without a signed commit.
It's simple, open, and works well.
Edit: the rule that you can't publish the same version twice would let someone with a stolen API key publish something after the version is bumped but before the CI runs. That's a far-fetched scenario, though.
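For concreteness, the gate could look something like this as a pre-publish CI step (`git verify-commit` is git's built-in signature check; the publish step afterwards is hypothetical):

```python
# Sketch: refuse to publish unless HEAD carries a valid GPG signature.
import subprocess
import sys

result = subprocess.run(
    ["git", "verify-commit", "HEAD"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    # verify-commit fails for unsigned commits or untrusted/invalid signatures.
    sys.exit(f"refusing to publish: HEAD is not verifiably signed\n{result.stderr}")

# ...only now proceed to e.g. `twine upload` with the API token.
```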
> You would need to make it so that there's an account setting under which an API key is useless for publishing without a signed commit.
That’s the “easy” part. The hard part is determining what constitutes a valid PGP identity, as well as inheriting all of the baggage that comes with PGP (including things that nobody wants to deal with, like revocation).
And again: this would be explicitly forcing users into a known bad signing mechanism, one that only applies to git. Trusted publishing as-implemented does not have these problems.
It’s a pretty well known blog, from a pretty well known security company.
I would also go as far as to say that “PGP is bad and should not be used for greenfield projects” is not a remotely controversial opinion in applied cryptography circles. Likewise, it is not controversial in those circles to assert that PGP is more or less the opposite of current technology for digital signatures.
Some more helpful links by generally recognized authorities[1][2]. You’ll note that each of these is more than a few years old at this point; PGP’s deficiencies are very well trodden.
There is no meaningful sense in which PGP is “canceled”, except in the kind of shitpost sense I would use in a talk slide.
Open alternatives exist, are better, and have been better for well over a decade at this point. No significant risk is posed to “open tech” by doing things better than PGP can possibly offer us.
There is a strict sense in which PGP does not remain on the table for you, because PyPI does not (and will not, on this developer’s clean conscience) ever support PGP for authentication :-)
As said in adjacent threads, PyPI intends to add support for other OIDC providers once they give us the claims we need. Whether or not you choose to use it is ultimately up to you; normal API tokens will continue to work.
It still is, because I can have my CI use it to determine whether to publish a PyPI package.
Nice try though.
This is definitely a new low for Python. The main issue is just that it was a Microsoft-only launch. All the rest is just window dressing.
It's a good reminder about the Pylance situation in Visual Studio Code. Microsoft tricked people into using closed source. https://ghuntley.com/fracture/
Plus "giving up on long term PGP" doesn't really apply here. You can add and remove GPG keys on GitLab every day if you like.
I have respect for those who still have a private key to go with a public key they created 10+ years ago. I don't, except maybe on the encrypted hard drive of a dead laptop on which I haven't gotten around to doing data recovery.
It feels like these two blog posts (and the one by Latacora) are the only ones that anti-PGP folks on HN could find on the internet. There are far more tutorials on the use of PGP than on its problems (mostly around email encryption, which isn’t relevant here).
Decentralized trust is a very good idea. PGP provides useful functionalities around that. Keybase was a good project, but sadly was acquired and has since stopped.
The alternatives proposed are great in narrow use cases, but aren’t really replacements.
> It feels like these two blog posts (and the one by Latacora) are the only ones that anti-PGP folks on HN could find on the internet. There are far more tutorials on the use of PGP than on its problems
They're the ones that come up because (1) they're good, (2) they're increasingly "old" (indicating that these problems are not newly identified), and (3) they're reputable sources.
Besides, technical volume doesn't mean anything (and certainly doesn't imply quality): there are innumerable copies of the Anarchist's Cookbook on the Internet, and the sheer number of volumes doesn't make their contents any less likely to blow your hand off.
The problems identified are not unique to email encryption; email encryption stands out as a punching bag for PGP's failures because of how consistently PGP fails to provide meaningful security while the rest of the world has moved on. Notably, all of the problems related to PGP signatures in emails are shared by codesigning with PGP.
> Decentralized trust is a very good idea. PGP provides useful functionalities around that. Keybase was a good project, but sadly was acquired and has since stopped.
This hasn't been true for years (PGP's strong set and web of trust are dead, thanks in part to poor format design that enabled trivial resource attacks on keyservers). And the second part contradicts the first: the thing that made Keybase useful was that it centralized and made (mostly) work a bunch of things that don't work in "bare" PGP (such as actual proofs of identity/account possession).
If you're just looking for signs of consensus, that issue describes why the Go pgp package is deprecated; it is very critical of PGP. Interesting read too.
> I understand the focus on Github, they are the biggest. But so many use Gitlab, Gitea, Bitbucket, .. Would love to see some examples for those as well
The focus is on GitHub because GitHub Actions' OpenID Connect support is the most mature of the large source code and CI hosts, and trusted publishing fundamentally relies on OIDC. GitLab has OIDC support[1] that we'd ideally integrate against, but I don't know if others do.
That process is described in the user documentation[1]: a project can have publishers added to it on PyPI's website, with each publisher's configuration specifying the necessary state for a trust relationship with a particular workflow in a particular GitHub repository.
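The fields roughly mirror PyPI's configuration form (the names here are illustrative, not an API):

```python
# Illustrative trusted publisher configuration for a GitHub repository.
publisher = {
    "owner": "octo-org",            # GitHub user or organization
    "repository": "sampleproject",  # repository name
    "workflow": "release.yml",      # filename under .github/workflows/
    "environment": "release",       # optional GitHub Actions environment
}
```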
I want to see better supply chain security but the main thing I see here is that PyPI is partnering solely with Microsoft on this and it seems like a huge bummer. As another comment mentioned, PyPI is lacking in curation and this doesn't address it.
See other replies: I think it’s inaccurate to call this a partnership or endorsement. GitHub being the initially supported publishing platform is primarily a matter of expedience and greatest impact.
OpenID Connect is not particularly obscure. I’d say it’s within a standard deviation of OAuth (since it is OAuth2), which is both widely known and widely deployed.
Hi, I am trying to make this happen in my repo tschm/pyhrp. Currently I'm struggling to get the API token. Can you first confirm that the OIDC token is only displayed as "**" in echo "OICD: {oicd_token}"?
Interesting that in spite of at least one documented case of confusion, and numerous comments attesting to the same, there is no response of 'ok, maybe we should change the name'.
Hadn't this only been announced for something like eight hours by the time you posted this? Also, I wasn't aware that a coven* of HN hecklers represented the self-evident and necessary plurality required to make decisions for any project.
PyPI’s existing API tokens are probably a better fit for your use case, and already support “self” publishing (since that’s the default; everything in this blog post is about delegating trust away).
It won’t be, because “exchanging” API tokens isn’t something you’d normally do. It’s only done in the context of trusted publishing; in any other context you would continue to use the ordinary API tokens that this feature is built on top of.
The "blue badges" of software packages. Honestly not a bad idea but who and what determines a package to be "trusted"? Will there be transparency into the decisions?