
I’m one of the people who helped design and build this functionality; happy to answer any questions about it!



(burner account just cuz I forgot the pw to my real account)

Have to say this is a REALLY misleading name. "Trusted publishers" makes it seem like PyPI added (or improved) a manual curation & vetting process to its packages.

This is NOT that. This is more like "SSO for PyPI", a totally different thing.

You've added a new authentication mechanism for publishers, which is totally different from decreasing risk for end-users through a better vetting process (which is what most people would assume given the title). It's really unfortunate that the work you put into this will both 1) mislead readers into thinking it's something different and 2) be under-appreciated because readers will come away confused.

I wonder if this is a case where the HN mods SHOULD edit the article title...?


It's up to PyPI what to call it, but I would have said OIDC publishers.


I interpreted the headline as protection against malicious packages. Having read the article, and being clearly not familiar enough with publishing on PyPI, I still don't have much of an idea what it is: maybe a different way to authenticate when publishing your package?

Would you mind summarising in layman's terms please? And does this have any relevance to package manager trust and security?


Sure, I'm happy to!

Trusted publishers are a mechanism for automatically publishing packages to PyPI, without manually maintaining or configuring any credentials on your local system or CI. They work by building on top of OpenID Connect[1], as mentioned in the post: supported ecosystems (like GitHub Actions) present an identity token to PyPI that can then be exchanged for a short-lived publishing token.

The relevance for package trust: trusted publishing creates a strong relationship between a machine identity (the OIDC identity token) and a package published to PyPI, with the former in turn containing a strong binding to a source code repository's state (slug, `git` ref, etc.). When using trusted publishing, you have proof that the only machine, repository state, CI configuration, etc. being used to produce the package is the one you intended.

The relevance for security: trusted publishing eliminates the need to configure and manage long-lived project- or user-scoped PyPI tokens by replacing them with short-lived ones. This reduces the "blast radius" of a CI or developer machine compromise. Trusted publishers also allow for finer-grained publishing controls: they can be restricted to individual GitHub Actions environments, which in turn can be limited to specific subsets of users on a GitHub repo (e.g. one set for beta publishing, and another set for release publishing).
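To make that concrete, here's a rough Python sketch of the exchange that CI tooling (e.g. pypa/gh-action-pypi-publish) performs on your behalf. The environment variables and the "pypi" audience are GitHub Actions' standard OIDC plumbing; the mint endpoint path is illustrative, not something to hardcode:

    # Sketch of the trusted publishing exchange, as performed inside a
    # GitHub Actions job with `id-token: write` permissions.
    import os

    import requests

    # 1. Ask GitHub Actions for an OIDC ID token, audience-scoped to PyPI.
    resp = requests.get(
        os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"],
        params={"audience": "pypi"},
        headers={
            "Authorization": f"bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"
        },
    )
    id_token = resp.json()["value"]

    # 2. Exchange the ID token for a short-lived PyPI API token
    #    (endpoint path is illustrative), then publish with it as usual.
    minted = requests.post(
        "https://pypi.org/_/oidc/mint-token",
        json={"token": id_token},
    )
    api_token = minted.json()["token"]

In practice you'd never write this yourself for GitHub Actions; the official publish action wraps the whole flow.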

[1]: https://openid.net/connect/


It seems like "artifact provenance" or something would have been a better term. Is this related to SLSA?


It’s not directly related to SLSA, although SLSA is an adjacent effort to improve package security!

I think provenance would be misleading in this context, since it’s mostly a side effect of the intended behavior (i.e., publishing without needing to manually configure a shared credential).


When I first read the title I was hoping for something like "we have added a layer of curation and verification to pypi in response to malicious packages being published". Oh well, one can dream.


PyPI is a "dumb" index, in the sense that it doesn't really offer opinions on which library is best for a particular task. IMO (and my opinion doesn't actually matter here) this is the right approach for the overall OSS community: curation is a hard (and mostly manual) task, in contrast to most development on PyPI, which is aimed at decreasing, or at least sustaining, maintainer burden.

That being said, there are commercial offerings for these sorts of things[1].

[1]: https://cloud.google.com/assured-open-source-software


+1 for this approach (and thanks for all your work on PyPI William!).

FWIW, I think it's worth clarifying that PyPI is already involved in malware detection and takedowns (as are almost all the package registries). The curation that commercial vendors offer is a little more nuanced than excluding known malware (for example, allowing users to restrict their downloads to a "known good" set of packages, rather than "only" excluding "known bad" ones).

https://warehouse.pypa.io/development/malware-checks.html


The PyPI admins (including Dustin, who wrote this post) do way more work than me, much of which is on a volunteer basis. They deserve way more credit than I do for PyPI; I'm just the lowly contractor on a few security features :-)

And yes, that's an important distinction to make! PyPI does indeed "curate" in the sense that its policies include spam and malware removal, and a great deal of automated and manual triage work goes into that.


> That being said, there are commercial offerings for these sorts of things[1].

I had no idea this was a thing; it looks super useful. Is anybody familiar with any similar offerings from non-Google companies? Or is anybody doing it for npm packages?


Then outsource the curation to someone else: let people create a feed of trusted packages that pip can be configured to use, or something like that. We could then point pip at pip.name.com/packages.json instead of having to host an Artifactory instance for simple use cases.


This is something you can already do: you can host whatever curated package view you'd like using the Simple Repository API defined in PEP 503[1]. That PEP doesn't require that files be hosted on the same origin as the index (which isn't the case for PyPI either), so you could perform the curation there by re-using PyPI's CDN URLs.
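To illustrate, here's a bare-bones sketch of such a curated index using only the standard library (the project name and file URL are placeholders standing in for whatever you've curated):

    # Minimal PEP 503 "simple" index; CURATED is your hand-picked view,
    # with hrefs pointing at PyPI-hosted files (placeholder URL below).
    from http.server import BaseHTTPRequestHandler, HTTPServer

    CURATED = {
        "sampleproject": [
            ("sampleproject-3.0.0-py3-none-any.whl",
             "https://files.pythonhosted.org/packages/placeholder.whl"),
        ],
    }

    class SimpleIndex(BaseHTTPRequestHandler):
        def do_GET(self):
            parts = [p for p in self.path.split("/") if p]
            if parts == ["simple"]:
                # Root index page: one anchor per curated project.
                body = "".join(
                    f'<a href="/simple/{n}/">{n}</a><br>' for n in CURATED
                )
            elif len(parts) == 2 and parts[0] == "simple" and parts[1] in CURATED:
                # Project page: one anchor per curated file.
                body = "".join(
                    f'<a href="{url}">{name}</a><br>'
                    for name, url in CURATED[parts[1]]
                )
            else:
                self.send_error(404)
                return
            page = f"<html><body>{body}</body></html>".encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(page)

    HTTPServer(("", 8080), SimpleIndex).serve_forever()

Point pip at it with `--index-url http://localhost:8080/simple/` and only your curated set resolves.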

[1]: https://peps.python.org/pep-0503/


Thanks for the reference! The fact that it's supplied through GCP makes it very applicable for me.


Fantastic work, kudos! OIDC auth is so much nicer compared to any ad-hoc secrets management. Thank you for dealing with JWT for us. :)

As a small suggestion, would it make sense to move the "Create a token for ..." button to the new publishing page on PyPI? That way both options would be next to each other. I went straight to the settings page after reading your blog post, and was initially confused to only find the old token option there. Having both in the same place would maybe be more straightforward.


Thank you for the kind words, and thanks for pointing this out -- I agree that we could improve the buttons and forms here!

There's a little bit of complexity around the underlying data model (since publishers correspond to projects, while even project-scoped tokens are fundamentally bound to users), but at minimum we could certainly add some language or a link nudging users towards "trusted publishers" next to the current token creation button. I'll file an issue for that tonight.


I'm relatively uneducated here, but doesn't OIDC still require some kind of secret to be possessed? What's the upside you're excited about?


Yeah, but the secret in question is possessed by GitHub, not by you or your source repository.

PyPI will be able to verify that the ID token was signed with GitHub's private key, and can therefore trust that the identity described in the token is what it says it is.
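Roughly, the server-side check looks like this (a sketch using PyJWT; PyPI's real implementation is more involved, but the verification shape is the same):

    # Verify a GitHub Actions OIDC ID token against GitHub's published
    # signing keys (JWKS). Requires: pip install "pyjwt[crypto]"
    import jwt

    ISSUER = "https://token.actions.githubusercontent.com"
    jwks_client = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks")

    def verify_id_token(id_token: str) -> dict:
        # Look up the key GitHub signed with (via `kid` in the header)...
        signing_key = jwks_client.get_signing_key_from_jwt(id_token)
        # ...then check signature, issuer, audience, and expiry in one go.
        return jwt.decode(
            id_token,
            signing_key.key,
            algorithms=["RS256"],
            issuer=ISSUER,
            audience="pypi",
        )

The returned claims (repository, ref, workflow, etc.) are what get matched against the project's trusted publisher configuration.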


I know I’m kind of late to the post, but maybe you or someone else will still see this.

Would a scheme like this only work for well-known providers (GitHub is done, and others could be in the pipeline, according to the article)? Does a new hosting provider need to approach PyPI/PyPA to integrate, or is it possible for any hosting provider to implement a set of APIs and more or less transparently become a supported trusted publisher?

I think it’s the former, since it seems strange that just anyone could be a trusted publisher, but that does make it much harder for smaller CI providers to onboard. A self-hosted CI platform doesn’t seem like it could be “trusted”. Is that accurate?

In case it’s not obvious - I’m not super well versed in the details of OIDC/OAuth.


Thanks for posting your question!

Your understanding is accurate: the security advantages of using an OIDC provider stem primarily from the provider being large and closely tied to the ecosystem that’s hosting the associated code (like GitHub). For personal source code hosts, the threat model for a self-hosted IdP wouldn’t be much better than just using API tokens directly.

And yes, re: additional providers: each needs to be individually implemented by PyPI, since each has slightly different claim sets and trust conditions. It would be nice if there were something standard we could do here, but the underlying data models of the different CI providers are too different.


The current PyPI implementation requires custom onboarding of each new OIDC provider on the PyPI server, so you're completely right. However, the OIDC protocol (and the JWT token it uses) is pretty flexible, so adding new publishers should not be challenging (it's mostly about mapping token fields to a new configuration form).
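For a sense of what that mapping involves, here's an illustrative check of GitHub's claims against a configured publisher (the publisher dict shape is made up; `repository`, `job_workflow_ref`, and `environment` are real GitHub Actions claim names):

    # Illustrative trusted publisher matching; not PyPI's actual code.
    def matches(publisher: dict, claims: dict) -> bool:
        repo = f"{publisher['owner']}/{publisher['repo']}"
        workflow_prefix = f"{repo}/.github/workflows/{publisher['workflow']}@"
        return (
            claims.get("repository") == repo
            # job_workflow_ref looks like "owner/repo/.github/workflows/ci.yml@ref"
            and claims.get("job_workflow_ref", "").startswith(workflow_prefix)
            and (
                publisher.get("environment") is None
                or claims.get("environment") == publisher["environment"]
            )
        )

Supporting a new provider mostly means defining the analogous claim set and the form fields a maintainer fills in.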


It is a good idea with an incredibly misleading name.


Why is OpenID used instead of, for example, the already git-integrated GPG-signed commits?

Wouldn't a list of GPG keys be more decentralized, with no dependence on third-party auth providers?


On a general level: PGP is a very difficult ecosystem to integrate correctly and safely[1]. It's also only one of several signing schemes supported by `git`, and arguably the worst of them (in terms of excessive complexity, poor defaults, and format/cipher agility).

More specifically to trusted publishing: the idea is to bind a PyPI project to its source repository (or repositories) for CI-based releases. PGP signatures wouldn't really accomplish this; it'd be closer to using PGP as an authentication system. And at that point you're just doing API tokens but with more steps and a shakier foundation (per above), which PyPI already supports.

An identity provider is a fundamental assumption in trusted publishing; API tokens will continue to work as a "decentralized" alternative if using an IdP is unacceptable for your particular use case.

[1]: https://latacora.micro.blog/2019/07/16/the-pgp-problem.html


When you sign commits, the signature covers the parent commit, so it covers the version too, and you can use it to verify the repository state before publishing. This way, API keys could be included in CI much more safely.

You would need to make it so that there's a setting on an account under which an API key is useless for publishing without a signed commit.

It's simple, open, and works well.

Edit: if you can't publish the same version twice, someone with a stolen API key could still publish something after the version is bumped but before the CI runs. That's a far-fetched scenario, though.


> You would need to make it so that there's a setting on an account under which an API key is useless for publishing without a signed commit.

That’s the “easy” part. The hard part is determining what constitutes a valid PGP identity, as well as inheriting all of the baggage that comes with PGP (including things that nobody wants to deal with, like revocation).

And again: this would be explicitly forcing users into a known bad signing mechanism, one that only applies to git. Trusted publishing as-implemented does not have these problems.


It's not bad. Just because you reference some post doesn't make it a fact rather than an opinion:

"The answer is that they shouldn’t be telling you that, because PGP is bad and needs to go away."

- says some micro.blog

GPG is very much current technology. The code forges provide APIs for public keys, so some of the difficulties are handled already. https://docs.gitlab.com/ee/api/users.html#list-all-gpg-keys-...


It’s a pretty well-known blog, from a pretty well-known security company.

I would also go as far as to say that “PGP is bad and should not be used for greenfield projects” is not a remotely controversial opinion in applied cryptography circles. Likewise, it is not controversial in those circles to assert that PGP is more or less the opposite of current technology for digital signatures.

Some more helpful links by generally recognized authorities[1][2]. You’ll note that each of these is more than a few years old at this point; PGP’s deficiencies are very well trodden.

[1]: https://blog.cryptographyengineering.com/2014/08/13/whats-ma...

[2]: https://words.filippo.io/giving-up-on-long-term-pgp/


Well I don't buy into this cancelling of open tech.


There is no meaningful sense in which PGP is “canceled”, except in the kind of shitpost sense I would use in a talk slide.

Open alternatives exist, are better, and have been better for well over a decade at this point. No significant risk is posed to “open tech” by doing things better than PGP can possibly offer us.


If you're bluntly calling it "bad" and I take what you said at face value, that sort of cancels it for me.

But yeah, it would continue to exist. I didn't mean to suggest it wouldn't.

PGP remains on the table for me. A Microsoft-only launch doesn't.


There is a strict sense in which PGP does not remain on the table for you, because PyPI does not (and will not, on this developer’s clean conscience) ever support PGP for authentication :-)

As said in adjacent threads, PyPI intends to add support for other OIDC providers once they give us the claims we need. Whether or not you choose to use it is ultimately up to you; normal API tokens will continue to work.


It still is, because I can have my CI use it to determine whether to publish a PyPI package.

Nice try though.

This is definitely a new low for Python. The main issue is just that it was a Microsoft-only launch. All the rest is just window dressing.

It's a good reminder about the Pylance situation in Visual Studio Code. Microsoft tricked people into using closed source. https://ghuntley.com/fracture/


So, a couple of blog posts about standard email encryption, one of which pushes a proprietary replacement that's worse in almost every way?

None of that even applies to this context.


Plus "giving up on long-term PGP" doesn't really apply here. You can add and remove GPG keys on GitLab every day if you like.

I have respect for those who still have a private key to go with a public key they created 10+ years ago. I don't, except maybe on the encrypted hard drive of a dead laptop on which I haven't gotten around to doing data recovery.


It feels like these two blog posts (and the one by Latacora) are the only ones that anti-PGP folks on HN could find on the internet. There are far more tutorials on the use of PGP than on its problems (mostly around email encryption, which isn’t relevant here).

Decentralized trust is a very good idea, and PGP provides useful functionality around it. Keybase was a good project, but it was sadly acquired and development has since stopped.

The alternatives proposed are great in narrow use cases, but aren’t really replacements.


> It feels like these two blog posts (and the one by Latacora) are the only ones that anti-PGP folks on HN could find on the internet. There are far more tutorials on the use of PGP than on its problems

They're the ones that come up because (1) they're good, (2) they're increasingly "old" (indicating that these problems are not newly identified), and (3) they're reputable sources.

Besides, technical volume doesn't mean anything (and certainly doesn't imply quality): there are innumerable copies of The Anarchist Cookbook on the Internet, and the sheer number of copies doesn't make their contents any less likely to blow your hand off.

The problems identified are not unique to email encryption; email encryption stands out as a punching bag for PGP's failures because of how consistently PGP fails to provide meaningful security while the rest of the world has moved on. Notably, all of the problems related to PGP signatures in emails are shared by codesigning with PGP.

> Decentralized trust is a very good idea. PGP provides useful functionalities around that. Keybase was a good project, but sadly was acquired and has since stopped.

This hasn't been true for years (PGP's strong set and web of trust are dead, thanks in part to poor format design that enabled trivial resource attacks on keyservers). And the second part contradicts the first: the thing that made Keybase useful was that it centralized, and made (mostly) work, a bunch of things that don't work in "bare" PGP (such as actual proofs of identity/account possession).


https://github.com/golang/go/issues/44226

If you're just looking for signs of consensus, that issue describes why Go's openpgp package is deprecated - it is very critical of PGP. An interesting read, too.


Decentralization always makes things more complicated, but not always better.



