It's always been shocking to me that the way people run CI/CD is just listing a random repository on GitHub. I know they're auditable and you pin versions, but it's crazy to me that the recommended way to ssh to a server is to just give a random package from a random GitHub user your ssh keys, for example.
This is especially problematic with the rise of LLMs, I think. It's the kind of common task which is annoying enough, unique enough, and important enough that I'm sure there are a ton of GitHub actions that are generated from "I need to build and deploy this project from GitHub actions to production". I know, and do, know to manually run important things in actions related to ssh, keys, etc., but not everyone does.
Almost everyone will just copy paste this snippet and call it a day. Most people don't think twice that v4 is a movable target that can be compromised.
In case of npm/yarn deps, one would often do the same, and copy paste `yarn install foobar`, but then when installing, npm/yarn would create a lockfile and pin the version. Whereas there's no "installer" CLI for GH actions that would pin the version for you, you just copy-paste and git push.
To make things better, ideally, the owners of actions would update the workflows which release a new version of the GH action, to make it update README snippet with the sha256 of the most recent release, so that it looks like
I mean, I think there's a difference between trusting GitHub and trusting third parties. If I can't trust GitHub, then there's absolutely no point in hosting on GitHub or trusting anything in GitHub Actions to begin with.
But yes I do think using tags is problematic. I think for one, GitHub should ban re-tagging. I can't think of a good reason for a maintainer to re-publish a tag to another commit without malicious intent. Otherwise they should provide a syntax to pin to both a tag and a commit, something like this:
`uses: actions/checkout@v4.5.6@abcdef9876543210`
The action should only work if both conditions are satisfied. This way you can still gain semantics version info (so things like dependabot can work to notify an update) but the commit is still pinned.
---
I do have to say though, these are all just band-aids on top of the actual issue. If you are actually using a dependency that is compromised, someone is going to get screwed. Are you really going to read through the commit and the source code to scan for suspicious stuff? I guess if someone else got screwed before you did they may report it, but it's still fundamentally an issue here. The simple answer is "don't use untrustworthy repositories" but that is hard to guarantee. Only real solution is to use as few dependencies as possible.
after this incident, I started pinning all my github workflows with hashes, like other folks here I guess :D
But I quickly got tired of doing it manually so I put together this [0] quick and dirty script to handle it for me. It just updates all workflow files in a repo and can be also used as a pre-commit hook to catch any unpinned steps in the future. It’s nothing fancy (leveraging ls-remote), but it’s saved me some time, so I figured I’d share in case it helps someone else :)
You would still be exposed if you had renovate or dependabot make a PR where they update the hash for you, though. Here's a PR we got automatically created the other day:
I don't think you should ever allow dependabot to make direct commits to the repository. The only sane setting (IMO) is that dependabot should just make PRs, and a human needs to verify that and hit merge. My personal opinion is for any serious repositories, allowing a robot to have commit access is often a bad time and ticking time bomb (security-wise).
Now, of course, if there are literally hundreds of dependencies to update every week, then a human isn't really going to go through each and make sure they look good, so that person just becomes a rubber-stamper, which doesn't help the situation. At that point the team should probably seriously evaluate if their tech stack is just utterly broken if they have that many dependencies.
Even if you don't automerge, the bots will often have elevated rights (it needs to be able to see your private repository, for instance), so it making a PR will run your build jobs, possibly with the updated version, and just by doing that expose your secrets even without committing to main.
Aren't GitHub action "packages" designate by a single major version? Something like checkout@v4, for example. I thought that that designated a single release as v4 which will not be updated?
I'm quite possibly wrong, since I try to avoid them as much as I can, but I mean.. wow I hope I'm not.
The crazier part is, people typically don't even pin versions! It's possible to list a commit hash, but usually people just use a tag or branch name, and those can easily be changed (and often are, e.g. `v3` being updated from `v3.5.1` to `v3.5.2`).
Fuck. Insecure defaults again. I argue that a version specifier should be only a hash. Nothing else is acceptable. Forget semantic versions. (Have some other method for determining upgrade compatibility you do out of band. You need to security audit every upgrade anyway). Process: old hash, new hash, diff code, security audit, compatibility audit (semver can be metadata), run tests, upgrade to new hash.
You and someone else pointed this out. I only use GitHub-org actions, and I just thought that surely there would be a "one version to rule them all" type rule.. how else can you audit things?
I've never seen anything recommending specifying a specific commit hash or anything for GitHub actions. It's always just v1, v2, etc.
Having to use actions for ssh/rsync always rubbed me the wrong way. I’ve recently taken the time to remove those in favor of using the commands directly (which is fairly straightforward, but a bit awkward).
I think it’s a failure of GitHub Actions that these third party actions are so widespread. If you search “GitHub actions how to ssh” the first result should be a page in the official documentation, instead you’ll find tens of examples using third party actions.
So much this. I recently looked into using GitHub Actions but ended up using GitLab instead since it had official tools and good docs for my needs. My needs are simple. Even just little scripting would be better than having to use and audit some 3rd party repo with a lot more code and deps.
And if you're new, and the repo aptly named, you may not realize that the action is just some random repo
> It's always been shocking to me that the way people run CI/CD is just listing a random repository on GitHub.
Right? My mental model of CI has always been "an automated sequence of commands in a well-defined environment". More or less an orchestrated series of bash scripts with extra sugar for reproducibility and parallelism.
Turning these into a web of vendor-locked, black-box "Actions" that someone else controls ... I dunno, it feels like a very pale imitation of the actual value of CI
This is especially problematic with the rise of LLMs, I think. It's the kind of common task which is annoying enough, unique enough, and important enough that I'm sure there are a ton of GitHub actions that are generated from "I need to build and deploy this project from GitHub actions to production". I know, and do, know to manually run important things in actions related to ssh, keys, etc., but not everyone does.