These were absolutely incredible from when they first opened right up until covid. The Blue Apron-style meal kits they had were actually really tasty, and the gimmicky Alexa integration that told you the next step in the recipe was genuinely useful when you were busy stirring a pot or cutting something and couldn't pull out the recipe card. It was like a 7-Eleven, but with the prices of a normal grocery store and higher quality prepared food. Not needing to deal with checkout felt freeing. Back in the day I substituted many grocery store runs with a quick walk over to the original Amazon Go.
After covid, it was never the same. Open for shorter windows, closed on Sundays, reduced selection, no more meal kits etc.
I had many friends who worked on Amazon Go, so it's a bit sad to see that work come to an end.
This question is surprising to me, because I consider AI code review the single most valuable aspect of AI-assisted software development today. It's ahead of line/next-edit tab completion, agentic task completion, etc.
AI code review does not replace human review. But AI reviewers will often notice little things that a human may miss. Sometimes the things they flag are false positives, but it's still worth checking in on them. If even one logical error or edge case gets caught by an AI reviewer that would've otherwise made it to production with just human review, it's a win.
Some AI reviewers will also factor in context of related files not visible in the diff. Humans can do this, but it's time consuming, and many don't.
AI reviews are also a great place to put "lint"-like rules that would be complicated to express in standard linting tools like ESLint.
We currently run 3-4 AI reviewers on our PRs. The biggest problem I run into is outdated knowledge. We've had AI reviewers leave comments based on limitations of DynamoDB or whatever that haven't been true for the last year or two. And of course it feels tedious when 3 bots all leave similar comments on the same line, but even that is useful as reinforcement of a signal.
I already use Graphite today on top of git. Others are using alternatives like Sapling, etc.
To go back to your question around why people still use these workarounds on top of git, it's because the CLI is just one piece of it. With Graphite, I also get a stack-aware merge queue and review dashboard.
This is really exciting. Step Functions were a big improvement over SWF and the Flow framework, but declarative workflow authoring sucks from a type-safety standpoint. Workflows-as-code is the way to go, and that was missing from AWS. Can't wait to build on top of this.
You're probably already planning this, but please set up an alarm that fires if a new package release is published that isn't correlated with a CI/CD run.
Or require manual intervention to publish a new package. I'm not sure why we need to have a fully automated pipeline here to go from CI/CD to public package release. It seems like having some kind of manual user interaction to push a new version of a library would be a good thing.
The basic issue with manual interaction is a question of authority: a pretty common problem for companies (and open source groups) is when $EARLY_EMPLOYEE/$EARLY_CONTRIBUTOR creates and owns the entire publishing process for a key package, and then leaves without performing a proper transfer of responsibility. This essentially locks the company/group out of its own work, and increases support load on community maintained indices to essentially adjudicate rightful ownership of the package name.
(There are a variety of ways to solve this, but the one I like best is automated publishing à la Trusted Publishing with environment-mediated manual signoffs. GitHub and other CI/CD providers enable this.)
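Roughly what that looks like on GitHub Actions, as a sketch (the environment name, tag pattern, and build commands are placeholders, and it assumes Trusted Publishing has been configured for the package on the npm side):

```yaml
name: publish

on:
  push:
    tags: ["v*"]

jobs:
  publish:
    runs-on: ubuntu-latest
    # "release" is an assumed GitHub environment configured with required reviewers,
    # so a human still approves the run before anything is published
    environment: release
    permissions:
      id-token: write   # short-lived OIDC token; no long-lived npm token in repo secrets
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          registry-url: "https://registry.npmjs.org"
      - run: npm ci && npm run build
      # with Trusted Publishing, the publish is authenticated as "this repo's CI",
      # not as any individual maintainer's credential
      - run: npm publish --provenance --access public
```

The point is that the credential the index trusts is minted per run and scoped to the repository, so there's nothing long-lived for an attacker to steal or for a departing employee to take with them.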
I don’t buy this. I mean, I’m sympathetic to the issue. It’s easy for things to get jumbled when companies are young. But we’re talking about libraries that are used by your customers. To many, this is the externally visible interface to the company.
What you describe sounds like a process problem to me. If an $EARLY_EMPLOYEE is the only one with the deploy keys for what is a product of the company, then that’s a problem. If a deployment of that key library can be made without anyone approving it, that’s also a problem. But those are both people problems… and you can’t solve a people problem with a technical solution.
> But those are both people problems… and you can’t solve a people problem with a technical solution.
I don’t think it is a people problem in this case: the only reason there’s a person involved at all is because we’ve decided to introduce one as an intermediating party. A more misuse-resistant scheme disintermediates the human, because the human was never actually a mandatory part of the scheme.
The person who intermediates the trust relationship between the index and the source repository. There’s no reason for the credential that links those two parties to be intermediated by a human; they’re two machine services talking.
(You obviously can’t disintermediate the human from maintenance or development!)
You’re saying that whatever is in the source repository should be uploaded to the npm index, right? If the code is tagged as a release, the built artifact is automatically uploaded to npm. Is that what you’re proposing?
That's exactly what got PostHog into this position. The keys to publish to npm were available to an engineer and to GitHub, which made it possible to push a malware build into npm automatically. This isn’t a technical issue… it’s a process issue. I don’t see the problem as that the keys were misused. I see the problem as that it was possible to misuse the keys at all. Why do you need that process to be automatic? How often are you pushing new updates?
I would argue that those npm assets/libraries are your work product. That is what your customer needs to use your service. It is a published product from your company. It is too important to allow a new version to be published out to the public without a human in the loop to approve it.
When you have a fully automatic publishing cycle, you’re trading security for convenience. It’s all about how much risk you’re willing to accept. For me, that’s too much of a risk to the reputation of the company. I also think the balance shifts if you’re talking about purely internal assets; having completely automatic CI/CD there makes perfect sense for most companies. For me, it is about who is hurt if there is an issue (and you should expect there to be an issue).
Putting a person in the loop for releasing a product is one way to solve this. It’s not perfect, but at the moment, I think it’s the most secure (for the public).
> You’re saying that whatever is in the source repository should be uploaded to the npm index, right? If the code is tagged as a release, the built artifact is automatically uploaded to npm. Is that what you’re proposing?
No, I'm saying that the source repository should act as an authentication principal itself. A human should still initiate the release process, but the authentication process that connects the source repository (more precisely CI/CD) to the index should not involve a credential that's implicitly bound to a human identity (because the human's role within a project or company is ephemeral).
As far as I can tell, what got PostHog into this situation wasn't a fully automated release process (very few companies/groups have fully automated processes), but the fact that they had user-created long-lived credentials that an attacker could store and weaponize at a time most convenient to them. That's a problem regardless of whether there's normally a human in the loop or not, because the long-lived credential itself was sufficient for publishing.
(In other words, we basically agree about human approval being good; what I'm saying is that we should formalize human approval without making the authentication scheme explicitly require an intermediating party who doesn't inherently represent the actual principal, i.e. the source repository.)
I think we agree more than we don’t and the rest are personal preferences and policy differences. But we largely agree in principle.
I like the idea of having a person whose job is approving releases. Kind of like a QC tag — this release was approved by XX. I saw the issue as PostHog having a credential available to the CI/CD that had the authority to push releases automatically. When a new GitHub action was added, that credential was abused to push a bad update to npm. I might be wrong, I don’t deal with npm that much.
You can't "require" manual intervention. Sure, you can say that the keys stay on, say, two developers' laptops, but personal devices have even more surface area for key leaks than a CI/CD pipeline. And it wouldn't have prevented attacks like this one anyway, since the malicious binary just searched the whole system for keys.
One alternative is to do the signing on an air-gapped system stored in a physically secure but accessible location, but I guess that's just way too much inconvenience.
This is orthogonal to the issue at hand. The problem is a malicious actor cutting a release outside of the normal release process. It doesn't matter if the normal process is automated or manual.
It could have eliminated an attack surface where they steal the credentials from the CI/CD...
...But then, if I understand npm publishing correctly, you would still have the credentials lying around on someone's computer? I guess you could always revoke the tokens after publishing? It's all balancing convenience and security, with some options being bad at both?
a. THE SERVICE IS PROVIDED STRICTLY ON AN “AS IS” AND “AS AVAILABLE” BASIS, AND PROVIDER MAKES NO WARRANTY THAT THE SERVICE IS COMPLETE, SUITABLE FOR YOUR PURPOSE, RELIABLE, USEFUL, OR ACCURATE.
They're storing PII in the GitHub repo they kicked the core maintainers out of?
Just stop trying to make excuses for these people. They screwed up, and based on this press release, don't seem to have any interest in actually correcting those mistakes.
I'm not a lawyer, so maybe a silly question: is it possible the software license is different from the service warranty? And I guess another thing that comes to mind: maybe they didn't mean a _legal_ warranty, but something used colloquially?
This is an excellent write-up of the problem. New hires out of college/bootcamps often have no awareness of the risks here at all. Sometimes even engineers with years of experience but no operational mentorship in their career.
The kitchen sink example in particular is one that trips people up. If you don't know the specifics of how a library deals with failure edge cases, it can catch you off guard (e.g., axios errors including API key headers).
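As a rough illustration of the axios case (a sketch; the helper name and endpoint are made up), the failure path can leak the very header you were careful about everywhere else unless you strip it before logging:

```typescript
import axios, { isAxiosError } from "axios";

// Hypothetical helper: reduce an axios error to fields that are safe to log,
// deliberately dropping config.headers (which can include Authorization / API
// key headers) and config.data (the request body).
function toLoggableError(err: unknown) {
  if (isAxiosError(err)) {
    return {
      message: err.message,
      status: err.response?.status,
      url: err.config?.url,
    };
  }
  return { message: err instanceof Error ? err.message : String(err) };
}

async function callVendorApi() {
  try {
    return await axios.get("https://api.example.com/v1/things", {
      headers: { Authorization: `Bearer ${process.env.VENDOR_API_KEY}` },
    });
  } catch (err) {
    // logging the raw error here would include the Authorization header
    console.error("vendor call failed", toLoggableError(err));
    throw err;
  }
}
```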
A lot of these problems come from architectures where secrets go over the wire instead of just using signatures/ids. But in cases where you have to use some third party platform, there's often no choice.
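To make the signatures-over-secrets point concrete, a minimal sketch (header names and the payload format are illustrative, not any particular vendor's scheme): the shared secret stays local and only a derived signature travels with the request.

```typescript
import { createHmac } from "node:crypto";

const SIGNING_SECRET = process.env.SIGNING_SECRET!; // never sent over the wire

// Hypothetical request-signing helper: the server looks up the same secret by
// key id, recomputes the HMAC over the same fields, and compares.
function signRequest(keyId: string, method: string, path: string, body: string) {
  const timestamp = Date.now().toString();
  const payload = [method, path, timestamp, body].join("\n");
  const signature = createHmac("sha256", SIGNING_SECRET).update(payload).digest("hex");
  return {
    "X-Key-Id": keyId,        // identifies the key, reveals nothing secret
    "X-Timestamp": timestamp, // lets the server reject stale or replayed requests
    "X-Signature": signature, // proves possession of the secret without sending it
  };
}
```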
Fine for when you have no NAT gateway and have a subnet with truly no egress allowed. But if you're adding a NAT gateway, it's crazy that you need to set up the gateway endpoint for S3/DDB separately. And it's even crazier that you have to pay for PrivateLink per AWS service endpoint.
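For anyone who hasn't done it, this is the separate setup in question, sketched with the AWS CDK (construct IDs and the VPC config are made up):

```typescript
import * as cdk from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";

const app = new cdk.App();
const stack = new cdk.Stack(app, "NetworkStack");

// A VPC with a NAT gateway for general egress...
const vpc = new ec2.Vpc(stack, "AppVpc", { natGateways: 1 });

// ...and the gateway endpoints you have to remember to add yourself, or S3/DDB
// traffic silently flows through the NAT gateway and gets billed per GB.
vpc.addGatewayEndpoint("S3Endpoint", {
  service: ec2.GatewayVpcEndpointAwsService.S3,
});
vpc.addGatewayEndpoint("DynamoDbEndpoint", {
  service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
});
```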
There are very real differences between NAT gateways and VPC Gateway Endpoints.
NAT gateways are not purely hands-off: you can attach additional IP addresses to a NAT gateway to help it scale to more instances behind it, which is a fundamental part of how NAT gateways work, because of the limit on the number of ports that can be opened through a single IP address. Traffic through a VPC Gateway Endpoint doesn't consume ports or IP addresses on the NAT gateway at all. And what about metering? You pay per GB for traffic passing through the NAT gateway, but presumably not for traffic to an implicit built-in S3 gateway. So do you expect AWS to show you different meters for billed and not-billed traffic, but performance still depends on the sum total of the traffic (S3 and Internet egress) passing through it? How is that not confusing?
It also misses the point that not all NAT gateways are used for Internet egress; indeed, there are many enterprise networks with nested layers of private networks where NAT gateways help deal with overlapping private IP CIDR ranges. In such cases, having some kind of implicit built-in S3 gateway violates assumptions about how network traffic is controlled and routed, since the assumption is that the traffic stays completely private. So even if it were supported, it would need to be disabled by default (for secure defaults), and you're right back at the equivalent of the situation you have today, where the VPC Gateway Endpoint is a separate resource to be configured.
Not to mention that VPC Gateway Endpoints allow you to define a policy on the gateway describing what may pass through, e.g. permitting read-only traffic through the endpoint but not writes. Not sure how you expect that to work with NAT gateways. This is something that AWS and Azure have very similar implementations for that work really well, whereas GCP only permits configuring such controls at the Organization level (!)
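To make the endpoint-policy point concrete, a rough CDK sketch (assumes an existing VPC; names are illustrative) of an S3 gateway endpoint that only lets reads through:

```typescript
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as iam from "aws-cdk-lib/aws-iam";

declare const vpc: ec2.Vpc; // assumed to exist elsewhere in the stack

const s3Endpoint = vpc.addGatewayEndpoint("ReadOnlyS3Endpoint", {
  service: ec2.GatewayVpcEndpointAwsService.S3,
});

// The endpoint policy is evaluated in addition to the caller's IAM policy:
// writes are blocked at the endpoint even if the caller's role would allow them.
s3Endpoint.addToPolicy(
  new iam.PolicyStatement({
    principals: [new iam.AnyPrincipal()], // endpoint policies still require a principal
    actions: ["s3:GetObject", "s3:ListBucket"],
    resources: ["*"],
  })
);
```

There's no equivalent knob on a NAT gateway; it just forwards packets.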
They are just completely different networking tools for completely different purposes. I expect closed-by-default secure defaults. I expect AWS to expose the power of different networking implements to me because these are low-level building blocks. Because they are low-level building blocks, I expect for there to be footguns and for the user to be held responsible for correct configuration.
Again, you are dealing with low-level primitives. You can provision an EC2 VM with multiple GPUs at high cost and use it to host nginx. That is not a correct configuration. There are much cheaper ways available to you. It's ridiculous to imply that AWS shouldn't send you a higher bill because you didn't use the GPUs or that AWS shouldn't offer instances with GPUs because they are more expensive. You, the user, are responsible for building a correct configuration with the low-level primitives that have been made available to you! If it's too much then feel free to move up the stack and host your workloads on a PaaS instead.
It being low level is not an excuse for systems that lead people down the wrong path.
And the traffic never even reaches the public internet. There's a mismatch between what the billing is supposedly for and what it's actually applied to.
> do you expect AWS to show you different meters for billed and not-billed traffic, but performance still depends on the sum total of the traffic (S3 and Internet egress) passing through it?
Yes.
> How is that not confusing?
That's how network ports work. They only go so fast, and you can be charged based on destination. I don't see the issue.
> It also misses the point that not all NAT gateways are used for Internet egress
Okay, if two NAT gateways talk to each other, that traffic also shouldn't have egress fees.
> some kind of implicit built-in S3 gateway violates assumptions
So don't do that. Checking if the traffic will leave the datacenter doesn't need such a thing.
I really wish ESM was easier to adopt. But we're halfway through 2025 and there are still compatibility issues with it. And it just gets even worse now that so many packages are going ESM only. You get stuck having to choose what to cut out. I write my code in TS using ESM syntax, but still compile down to CJS as the build target for my sanity.
In many ways, this debacle is reminiscent of the Python 2 to 3 cutover. I wish we had started with bidirectional import interop and dual module publications with graceful transitions instead of this cold turkey "new versions will only publish ESM" approach.
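For reference, the dual publication approach looks roughly like this in package.json (a sketch; the paths and build layout are placeholders), so consumers on either module system resolve a working entry point:

```json
{
  "name": "some-package",
  "version": "1.2.3",
  "main": "./dist/cjs/index.js",
  "types": "./dist/types/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/types/index.d.ts",
      "import": "./dist/esm/index.js",
      "require": "./dist/cjs/index.js"
    }
  }
}
```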