At the TypeScript level, I think simply disallowing them makes much more sense. You can already replace .push with .concat, .sort with .toSorted, etc. to get the non-mutating behavior, so why complicate things?
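For example (`toSorted`/`toReversed` are ES2023 built-ins, so they need a recent runtime or a polyfill):

```typescript
const nums = [3, 1, 2];

// Mutating methods modify the array in place:
// nums.push(4);  nums.sort();

// Non-mutating equivalents already in the language:
const appended = nums.concat(4);     // new array [3, 1, 2, 4], nums untouched
const sorted = nums.toSorted();      // ES2023: returns a sorted copy, [1, 2, 3]
const reversed = nums.toReversed();  // ES2023 counterpart of reverse()
```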
I don't have much experience with dedicated vector databases, I've only used pgvector, so pardon me if there's an obvious answer to this, but how do people do similarity search combined with other filters and pagination with a separate vector DB? It's a pretty common use case, at least in my circles.
For example, give me product listings that match the search term (by vector search), and are made by company X (companies being a separate table). Sort by vector similarity to the search term and give me the top 100.
We have even largely moved away from ElasticSearch to Postgres where we can, because it's just so much easier to implement new complex filters without needing to add those other tables' data to the index of e.g. "products" every time.
Edit: Ah, I guess this is touched on a bit in the article under "Pre- vs. Post-Filtering". I guess you just do the same as with ElasticSearch: predict what you'll want to filter by, add all of that to the metadata, and keep it up to date.
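For the pgvector case specifically, a minimal sketch of such a combined query (assuming a made-up schema with `products(id, name, company_id, embedding)` plus a `companies` table, and the node-postgres client; `<=>` is pgvector's cosine-distance operator):

```typescript
import { Client } from "pg";

// Vector similarity, a relational filter via JOIN, and pagination all live in
// one SQL query, so nothing has to be denormalized into "metadata".
async function searchProducts(queryEmbedding: number[], companyId: number, limit = 100) {
  const client = new Client();
  await client.connect();
  try {
    const { rows } = await client.query(
      `SELECT p.id, p.name, p.embedding <=> $1 AS distance
         FROM products p
         JOIN companies c ON c.id = p.company_id
        WHERE c.id = $2
        ORDER BY distance
        LIMIT $3`,
      // pgvector accepts the '[x,y,...]' text representation as a parameter
      [`[${queryEmbedding.join(",")}]`, companyId, limit]
    );
    return rows;
  } finally {
    await client.end();
  }
}
```

With a dedicated vector DB the company information would instead have to be duplicated into each document's metadata, as described above.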
Ehh what. I would give some merit to arguments like "no one should use lodash in 2025 because you can do most of it with built-ins nowadays" or maybe because it doesn't tree-shake well or maybe even because it doesn't seem to have much active development now.
But stating matter-of-factly that no one should use it because some of its well-documented functions are mutating rather than functional-style, and that everyone should instead use one particular FP library out of the many out there, is not very cool.
> As of TypeScript 5.0, the project's output target was switched from es5 to es2018 as part of a transition to ECMAScript modules. This meant that TypeScript could rely on the emit for native (and often more-succinct) syntax supported between ES2015 and ES2018. One might expect that this would unconditionally make things faster, but one surprise we encountered was a slowdown from using let and const natively!
So they don't transpile to ES5 anymore, and that is the issue.
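For context, this is roughly what the old es5 target forced the compiler to emit for a block-scoped loop variable captured by a closure (illustrative, not the exact compiler output):

```typescript
// Source:
for (let i = 0; i < 3; i++) {
  setTimeout(() => console.log(i));
}

// Approximate es5 downlevel: `let` is simulated with a per-iteration helper
// function and plain `var`, because ES5 has no block scoping.
var _loop_1 = function (n: number) {
  setTimeout(function () { return console.log(n); });
};
for (var j = 0; j < 3; j++) {
  _loop_1(j);
}
```

The surprise in the quote is that dropping this transformation and emitting native let/const turned out to be slower in some cases.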
I don't think pinning deps will help you much, as these incidents often affect transitive dependencies not listed in package.json. package-lock.json is there to protect against automatic upgrades.
I know there are some reports about the lockfile not always working as expected. Some of those reports are outdated info from around 2018 that is simply not true anymore, and some of it is due to edge cases like somebody on the team having an outdated version of npm, or installing a package but not committing the changes to the lockfile right away. Whatever the reason, pinned version ranges wouldn't protect against that. Using npm ci instead of npm install would.
No, it doesn't solve it, but it might minimise the blast radius: there are so many unmaintained libraries out there that a single compromised patch release of any dependency can become a risk.
That's sort of the thing: all of these measures are just patches on the fundamental problem that npm has just become too unsafe.
The main issue there is that the maintainer lost access to their account. Yanking malicious packages is better, but even just being able to release new patch versions would've stopped the spread; they were not able to do so for the packages that didn't have a co-publisher. How would crates.io help in this situation?
FWIW npm used to allow unpublishing packages, but AFAIK that feature was removed in the wake of the left-pad incident [1]. Although now, with all the frequent attacks, it might be worth considering whether ecosystem disruption via malicious removal of a package would be the lesser of two evils compared to actual malware being distributed.
It's easy to do both at the project level and globally, and these days there are quite few legit packages that don't work without install scripts. For those that don't, you can add a separate install script to your project that cds into that folder and runs their install script.
I know this isn't a silver-bullet solution to supply chain attacks, but so far it has been effective against many attacks through npm.
I also use bubblewrap to isolate npm/pnpm/yarn (and everything started by them) from the rest of the system. Let's say all your source code resides in ~/code; put this somewhere at the beginning of your $PATH and name it `npm`; create symlinks/hardlinks to it for the other package managers:
Notably `--share-net` should be moved down since it is negated by `--unshare-all`. I also added a reminder that the command is being bubblewrapped, modified the second read-write bind to the current directory, and changed the final exec to use `/usr/bin/env` to find the binary so it can be more flexible. I tested it with npm and yarn just now and it seems to work well. Thanks!
Not sure what this means. bubblewrap is as free as it gets, it's just a thin wrapper around the same kernel mechanisms used for containers, except that it uses your existing filesystems instead of creating a separate "chroot" from an OCI image (or something like it).
The only thing it does is hide most of your system from the stuff that runs under it, whitelisting specific paths and optionally making them read-only. It can be used to run npx, or anything else really: just shove more symlinks into the beginning of your $PATH, each referencing the script above. Run any of them and it's automatically restricted from accessing e.g. your ~/.ssh.
This is such a defeatist perspective. You could say this about anything ad nauseam. I think bubblewrap (or firejail) is less likely to be a successful target.
While this may be true, this is still a major improvement, no?
i.e. it seems far more likely that a rapidly evolving hot new project will be targeted vs. something more stable and explicitly security focused like bubblewrap.
Nothing. Does your threat model assume 100% trust in your distro? I understand saying you trust it a lot more than the garbage on npm. But if your trust is anything less than 100%, you are balancing risk and benefit.
No, "instead". If they compromise bubblewrap to send out your files, and you run bubblewrap anyway for any reason, you're still compromised.
But obviously you can probably safely pin bubblewrap to a given version, and you don't need to "install packages through it", which is the main weakness of package managers.
Bubblewrap uses the same Linux facilities that billion-dollar cloud infrastructure uses. Bubblewrap does no sandboxing/restrictions itself; it instructs the kernel to do it.
How? bubblewrap isn't something someone has randomly uploaded to npm; it has well-known maintainers and a well-organised release process (including package signing). Which is easier to do: upload a package to npm and get people to use it, or spend 2+ years trying to become a maintainer of bubblewrap or one of its dependencies to compromise it?
The fact that something can happen is separate from how likely that thing is to happen, and that’s what matters here.
The comments here that point to this theoretical possibility seem to be missing the point, which is that using something like bubblewrap is an improvement over running arbitrary projects un-sandboxed, and the likelihood of such an attack is far less than the likelihood of any one of hundreds of rapidly evolving, lesser known, lesser scrutinized projects getting compromised.
It also has the catalogs feature for defining versions or version ranges as reusable constants that you can reference in workspace packages. That was almost the only reason (besides speed) I switched from npm a year ago, and I never looked back.
`pnpm` is great. I swapped to it a year ago after yarn 1->4 looked like a new project every version and npm had an insane dependency resolution issue for platform-specific packages.
pnpm had good docs and was easy to put in place. Recommend.
A few years ago it didn't work in all cases where npm did. That made me stop using it, because I didn't want to constantly check with two tools. The speed boost is nice but I don't need to npm install that often.
Why doesn't the same advice apply to `setup.py` or `build.rs`? Is it because npm is (ab)used for software distribution (e.g. see sibling comment: https://news.ycombinator.com/item?id=45041292) instead of being used only for managing library dependencies?
It should apply to anything. Truth be told, the process of learning programming is so arduous at times that you basically just copy and paste and run fucking anything in the terminal to get a project set up or fixed.
Go down the rabbit hole of just installing LLM software and you’ll find yourself in quite a copy and paste frenzy.
We got used to this GitHub shit of setting up every install process with a script like this, so I'm surprised it's not happening constantly.
It should, and also to Makefile.PL, etc. These systems were created at a time when you were dealing with a handful of dependencies, and software development was a friendlier place.
Now you're dealing with hundreds of recursive dependencies, all of which you should assume may become hostile at any time. If you neither audit your dependencies, nor have the ability to sue them for damages, you're in a precarious position.
Yeah, I guess it probably helps you specifically, because most malware is going to do the lazy thing and use install scripts. But it doesn't help everyone in general, because if e.g. npm disabled those scripts entirely (or made them opt-in), the malware authors would just put their malware into the code that runs on `npm run`, as you say.
Indeed, it may save you in case the malware is being particularly lazy, but I think it may do more harm than good by giving people a false sense of security, and it can also break packages that use post-install scripts for legitimate reasons.
For anyone who actually cares about supply chain attacks, the minimum you should be doing is running untrusted code in some kind of sandbox that doesn't have access to important credentials like SSH keys, such as a dev container.
You would still need to audit the code, otherwise you might ship a backdoor to production, but it would at least protect you against a developer-machine compromise... unless you get particularly unlucky and it also leverages a container-escape 0-day, but that's secure enough for me personally.
I guess this won't help with something like nx. It's a CLI tool that is supposed to be executed inside the source code repo, in CI jobs or on developer PCs.
According to the description in the advisory, this attack was in a postinstall script. So it would've helped in this case with nx. Even if you ran the tool, this particular attack wouldn't have been triggered if you had install scripts ignored.
As a Linux admin, I refuse to install npm or anything that requires it as a dep. It's been bad since the start. At least some people are starting to see it.
Can't you guys replace the most vulnerable parts with something better? I have been experimenting with Go + Fyne; it's pretty neat, all things considered.
My guess would be because they affect property ordering, complicating the stringification.
The default object property iteration rules in JS say that integer-like keys are traversed first, in ascending numeric order, and only then the other keys in the order they were added to the object.
Since the numbers need to be in their numeric, not lexical, order, the engine would also need to parse them to ints before sorting.
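A quick demonstration of that ordering:

```typescript
const obj: Record<string, number> = { b: 1, "10": 2, a: 3, "2": 4 };

// Integer-like keys come first in ascending numeric order ("2" before "10",
// which a lexical sort would reverse), then the rest in insertion order.
console.log(Object.keys(obj));    // ["2", "10", "b", "a"]
console.log(JSON.stringify(obj)); // {"2":4,"10":2,"b":1,"a":3}
```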
SeaQuery looks like a similar dynamic query builder for Rust as Kysely is for JS/TS, so yeah, that'd probably solve the dynamic query problem. But I think the parent wasn't so much asking for another library as for patterns.
How do people who choose to use a no-dsl SQL library, like SQLx, handle dynamic queries? Especially with compile-time checking. The readme has this example:
...
WHERE organization = ?
But what if you have multiple possible where-conditions, let's say
"WHERE organization = ?", "WHERE starts_with(first_name, ?)", "WHERE birth_date > ?",
and you need to apply some combination of those (possibly also none of them) based on query parameters to the API. I think that's a pretty common use case.
I agree with you that dynamic query building can be tedious with a pure SQL approach.
The use case you are describing can be solved with something along the lines of:
WHERE organization = $1
AND ($2 IS NULL OR starts_with(first_name, $2))
AND ($3 IS NULL OR birth_date > $3)
With SQLx you would make all the params Options and fill them according to the parameters that were sent to your API.
I think the dynamic part is where the clauses themselves are optional. For example, say you have a data table whose rows a user can filter by multiple columns. They can filter by just `first_name`, or by `birth_date`, or both at the same time using AND / OR, and so on. So you're dynamically adding more or fewer WHERE conditions, and then it gets tricky when you have to include placeholders like `$1`, since you have to keep track of how many parameters your dynamic query actually includes.
That's relying a lot on the DB engine, which will struggle as the condition gets more complex. I've had MySQL make stupid query-plan choices for very similar queries; I had to break the OR into UNIONs.
I generally avoid DSLs as they don't bring much... except for this exact use case. Dynamic queries are pretty much what a query builder is for: you can avoid the dependency by rolling your own, but it's not trivial, and people out there have built some decent ones.
So if I had this use case I'd reach for a query builder library. To answer the question of "how to do dynamic queries without a query builder library", I don't think there's any other answer than "make your own query builder".
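To sketch what "rolling your own" can look like for the optional-WHERE case from upthread (TypeScript here, since the thread already brought up Kysely; the `person` table and the numbered `$n` placeholders are assumptions, and the same bookkeeping applies in Rust):

```typescript
// Minimal hand-rolled dynamic filtering: collect conditions and parameters
// together so the $n placeholder numbering stays consistent.
interface PersonFilter {
  organization?: string;
  firstNamePrefix?: string;
  bornAfter?: Date;
}

function buildPersonQuery(filter: PersonFilter): { sql: string; params: unknown[] } {
  const conditions: string[] = [];
  const params: unknown[] = [];

  // Push the value first, then reference its 1-based position as the placeholder.
  const add = (toSql: (placeholder: string) => string, value: unknown) => {
    params.push(value);
    conditions.push(toSql(`$${params.length}`));
  };

  if (filter.organization !== undefined) add(p => `organization = ${p}`, filter.organization);
  if (filter.firstNamePrefix !== undefined) add(p => `starts_with(first_name, ${p})`, filter.firstNamePrefix);
  if (filter.bornAfter !== undefined) add(p => `birth_date > ${p}`, filter.bornAfter);

  const where = conditions.length > 0 ? ` WHERE ${conditions.join(" AND ")}` : "";
  return { sql: `SELECT * FROM person${where}`, params };
}

// buildPersonQuery({ organization: "acme", bornAfter: new Date("2000-01-01") })
//   -> sql:    "SELECT * FROM person WHERE organization = $1 AND birth_date > $2"
//      params: ["acme", 2000-01-01]
```

None of this is compile-time checked, of course, which is the trade-off the rest of the thread gets into.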
In general, SQLx only provides the most minimal string-based query building, so you can easily run into annoying edge cases you forgot to test. If your project needs that, libraries like sea-query or sea-orm are the way to go (though it's still viable without them, just a bit annoying).
In general, SQLx's "compile time query checking" still needs a concrete query and a running DB to check that the query is valid. It is not a re-implementation of every dialect's syntax, semantics, subtle edge cases, etc. That just isn't practical, as SQL is too inconsistent in the edge cases, the non-standard extensions, and even the theoretically standardized parts (partly because it costs money to read the standard, and its updates are heavily biased toward MS/Oracle databases).
This means compile-time query checking doesn't scale that well to dynamic queries: you would basically need to build and check every query you might dynamically create (or the subset you want to test), at which point you are in integration-test territory (and you can do it with integration tests just fine).
Besides the SQLx-specific stuff, AFAIK some of the "tweaked SQL syntax for better composability" experiments are heading for SQL standardization, which might make this way less of a pain in the long run. But I don't remember the details at all, so, uh, maybe not???
---
EDIT: Yes, there is an SQLx "offline" mode which doesn't need a live DB; it works by basically caching results from the online mode. It is very useful, but it's still not an independent/standalone query analysis.