
At the TypeScript level, I think simply disallowing them makes much more sense. You can already replace .push with .concat, .sort with .toSorted, etc. to get the non-mutating behavior, so why complicate things?
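For example, a quick sketch (toSorted/toReversed/toSpliced need an ES2023+ target or lib setting):

  const xs = [3, 1, 2];
  const pushed = xs.concat(4);        // instead of xs.push(4), or [...xs, 4]
  const sorted = xs.toSorted();       // instead of xs.sort()
  const reversed = xs.toReversed();   // instead of xs.reverse()
  const spliced = xs.toSpliced(1, 1); // instead of xs.splice(1, 1)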


I don't have much experience with dedicated vector databases, I've only used pgvector, so pardon me if there's an obvious answer to this, but how do people do similarity search combined with other filters and pagination with a separate vector DB? It's a pretty common use case, at least in my circles.

For example: give me product listings that match the search term (by vector search) and are made by company X (companies being a separate table), sorted by vector similarity to the search term, top 100.
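With pgvector that's a single query; roughly something like this (a sketch with hypothetical products/companies tables, `<=>` being pgvector's cosine-distance operator, and the embedding passed in pgvector's '[x,y,...]' text format):

  // Placeholder embedding; in practice this is the search term's embedding from your model.
  const queryEmbedding = "[0.1, 0.2, 0.3]";
  const sql = `
    SELECT p.id, p.name, p.embedding <=> $1 AS distance
    FROM products p
    JOIN companies c ON c.id = p.company_id
    WHERE c.name = $2
    ORDER BY p.embedding <=> $1
    LIMIT 100`;
  // e.g. with node-postgres: const { rows } = await pool.query(sql, [queryEmbedding, "Company X"]);

The question is what the equivalent looks like when the vectors live in a separate vector DB and the company data lives in Postgres.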

We have even largely moved away from ElasticSearch to Postgres where we can, because it's just so much easier to implement new complex filters without needing to add other tables' data to e.g. the "products" index every time.

Edit: Ah, I guess this is touched on a bit in the article under "Pre- vs. Post-Filtering": you just do the same as with ElasticSearch, predict what you'll want to filter by, add all of that to the metadata, and keep it up to date.


Ehh what. I would give some merit to arguments like "no one should use lodash in 2025 because you can do most of it with built-ins nowadays" or maybe because it doesn't tree-shake well or maybe even because it doesn't seem to have much active development now.

But stating matter-of-factly that no one should use it because some of its well-documented functions are mutating ones and not functional-style, and should instead use one particular FP library out of the many out there, is not very cool.


The post links to a TS issue [1] that explains

> As of TypeScript 5.0, the project's output target was switched from es5 to es2018 as part of a transition to ECMAScript modules. This meant that TypeScript could rely on the emit for native (and often more-succinct) syntax supported between ES2015 and ES2018. One might expect that this would unconditionally make things faster, but one surprise we encountered was a slowdown from using let and const natively!

So they don't transpile to ES5 anymore, and that is the issue.
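For illustration, the difference in emit is roughly this (not the compiler's actual code, just the general shape of downleveling):

  // ES2018 output keeps the syntax as written:
  let total = 0;
  for (const n of [1, 2, 3]) total += n;

  // The old ES5 output downleveled it: let/const become var, for-of becomes an index loop.
  var total2 = 0;
  for (var _i = 0, _a = [1, 2, 3]; _i < _a.length; _i++) {
    var n2 = _a[_i];
    total2 += n2;
  }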

1: https://github.com/microsoft/TypeScript/issues/52924


I don't think pinning deps will help you much, as these incidents often affect transitive dependencies not listed in package.json. package-lock.json is there to protect against automatic upgrades.

I know there are some reports about the lockfile not always working as expected. Some of those are outdated info from around 2018 that simply isn't true anymore; some are due to edge cases like somebody on the team having an outdated version of npm, or installing a package but not committing the lockfile changes right away. Whatever the reason, pinned version ranges wouldn't protect against that. Using npm ci instead of npm install would.


No, it doesn't solve it, but it might minimise the blast radius: there are so many unmaintained libraries out there that a single compromised patch release of any dependency can become a risk.

That's sort of the thing: all of these measures are just patches on the fundamental problem that npm has become too unsafe.


The main issue there is that the maintainer lost access to their account. Yanking malicious packages is better, but even just being able to release new patch versions would've stopped the spread; they were not able to do so for the packages that didn't have a co-publisher. How would crates.io help in this situation?

FWIW, npm used to allow unpublishing packages, but AFAIK that feature was removed in the wake of the left-pad incident [1]. Although now, with all the frequent attacks, it might be worth considering whether ecosystem disruption via malicious removal of a package would be the lesser of two evils, compared to actual malware being distributed.

1: https://en.wikipedia.org/wiki/Npm_left-pad_incident


fzf [1] provides the TUI.

1: https://github.com/junegunn/fzf


Ah of course. I even use that. Just didn't look closely enough.


Periodic reminder to disable npm install scripts.

    npm config set ignore-scripts true [--global]
It's easy to do both at the project level and globally, and these days there are quite few legit packages that don't work without install scripts. For those that don't, you can add a separate installation script to your project that cds into the package's folder and runs its install script.
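Something like this, for example (just a sketch; the package name is hypothetical, and you'd only list packages whose scripts you've actually reviewed):

  // scripts/trusted-installs.mjs (hypothetical helper; run manually after `npm ci`)
  import { execSync } from "node:child_process";
  import { readFileSync } from "node:fs";

  const trusted = ["some-native-addon"]; // hypothetical package that needs its install script

  for (const name of trusted) {
    const dir = `node_modules/${name}`;
    const pkg = JSON.parse(readFileSync(`${dir}/package.json`, "utf8"));
    const cmd = pkg.scripts?.postinstall ?? pkg.scripts?.install;
    if (cmd) execSync(cmd, { cwd: dir, stdio: "inherit" });
  }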

I know this isn't a silver bullet solution to supply chain attacks, but so far it has been effective against many attacks through npm.

https://docs.npmjs.com/cli/v8/commands/npm-config


I also use bubblewrap to isolate npm/pnpm/yarn (and everything started by them) from the rest of the system. Let's say all your source code resides in ~/code; put this script somewhere near the beginning of your $PATH, name it `npm`, and create symlinks/hardlinks to it for the other package managers:

  #!/usr/bin/bash

  bin=$(basename "$0")

  exec bwrap \
    --bind ~/.cache/nodejs ~/.cache \
    --bind ~/code ~/code \
    --dev /dev \
    --die-with-parent \
    --disable-userns \
    --new-session \
    --proc /proc \
    --ro-bind /etc/ca-certificates /etc/ca-certificates \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --ro-bind /etc/ssl /etc/ssl \
    --ro-bind /usr /usr \
    --setenv PATH /usr/bin \
    --share-net \
    --symlink /tmp /var/tmp \
    --symlink /usr/bin /bin \
    --symlink /usr/bin /sbin \
    --symlink /usr/lib /lib \
    --symlink /usr/lib /lib64 \
    --tmpfs /tmp \
    --unshare-all \
    --unshare-user \
    "/usr/bin/$bin" "$@"
The package manager started through this script won't have access to anything but ~/code + read-only access to system libraries:

  bash-5.3$ ls -a ~
  .  ..  .cache  code
bubblewrap is quite well tested and reliable; it's used by Steam and (IIRC) Flatpak.


Thanks, handy wrapper :) Note:

    --symlink /usr/lib /lib64 \
should probably be `/usr/lib64`

and

    --share-net \
should go after the `--unshare-all --unshare-user`

Also, my system doesn't have a symlink from /tmp to /var/tmp, so I'm guessing that's not needed for me (while /bin etc. are symlinks)


Very cool idea. Thanks for sharing. I made some minor tweaks based on feedback to your comment:

  #!/usr/bin/env bash
  #
  # See: https://news.ycombinator.com/item?id=45034496
  
  bin=$(basename "$0")
  
  echo "==========================="
  echo "Wrapping $bin in bubblewrap"
  echo "==========================="
  
  exec bwrap \
    --bind ~/.cache ~/.cache \
    --bind "${PWD}" "${PWD}" \
    --dev /dev \
    --die-with-parent \
    --disable-userns \
    --new-session \
    --proc /proc \
    --ro-bind /etc/ca-certificates /etc/ca-certificates \
    --ro-bind /etc/resolv.conf /etc/resolv.conf \
    --ro-bind /etc/ssl /etc/ssl \
    --ro-bind /usr /usr \
    --setenv PATH /usr/bin \
    --symlink /usr/bin /bin \
    --symlink /usr/bin /sbin \
    --symlink /usr/lib /lib \
    --symlink /usr/lib64 /lib64 \
    --tmpfs /tmp \
    --unshare-all \
    --unshare-user \
    --share-net \
    /usr/bin/env "$bin" "$@"

Notably `--share-net` should be moved down since it is negated by `--unshare-all`. I also added a reminder that the command is being bubblewrapped, modified the second read-write bind to the current directory, and changed the final exec to use `/usr/bin/env` to find the binary so it can be more flexible. I tested it with npm and yarn just now and it seems to work well. Thanks!


Very cool. Hadn't heard of this before. I appreciate you posting it.


Will this work on macOS? And for pnpm?


No, bubblewrap uses Linux namespaces. You can use it for (almost) whatever software you want.


Firejail is quite good, too. I have been using firejail more than bubblewrap.


This is trading one distribution problem (npx) for another (bubblewrap). I think it’s a reasonable trade, but there’s no free lunch.


Not sure what this means. bubblewrap is as free as it gets, it's just a thin wrapper around the same kernel mechanisms used for containers, except that it uses your existing filesystems instead of creating a separate "chroot" from an OCI image (or something like it).

The only thing it does is hide most of your system from the stuff that runs under it, whitelisting specific paths and optionally making them read-only. It can be used to run npx, or anything else really: just put symlinks at the beginning of your $PATH, each referencing the script above. Run any of them and it's automatically restricted from accessing e.g. your ~/.ssh.

https://wiki.archlinux.org/title/Bubblewrap


It means that someone just has to compromise bubblewrap instead of the other vectors.


This is such a defeatist perspective. You could say this about anything ad nauseam. I think bubblewrap (or firejail) is less likely to be a successful target.


While this may be true, this is still a major improvement, no?

i.e. it seems far more likely that a rapidly evolving hot new project will be targeted vs. something more stable and explicitly security focused like bubblewrap.


Am I getting bubblewrap somewhere other than my distro? What makes it different from any other executable that comes from there?


Nothing. Does your threat model assume 100% trust in your distro? I understand saying you trust it a lot more than the garbage on npm. But if your trust is anything less than 100%, you are balancing risk and benefit.


Not "instead", it's "in addition to". Your classical defense-in-depth.


No, "instead". If they compromise bubblewrap to send out your files, and you run bubblewrap anyway for any reason, you're still compromised.

But obviously you can probably safely pin bubblewrap to a given version, and you don't need to "install packages through it", which is the main weakness of package managers


Bubblewrap uses the same Linux kernel features that billion-dollar cloud infrastructure uses. Bubblewrap does no sandboxing/restriction itself; it instructs the kernel to do it.


How? bubblewrap isn't something someone has randomly uploaded to npm; it has well-known maintainers and a well-organised release process (including package signing). Which is easier: upload a package to npm and get people to use it, or spend 2+ years trying to become a maintainer of bubblewrap or one of its dependencies in order to compromise it?


Sure, but there's plenty of packages with well-known maintainers who get compromised...


The fact that something can happen is separate from how likely that thing is to happen, and that’s what matters here.

The comments here that point to this theoretical possibility seem to be missing the point, which is that using something like bubblewrap is an improvement over running arbitrary projects un-sandboxed, and the likelihood of such an attack is far less than the likelihood of any one of hundreds of rapidly evolving, lesser known, lesser scrutinized projects getting compromised.


sure but surely one gets bubblewrap from their distro, and you have to trust your distro anyway.


Or use pnpm. The latest versions have all dependency lifecycle scripts ignored by default. You must whitelist each package.


pnpm is not only more secure, it's also faster, more efficient wrt disk usage, and more deterministic by design.


It also has the catalogs feature for defining versions or version ranges as reusable constants that you can reference in workspace packages. That was almost the only reason (besides speed) I switched from npm a year ago, and I never looked back.


The workspace protocol in monorepos is also great; we're using it a lot.


OK so it seems too good now, what are the downsides?


If you relied on hoisting of transitive dependencies, you'll now have to declare that fact in a project's .npmrc

Small price to pay for all the advantages already listed.


They’re moving all that to the pnpm-workspace.yaml file now


pnpm is great. I swapped to it a year ago after Yarn 1->4 looked like a new project with every version, and npm had an insane dependency-resolution issue with platform-specific packages.

pnpm had good docs and was easy to put in place. Recommend


A few years ago it didn't work in all cases where npm did. That made me stop using it, because I didn't want to constantly check with two tools. The speed boost is nice, but I don't need to npm install that often.


Downside is that you have to add "p" in front, ie. instead of "npm" you have to type "pnpm". That's all that I'm aware of.


Personally, I didn't find an efficient way to create one Docker image for each of my projects (in a pnpm monorepo).


That’s not really a pnpm problem on the face of it


Same for bun, which I find faster than pnpm


Bun still executes the install scripts of a hardcoded list of ~500 packages, so it's not exactly the same.


This is the way. It’s a pain to manually disable the checks, but certainly better than becoming victim to an attack like this.


I run all npm based tools inside Docker with no access beyond the current directory.

https://ashishb.net/programming/run-tools-inside-docker/

It does reduce the attack surface drastically.


Why doesn't the same advice apply to `setup.py` or `build.rs`? Is it because npm is (ab)used for software distribution (e.g. see this sibling comment: https://news.ycombinator.com/item?id=45041292) instead of being used only for managing library dependencies?


It should apply to anything. Truth be told, the process of learning programming is so arduous at times that you basically just copy, paste, and run fucking anything in the terminal to get a project set up or fixed.

Go down the rabbit hole of just installing LLM software and you'll find yourself in quite a copy-and-paste frenzy.

We got used to this GitHub habit of making every setup step a script you paste into the terminal, so I'm surprised it's not happening constantly.


It should, and also to Makefile.PL, etc. These systems were created at a time when you were dealing with a handful of dependencies, and software development was a friendlier place.

Now you're dealing with hundreds of recursive dependencies, all of which you should assume may become hostile at any time. If you neither audit your dependencies, nor have the ability to sue them for damages, you're in a precarious position.


For simple Python libraries, setup.py has been discouraged for a long time in favour of pyproject.toml, for exactly this reason.


Whenever I read this well-meaning advice I have to ask: Do you actually read hundreds of thousands of lines of code (or more) that NPM installed?

Because the workflow for 99.99% of developers is something resembling:

1. git clone

2. npm install (which pulls in a malicious dependency but disabling post-install scripts saved you for now!)

3. npm run (executing your malicious dependency, you're now infected)

The only way this advice helps you is if you also insert "audit the entirety of node_modules" in between steps 2 and 3 which nobody does.


Yeah, I guess it probably helps you specifically, because most malware is going to do the lazy thing and use install scripts. But it doesn't help everyone in general: if e.g. npm disabled those scripts entirely (or made them opt-in), then the malware authors would just put their malware into the code that runs at `npm run`, as you say.


Indeed, it may save you in case the malware is being particularly lazy, but I think it may do more harm than good by giving people a false sense of security, and it can also break packages that use post-install scripts for legitimate reasons.

For anyone who actually cares about supply chain attacks, the minimum you should be doing is running untrusted code in some sort of a sandbox that doesn't have access to important credentials like SSH keys, like a dev container of some sort.

You would still need to audit the code, otherwise you might ship a backdoor to production, but it would at least protect you against a developer-machine compromise... unless you get particularly unlucky and it also leverages a container-escape 0-day, but that's secure enough for me personally.


This sucks for libraries that download native binaries in their install script. There are quite a few.


Downloading binaries as part of an installation of a scripting language library should always be assumed to be malicious.

Everything must be provided as source code and any compilation must happen locally.


Sure, but then you need to have a way to whitelist


The whitelist is the package-lock.json, which pins the hashes of the libraries that you, or a security reviewer you trust, have reviewed.


You can still whitelist them, though, and reinstall them.


I guess this won't help with something like nx. It's a CLI tool that is supposed to be executed inside the source code repo, in CI jobs or on developer PCs.


According to the description in the advisory, this attack was in a postinstall script, so it would've helped in this case with nx. Even if you ran the tool, this particular attack wouldn't have been triggered if you had install scripts ignored.


As a linux admin, I refuse to install npm or anything that requires it as a dep. It's been bad since the start. At least some people are starting to see it.


> As a linux admin, I refuse to install npm or anything that requires it as a dep. It's been bad since the start.

As a front-end web developer, I need a node package manager; and npm comes bundled with node.


How in the world did we survive before node?


Looks like pnpm 10 does not run lifecycle scripts of dependencies unless they are listed in ‘onlyBuiltDependencies’.

Source: https://pnpm.io/settings#ignoredepscripts


At this point why not just avoid npm (and friends) like the plague? Genuinely curious.


I work for a company that needs to ship software so my salary can get paid


Can't you guys replace the most vulnerable parts with something better? I have been experimenting with Go + Fyne, it is pretty neat, all things considered.


Pnpm natively lets you selectively enable it on a package basis


I wonder how many other packages are going to be compromised due to this also. Like a network effect.


Secondary reminder that it means nothing as soon as you run any of the scripts or binaries.


Unfortunately this also blocks your own life cycle scripts.


Does it work the same for pnpm?


My guess would be because they affect property ordering, complicating the stringification.

The default object property iteration rules in JS say that integer-like keys are traversed first, in ascending numeric order, and only then the rest, in the order they were added to the object. Since the numeric keys need to be in numeric, not lexical, order, the engine would also need to parse them to ints before sorting.

    > JSON.stringify({b: null, 10: null, 1: null, a: null})
    '{"1":null,"10":null,"b":null,"a":null}'


SeaQuery looks like a dynamic query builder for Rust, similar to what Kysely is for JS/TS, so yeah, that'd probably solve the dynamic query problem. But I think the parent wasn't so much asking for another library as for patterns.

How do people who choose a no-DSL SQL library, like SQLx, handle dynamic queries, especially with compile-time checking? The readme has this example:

  ...
  WHERE organization = ?
But what if you have multiple possible where-conditions, say "WHERE organization = ?", "WHERE starts_with(first_name, ?)", and "WHERE birth_date > ?", and you need to apply some combination of those (possibly none) based on the query parameters to the API? I think that's a pretty common use case.


I agree with you that dynamic query building can be tedious with a pure SQL approach. The use case you are describing can be solved with something along the lines of:

  WHERE organization = $1
     AND ($2 IS NULL OR starts_with(first_name, $2))
     AND ($3 IS NULL OR birth_date > $3)
With SQLx you would make all the params Options and fill them in according to the parameters that were sent to your API.

Does that make sense?


I think the dynamic part is where the clauses themselves are optional. For example, say you have a data table whose rows a user can filter by multiple columns. They can filter by just `first_name`, or by `birth_date`, or both at the same time using AND / OR, and so on. So you dynamically need to add more or fewer WHERE conditions, and it gets tricky once you include placeholders like `$1`, since you have to keep track of how many parameters your dynamic query actually includes.
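Without a query builder library, the usual pattern is a small helper that collects conditions and parameters together, so the placeholder numbering stays correct. A minimal sketch in TypeScript (Postgres-style $n placeholders; the filter names are made up, and the same idea translates to other languages):

  type Filters = { firstName?: string; bornAfter?: string; organization?: string };

  function buildWhere(f: Filters): { sql: string; params: unknown[] } {
    const conds: string[] = [];
    const params: unknown[] = [];
    const add = (fragment: (n: number) => string, value: unknown) => {
      params.push(value);
      conds.push(fragment(params.length)); // placeholder index follows params.length
    };
    if (f.organization) add(n => `organization = $${n}`, f.organization);
    if (f.firstName) add(n => `starts_with(first_name, $${n})`, f.firstName);
    if (f.bornAfter) add(n => `birth_date > $${n}`, f.bornAfter);
    return { sql: conds.length ? `WHERE ${conds.join(" AND ")}` : "", params };
  }

  // buildWhere({ firstName: "An", bornAfter: "1990-01-01" })
  // => { sql: "WHERE starts_with(first_name, $1) AND birth_date > $2", params: ["An", "1990-01-01"] }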


That's relying a lot on the DB engine, which will struggle as the condition gets more complex. I've had MySQL make stupid query-plan choices for very similar queries; I had to break the OR into UNIONs.


I generally avoid DSLs as they don't bring much... except for this exact use case. Dynamic queries are pretty much what a query builder is for: you can avoid a dependency by rolling your own, but it's not trivial, and people out there have built some decent ones.

So, if I have this use-case I'd reach for a query builder library. To answer the question of "how to do dynamic queries without a query builder library", I don't think there's any other answer than "make your own query builder"


> Especially with compile-time checking.

No compile-time checking; you rely on integration tests instead.

In general, SQLx only provides the most minimal string-based query building, so you can easily run into annoying edge cases you forgot to test. If your project needs a lot of that, libraries like sea-query or sea-orm are the way to go (though going without them is still viable, just a bit annoying).

In general, SQLx's "compile time query checking" still needs a concrete query and a running DB to check whether the query is valid. It is not a re-implementation of every dialect's syntax, semantics and subtle edge cases; that just isn't practical, since SQL is too inconsistent in its edge cases, its non-standard extensions, and even the theoretically standardized parts (reading the standard costs money, and its updates are heavily biased towards the MS/Oracle databases).

This means compile-time query checking doesn't scale that well to dynamic queries: you would basically need to build and check every query you might dynamically create (or the subset you want to test), at which point you are in integration-test territory (and you can do it with integration tests just fine).

Besides the SQLx-specific stuff, AFAIK some of the "tweaked SQL syntax for better composability" experiments are heading for SQL standardization, which might make this way less of a pain in the long run. But I don't remember the details at all, so, uh, maybe not?

---

EDIT: Yes, there is an SQLx "offline" mode which doesn't need a live DB; it works by basically caching results from the online mode. It is very useful, but it's still not independent/standalone query analysis.

