Why can’t PyPI / npm / etc just scan all newly uploaded modules for typical malw...

simonw · 2025-04-12T16:06:33 1744473993

Because doing so is computationally expensive and would be making false promises.

False positives where it incorrectly flagged a safe package would result in the need for a human review step, which is even more expensive.

False negatives where malware patterns didn't match anything previously would happen all the time, so if people learned to "trust" the scanning they would get caught out - at which point what value is the scanning adding?
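To make the false positive / false negative trade-off concrete, here is a rough back-of-the-envelope sketch. Every number in it is hypothetical (the upload volume, the malware count, and both scanner rates are made up for illustration, not measured from PyPI or npm), and `scan_outcomes` is just a helper name invented for this example.

```python
# Back-of-the-envelope base-rate sketch. All inputs are hypothetical
# illustrations, not real PyPI/npm statistics.

def scan_outcomes(uploads, malicious, true_positive_rate, false_positive_rate):
    """Estimate what a pattern-based scanner would flag and miss."""
    benign = uploads - malicious
    caught = malicious * true_positive_rate      # malware correctly flagged
    missed = malicious - caught                  # false negatives
    false_alarms = benign * false_positive_rate  # safe packages wrongly flagged
    flagged = caught + false_alarms              # everything needing human review
    precision = caught / flagged if flagged else 0.0
    return {
        "caught": caught,
        "missed": missed,
        "false_alarms": false_alarms,
        "flagged": flagged,
        "precision": precision,
    }

# Assumed scenario: 10,000 uploads, 10 of them malicious, a scanner that
# catches 90% of malware and wrongly flags 1% of safe packages.
result = scan_outcomes(
    uploads=10_000,
    malicious=10,
    true_positive_rate=0.9,
    false_positive_rate=0.01,
)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed rates, most flagged packages are safe ones waiting on a human reviewer, and some malware still gets through. Different assumptions would shift the balance, so treat this as an illustration of the shape of the problem rather than an estimate.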

I don't know if there are legal liability issues here too, but that would be worth digging into.

As it stands, there are already third parties that are running scans against packages uploaded to npm and PyPI and helping flag malware. Leaving this to third parties feels like a better option to me, personally.

VladVladikoff · 2025-04-15T02:28:11 1744684091

> Leaving this to third parties feels like a better option to me, personally.

Seems too late to me. At that point the module/package has already been added to the ecosystem, and it could be some time (months?) before it is flagged by a third party and removed.

12_throw_away · 2025-04-12T18:28:53 1744482533

> Why can’t [X] just [Y] first?

The word "just" here always presumes magic that does not actually exist.

jruohonen · 2025-04-12T19:41:20 1744486880

> The word "just" here always presumes magic that does not actually exist.

The magic here is, yes, AI. If you look at the mobile app stores, they've all become much better, although false positives occur, of course.

simonw · 2025-04-12T20:06:56 1744488416

Those App Stores also spend hundreds of millions of dollars a year on human staff. PyPI doesn't get to do that!

jruohonen · 2025-04-12T20:18:56 1744489136

Sure, but I'd guess PyPI could cut off much of the really bad stuff, such as malware, with AI (as everything is now called). Having a waiting list for false positives would not hurt anyone much. Yet a foreseeable alternative is that PyPI and friends continue to be dumpyards, but communities will build up whitelists.

simonw · 2025-04-12T20:43:54 1744490634

See my comment here for why I don't think that would work: https://news.ycombinator.com/item?id=43665581

There are a small number of PyPI things that require human support queues at the moment, and those queues are sometimes overwhelmed already.