
This is the standard (for good reason) in code golf.

thanks for introducing me to code golf :-)

There are some nice niche golfing languages as well, for your continued entertainment!

See also: https://code.golf/ for a relatively recent gamified site with rankings

this is so good

So the stuff that agents would excel at is essentially just the "checklist" part of the job? Check A, B, C, possibly using tools X, Y, Z, possibly with multi-step checks, but everything still well-defined.

Whereas finding novel exploits would still be the domain of human experts?


I'm bullish on novel exploits too but I'm much less confident in the prediction. I don't think you can do two network pentests and not immediately reach the conclusion that the need for humans to do significant chunks of that work at all is essentially a failure of automation.

With more specificity: I would not be at all surprised if the "industry standard" netpen was 90%+ agent-mediated by the end of this year. But I also think that within the next 2-3 years, that will be true of web application testing as well, which is in a sense a limited (but important and widespread) instance of "novel vulnerability" discovery.


Well, agents can't discover bypass attacks because they don't have memory. That was what DNCs (Differentiable Neural Computers) [1] tried to accomplish. Correlating scan metrics with analytics is, by the way, a great task for DNCs and what they are good at, due to how their (not so precise) memory works. They're not so good, though, at understanding branch logic and its consequences.

However, I currently believe that forensic investigations will change post-LLMs, because they're very good at translating arbitrary bytecode, assembly, netasm, Intel asm, etc. syntax into example code (in any language). It doesn't have to be 100% correct in those translations, which is why LLMs can be really helpful in the discovery phase after an incident. Check out the Ghidra MCP server, which is insane to see working in real time [2].

[1] https://github.com/JoergFranke/ADNC

[2] https://github.com/LaurieWired/GhidraMCP


The lack-of-memory issue is already being solved architecturally, and ARTEMIS is a prime example. Instead of relying on the model's context window (which is "leaky"), they use structured state passed between iterations. It's not a DNC per se, but it is a functional equivalent of long-term memory. The agent remembers it tried an SQL injection an hour ago not because it's in the context, but because it's logged in its knowledge base. This allows for chaining exploits, which used to be the exclusive domain of humans.
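
To make that concrete, here's a minimal sketch of the pattern in Python. To be clear, this is not ARTEMIS's actual design; every name here (knowledge_base.json, pick_next_payload, the candidate payloads) is hypothetical. The point is only that the "memory" lives in structured state on disk rather than in the context window:

    # Minimal sketch (all names hypothetical): state lives in a file,
    # not in the model's context window, so it survives across runs.
    import json
    from pathlib import Path

    KB = Path("knowledge_base.json")  # hypothetical on-disk knowledge base

    def load_state() -> dict:
        return json.loads(KB.read_text()) if KB.exists() else {"tried": []}

    def save_state(state: dict) -> None:
        KB.write_text(json.dumps(state, indent=2))

    def pick_next_payload(state: dict):
        # Stand-in for the LLM call: choose something not yet attempted.
        for payload in ["' OR 1=1 --", "<script>alert(1)</script>"]:
            if payload not in state["tried"]:
                return payload
        return None

    state = load_state()
    payload = pick_next_payload(state)
    if payload is not None:
        state["tried"].append(payload)  # remembered across sessions
        save_state(state)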

Can you be more specific about the kind of "bypass attack" you think an agent can't find? Like, provide a schematic example?

OpenSSL's Heartbleed is a good example. Or pretty much any vulnerability that needs understanding of how memset or malloc works, or anything where you have to use leaky functions to create a specific offset, because that's where the return address (EIP) is, so that you can modify/exploit that jmp or cmp instruction.
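
To illustrate the flavor of it, a classic stack-smashing payload is built roughly like this (a hedged Python sketch; the offset and target address are made-up values you'd recover with a debugger, and nothing here is specific to Heartbleed):

    # Illustrative only: pad up to the saved return address, then
    # overwrite it (hypothetical CTF-style 32-bit binary).
    import struct

    offset = 76            # bytes until the saved EIP (assumed, found via debugger)
    ret_addr = 0x08049abc  # where we want execution to jump (assumed)

    payload = b"A" * offset + struct.pack("<I", ret_addr)  # little-endian 32-bit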

These kinds of things are very hard for LLMs, because they tend to forget way too much important information about both the code (in the branching sense) and the program (in the memory sense).

I can't provide a full schematic for this, but it's pretty common in binary exploitation CTF events, and it's kind of mandatory knowledge for exploit development.

I listed some nice CTFs we did with our group in case you wanna know more about these things [1]. With regard to LLMs and this bypass/side-channel topic, I'd refer to the Fusion CTF [2] specifically, because it covers a lot of examples.

[1] https://cookie.engineer/about/writeups.html

[2] https://exploit.education/fusion/


Wait, I don't understand why Heartbleed is at all hard for an agent loop to uncover. There's a pattern for these attacks (we found one in nginx in the ordinary course of a web app pentest at Matasano --- and we didn't find it based on code, though I don't concede that an LLM would have a hard time uncovering these kinds of issues in code either).

I think people are coming to this with the idea that a pentesting agent is pulling all its knowledge of vulnerabilities and testing patterns out of its model weights. No. The whole idea of a pentesting agent is that the agent code --- human-mediated code that governs the LLM --- encodes a large amount of knowledge about how attacks work.


I think I'd differentiate between source code audits (where LLMs are already pretty good at spotting bugs, if you can convince them to) and exploit development here.

The former is already automated in large part with fuzz testing of all kinds, so you wouldn't need an LLM if you knew what you were doing and had a TDD workflow or similar that checks for memory leaks (say, with valgrind or similar approaches).
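
As a rough illustration of that kind of workflow (a hedged sketch; "./target_binary" is a placeholder for the program under test):

    # Sketch: fail a test suite when valgrind reports leaks/errors.
    import subprocess

    def test_no_memory_leaks():
        result = subprocess.run(
            [
                "valgrind",
                "--leak-check=full",
                "--error-exitcode=1",  # nonzero exit if valgrind finds errors
                "./target_binary",     # placeholder for the program under test
            ],
            capture_output=True,
            text=True,
        )
        assert result.returncode == 0, result.stderr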

The latter is what I was referring to when I said I initially had hope that DNCs could help. Right now, I'd say LLMs cannot discover these things, only repeat and translate them (e.g. similar vulnerabilities discovered in the past by humans, in another programming language).

I'm talking specifically about discovery here, because transformers lose symbolic inference, and that's why you can't use them for exploit generation. At least I wasn't able to make them work for the DARPA challenges, and had to use an AlphaGo-based model combined with a CPPN and some techniques that worked in ES/HyperNEAT.

I suppose what I'm trying to say is that there's a missing understanding of memory and time when it comes to LLMs. And that is usually manually encoded/governed, as you put it, by humans. I would not count that as an LLM doing it, because you could have just automated the tool use without an LLM and gotten identical results. (Think, e.g., of an MCP for kernel memory maps, or for valgrind or AFL, etc.)


We're talking about different things here. A pentesting agent directly tests running systems. It's a (much) smarter version of Burp Scanner. It's going to find memory disclosure vulnerabilities the same way pentesters do, by stimulus/response testing. You can do code/test fusion to guide stimulus/response, which will make them more efficient, but the limiting factor here isn't whether transformers lose symbolic inference.

Remember, the competition here is against human penetration testers. Humans are extremely lossy testing agents!

If the threshold you're setting is "LLMs can eradicate memory disclosure bugs by statically analyzing codebases to the point of excluding those vulnerabilities as valid propositions", no, of course that isn't going to happen. But nothing on the table today can do that either! That's not the right metric.


> Humans are extremely lossy testing agents!

Ha, I laughed at that one. I suppose you're right :D


With exploits, you'll have to go through the rote stuff of checklisting over and over, until you see aberrations across those checklists and connect the dots.

If that part of the job is automated away, I wonder how the talent and skill for finding those exploits will evolve.


They suck at collecting the bounty money because they can't legally own a bank account.

I am not saying this to be mean, because these feel like good faith questions. But they also sound like questions rooted in a purely logical view of the world, divorced of experience.

That is, I don't believe it is possible that you've had real world experience with alcoholics, because if you had, it would be obvious why it doesn't work the way you are asking about. Some addictions are just too powerful. It is not a matter of having failed to treat the root cause. It's a matter of acknowledging that, for some people, the only solution to alcohol is not to consume any. It doesn't mean they don't also try to treat and understand deeper emotional reasons for their drinking.


There's lots of research pointing to the conclusion that the brain after addiction just isn't the same as before addiction [1][2]. So while there might have been a root cause before, the effects of addiction are still present even if the root cause isn't an issue anymore.

[1]: https://med.stanford.edu/news/insights/2025/08/addiction-sci...

[2]: https://www.rockefeller.edu/news/35742-newly-discovered-brai...


> Approximately 4.6 years of continuous play, every second, to see a single jackpot win.

This seems pretty reasonable, actually! Somehow it makes the 320M seem manageable.



I feel like "technically, no" but "practically, yes".

Somehow the distinction of just adding a tag / using filters doesn't communicate the cultural/process distinction in the same way.


> but ijk I will rail against until I die.

> There's no context in those names to help you understand them, you have to look at the code surrounding it.

Hard disagree. Using "meaningful" index names is a distracting anti-pattern, for the vast majority of loops. The index is a meaningless structural reference -- the standard names allow the programmer to (correctly) gloss over it. To bring the point home, such loops could often (in theory, if not in practice, depending on the language) be rewritten as maps, where the index reference vanishes altogether.
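
A quick illustration of that last point (trivial, made-up Python example):

    xs = [3, 1, 4]

    # Index-based loop: "i" is pure structural plumbing.
    squares = []
    for i in range(len(xs)):
        squares.append(xs[i] ** 2)

    # Map-style rewrite: the index reference vanishes altogether.
    squares = [x ** 2 for x in xs]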


I respectfully disagree.

The issue isn't the names themselves, it's the locality of information. In a 3-deep nested loop, i, j, k forces the reader to maintain a mental stack trace of the entire block. If I have to scroll up to the for clause to remember which dimension k refers to, the abstraction has failed.

Meaningful names like row, col, cell transform structural boilerplate into self-documenting logic. ijk may be standard in math-heavy code, but in most production code bases, optimizing for a 'low-context' reader is not an anti-pattern.
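
For example (a contrived Python sketch of the same traversal both ways):

    depth, height, width = 2, 3, 4
    grid = [[[0] * width for _ in range(height)] for _ in range(depth)]

    # With i, j, k the reader must map each letter back to an axis:
    for i in range(depth):
        for j in range(height):
            for k in range(width):
                grid[i][j][k] += 1

    # With named axes the access is self-documenting:
    for layer in range(depth):
        for row in range(height):
            for col in range(width):
                grid[layer][row][col] += 1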


If the loop is so big it's scrollable, sure use row, col, etc.

That was my "vast majority" qualifier.

For most short or medium-sized loops, though, renaming "i" to something "meaningful" can harm readability. And I don't buy the defensive programming argument that you should do it anyway because the loop "might grow bigger someday". If it does, you can consider updating the names then. It's not hard -- they're hyper-local variables.


In a single-level loop, i is just an offset. I agree that depending on the context (maybe even for the vast majority of for loops like you're suggesting) it's probably fine.

But once you nest three deep (as in the example that kicked off this thread), you're defining a coordinate space. Even in a 10-line block, i, j, k forces the reader to manually map those letters back to their axes. If I see grid[j][i][k], is that a bug or a deliberate transposition? I shouldn't have to look at the for clause to find out.


If you see grid[y][z][x], is it a bug or a deliberate transposition?


This is also the basis for most SaaS purchases by large corporations. The old "Nobody gets fired for choosing IBM."


> the actual market-clearing price of an XSS vulnerability is very low (in most cases, it doesn't exist at all) because there aren't existing business processes those vulnerabilities drop seamlessly into; they're all situational and time-sensitive.

Could you elaborate on this? I don't fully understand the shorthand here.


I'm happy to answer questions but the only thing I could think to respond with here is just a restatement of what I said. I was terse; which part do you want me to expand on? Sorry about that!


> because there aren't existing business processes those vulnerabilities drop seamlessly into; they're all situational and time-sensitive.

what's an example of an existing business process that would make them valuable, just in theory? why do they not exist for xss vulns? why, and in what sense, are they only situational and time-sensitive?

i know you're an expert in this field. i'm not doubting the assertions, just trying to understand them better. if i understand your argument correctly, you're not doubting that the vuln found here could be damaging, only doubting that it could make money for an adversary willing to exploit it?


I can't think of a business process that accepts and monetizes pin-compatible XSS vulnerabilities.

But for RCE, there's lots of them! RCE vulnerabilities slot into CNE implants, botnets, ransomware rigs, and organized identity theft.

The key thing here is that these businesses already exist. There are already people in the market for the vulnerabilities. If you just imagine a new business driven by XSS vulnerabilities, that doesn't create customers, any more than imagining a new kind of cloud service instantly gets you funded for one.


Thank you, makes a lot of sense.

I wonder what you think of this, re: the disparity between the economics you just laid out and the "companies are such fkn misers!" comments that always arise in these threads on bounty payouts...

I've seen first hand how companies devalue investment in security -- after all, it's an insurance policy whose main beneficiaries are their customers. Sure it's also reputational insurance in theory, but what is that compared with showing more profit this quarter, or using the money for growth if you're a startup, etc. Basically, the economic incentives are to foist the risks onto your customers and gamble that a huge incident won't sink you.

I wonder if that background calculus -- which is broadly accurate, imo -- is what rankles people about the low bounty rewards, especially from companies that could afford more?


The premise that the "fucking companies are misers" crowd operates on, which I don't share, is that vulnerabilities are finite and that, in the general case, there's an existential cost to not identifying and fixing them. From decades of vulnerability research work, including (over the past 5 years) as a buyer rather than a seller of that work: put 2 different teams on a project, get 2 different sets of vulnerabilities, with maybe 30-50% overlap. Keep doing that; you'll keep finding stuff.

Seen in that light, bug bounty programs are engineering services, not a security control. A thing generalist developers definitely don't get about high-end bug bounty programs is that they are more about focusing internal resources than about generating any particular set of bugs. They're a way of prioritizing triage and hardening work, driven by external incentives.

The idea that Discord is, like, eliminating their XSS risk by bidding for XSS vulnerabilities from bounty hunters; I mean, just, obviously no, right?


How does stealing someone's social media accounts not slot into "organized identity theft"?

... actually: how is XSS not a form of RCE? The script is code; it's executed on the victim's machine; it arrives remotely from the untrusted, attacker-controlled source.

And with the legitimate first-party's permissions and access, at that. It has access to things within the browser's sandbox that it probably really shouldn't. Imagine if a bank had used Mintlify or something similar to implement a customer service portal, for example.


You're misreading me. It's organized identity theft driven by pin-compatible RCE exploits. Is there already an identity theft ring powered by Mintlify exploits? No? Then it doesn't matter.

The subtlety here is the difference between people using an exploit (certainly they can) and people who buy exploits for serious money.


A remote code execution bug in iOS is valuable: it may take a long time to detect exploitation (potentially years, if used carefully), and even after being discovered there is a long tail of devices that take time to update (although less so than on Android, or Linux running on embedded devices that can't be updated). That's why it's worth millions on the black market, and Apple will pay you $2 million for it.

An XSS is much harder to exploit quietly (the server can log everything), and can be closed immediately, 100%, with no long tail. At the push of an update, the vulnerability is worth zero. Someone paying to purchase an XSS is probably intending to use it once (with a large blast radius) and get as much as they can from it in the time until it is closed (hours? maybe days?).


Does that mean that opening arbitrary PDFs on your laptop is unsafe?


Let me put it this way...

In one of my penetration testing training classes, in one of the lessons, we generated a malicious PDF file that would give us a shell when the victim opened it in Adobe Reader.

Granted, it relied on a specific bug in the JavaScript engine of Adobe Reader, so unless they're using a version that's 15 years old, it wouldn't work today, but you can't be too cautious. 0-days can always exist.


Yes, opening random PDFs, especially in random and old PDF viewers, is not a good idea.

If you must open a possibly infected PDF, then do it in the browser; pdf.js is considered mostly safe, and it's kept updated.


Use one of the PDF-to-JPG online services: it's convenient, and you still get your result without having to deal with any sandbox.


Except of course that you're sharing the contents of that PDF with a random online service.


True. I just figured that once you're handling a PDF with as much care as if it were poisoned, it's perhaps better to send that poison to someone else to handle.

