Alright folks, to qualify myself: I'm a vulnerability researcher at MIT. My day-to-day research concerns embedded hardware/software security, and some of my current and past work involves AI/ML integration and understanding just how useful it actually is for finding and exploiting vulnerabilities. Just last week my lab hosted a conference that included MIT folks and the outside guests we invite, and one of the talks was on the current state of AI/LLMs.

To keep things short: this article is sensationalized and overstates the utility of AI/ML for finding actual novel vulnerabilities. As it currently stands, LLMs cannot even reliably find bugs that less sophisticated tools would have found in much less time. Binary exploitation is a great place to illustrate the wall you'll hit using LLMs in hopes of a 0day. LLMs can help with things like setting up fuzzers, or maybe giving you a place to start manual analysis, but their utility mostly stops there. They cannot reliably catch memory corruption bugs that a basic fuzzer or sanitizer would have found within seconds (a minimal harness sketch is at the end of this comment). That makes sense for this class of bugs: LLMs are fuzzy, probabilistic reasoners, and these issues aren't reliably found with that paradigm. That's the whole reason we have fuzzers; they find the subtle bugs worth triaging. You've seen how well LLMs count; it's no surprise they miss many of the same things a human would but a fuzzer wouldn't (think UaF, OOB, etc.). The other tools you see written for script kiddies yield the same pile of false positives you could have gotten from tools that already exist. I could go on, but I'm on a shuttle typing on a small phone.

TLDR: the article is trying to say LLMs are super hackers already, and that's simply false. They definitely have allure for script kiddies. In the future this might change. The time-saving aspects of LLMs are definitely worth checking out for static binary analysis; Binary Ninja with Sidekick saves a lot of time! But again: you still need to double-check the important things.
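To make "a basic fuzzer" concrete, here's roughly what a harness looks like. This is a minimal sketch in Python using Google's Atheris, with a made-up `parse_record` target; for native code you'd write the equivalent libFuzzer harness and build with ASan, which is what flags UaF/OOB within seconds.

```python
# Minimal coverage-guided fuzz harness sketch (pip install atheris).
# `parse_record` is a hypothetical stand-in for the code under test.
import sys
import atheris

def parse_record(buf: bytes) -> None:
    # A real harness would import the target under
    # atheris.instrument_imports() instead of defining it inline.
    if len(buf) >= 4 and buf[:4] == b"EVIL":
        raise ValueError("parser bug reached")  # uncaught exception == crash

def TestOneInput(data: bytes) -> None:
    parse_record(data)

if __name__ == "__main__":
    atheris.instrument_all()              # enable coverage guidance
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

You run it like any libFuzzer target (corpus directory and flags as CLI arguments), and it finds the reachable crash in seconds.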
I'm a vuln researcher too, and we just had an article here about another vuln researcher using o3 to find a zero-day remote Linux kernel vulnerability. And not in an especially human-directed way: they literally set up 100 runs of o3 using simonw's `llm` tool, and sifted through the results (roughly the shape sketched below).
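For anyone curious about the mechanics, a minimal sketch using the Python API of simonw's `llm` package; the "o3" model id and the prompt file are my assumptions (the original post drove the CLI, but the shape is the same):

```python
# Sketch of the "100 runs" setup, assuming `llm` is installed and an
# o3 model is configured. audit_prompt.txt is a hypothetical prompt file.
import llm

model = llm.get_model("o3")
prompt = open("audit_prompt.txt").read()

for i in range(100):
    response = model.prompt(prompt)
    with open(f"run_{i:03d}.txt", "w") as out:
        out.write(response.text())  # sift these by hand afterwards
```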
I'm having trouble reconciling what you wrote here with that result. It also conflicts with my own experience: not necessarily finding kernel vulnerabilities (I haven't had any need to do that for the last couple of years), but rapidly comprehending and analyzing kernel code (which I do need to do), and realizing how potent that ability would have been on projects ten years ago.
I might be wrong. Deepsleep also sort of found a bug, but you need to ask yourself: is it doing this better than tools we already have? Could a fuzzer have found that bug in less time? How far along did it really need to be pushed? And I have no doubt it was probably trained on certain types of bugs in certain specific codebases. Did they test its ability to find the same bug after applying a couple of transforms that trip up the LLM (the kind of thing sketched below)? Can you link me to this article about o3? I have my doubts; I'd love to see the working exploit.
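By "transforms" I mean semantics-preserving rewrites. A made-up illustration of the idea: both functions below contain the same off-by-one, but the second is refactored in ways that can defeat pattern-matching while leaving the bug intact.

```python
# Hypothetical example: the same bug before and after a rewrite.

def read_field(buf, n):
    out = []
    for i in range(n + 1):      # off-by-one: reads n+1 items, not n
        out.append(buf[i])
    return out

def extract_segment(stream, count):
    acc, idx = [], 0
    while idx <= count:         # same off-by-one, different shape
        acc.append(stream[idx])
        idx += 1
    return acc
```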
Also, if you throw these models at enough codebases, they will probably get lucky a couple of times. So far, every claim I've seen didn't stand up to rigorous scrutiny: people find one bug, then inflate their findings and write articles that make them sound far more effective than they really are. I'm tired of this hype.
curl had to stop accepting bounties after finding that nearly all of them were just AI-generated nonsense…
Also, I did say that they provide very large gains in certain areas, like writing fuzz harnesses and reversing binaries. I'm not saying they have absolutely no utility; I'm simply tired of grifters inflating their findings for clout. Shit has gotten out of control.
But that's exactly what people were saying about fuzzer farms in the mid-2000s, in the belief that artisanal audits would always be the dominant means of uncovering bugs. The truth landed somewhere in between (it's still humans, just working at a higher layer of abstraction than before), but the fuzzer people were hugely right.
If you can reliably get x% lucky finding vulnerabilities for $Y in cost, then you simply scale that up to find more vulnerabilities (back-of-envelope below).
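Back-of-envelope, with made-up numbers:

```python
# If one batch of runs costs $Y and yields a true positive with
# probability x, expected cost per bug is Y/x, and the budget scales
# linearly with how many bugs you want. Numbers below are illustrative.
x = 0.02    # assumed hit rate per batch
Y = 100.0   # assumed dollars per batch
print(f"expected cost per bug: ${Y / x:,.0f}")       # $5,000
print(f"budget for 10 bugs:    ${10 * Y / x:,.0f}")  # $50,000
```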
I don't recall anyone saying anything of the sort about fuzzing back then. You could run the most basic fuzzer and find tons of bugs! Where did you see people complaining about fuzzers?
Bro, in 2006 there wasn't any blogosphere; you had IRC. I can't find anything of the sort. And do you have a link to this discovery you say was made via LLM?
Unless the blog was Phrack, you shouldn't be surprised I never heard of it. This stuff didn't have the kind of public community back then that it does now.
I don't understand what you're trying to say here. I'm not surprised you didn't know about the blog; you appear not to believe blogs existed in 2006.
I want to stop being elliptical and just say directly: you have strange and counterfactual ideas about the security research community of the mid-aughts. 2006-2008 was the height of the security blogosphere. This stuff definitely had a huge community.