Alright folks, to qualify myself: I'm a vulnerability researcher at MIT. My day-to-day research concerns embedded hardware/software security, and some of my current and past work involves AI/ML integration and understanding just how useful it actually is for finding and exploiting vulnerabilities. Just last week my lab hosted a conference that included MIT folks and the outside guests we invite, and one of the talks was on the current state of AI/LLMs.

To keep things short: this article is sensationalized and overstates the utility of AI/ML for finding actual novel vulnerabilities. As it currently stands, LLMs cannot even reliably find bugs that less sophisticated tools would have found in much less time. Binary exploitation is a great place to illustrate the wall you'll hit using LLMs in hopes of a 0day. LLMs can help with things like setting up fuzzers, or maybe giving you a place to start manual analysis, but their utility mostly stops there. They cannot reliably catch memory corruption bugs that a basic fuzzer or sanitizer would have found within seconds (a minimal harness sketch is at the end of this comment). That makes sense for this class of bugs: LLMs are fuzzy, probabilistic reasoners, and these issues aren't reliably found with that paradigm. That's the whole reason we have fuzzers; they find the subtle bugs worth triaging. You've seen how well LLMs count; it's no surprise they miss many of the same things a human would but a fuzzer wouldn't (think UaF, OOB, etc.). The other tools you see written for script kiddies yield the same pile of false positives you could have gotten from tools that already exist. I could go on, but I'm on a shuttle typing on a small phone.

TLDR: the article is trying to say LLMs are super hackers already, and that's simply false. They definitely have allure for script kiddies. In the future this might change. The time-saving aspects of LLMs are definitely worth checking out for static binary analysis; Binary Ninja with Sidekick saves a lot of time! But again: you still need to double-check the important things.
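To make "a basic fuzzer" concrete, here's roughly what a harness looks like. This is a minimal sketch in Python using Google's Atheris, with a made-up `parse_record` target; for native code you'd write the equivalent libFuzzer harness and build with ASan, which is what flags UaF/OOB within seconds.

```python
# Minimal coverage-guided fuzz harness sketch (pip install atheris).
# `parse_record` is a hypothetical stand-in for the code under test.
import sys
import atheris

def parse_record(buf: bytes) -> None:
    # A real harness would import the target under
    # atheris.instrument_imports() instead of defining it inline.
    if len(buf) >= 4 and buf[:4] == b"EVIL":
        raise ValueError("parser bug reached")  # uncaught exception == crash

def TestOneInput(data: bytes) -> None:
    parse_record(data)

if __name__ == "__main__":
    atheris.instrument_all()              # enable coverage guidance
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()
```

You run it like any libFuzzer target (corpus directory and flags as CLI arguments), and it finds the reachable crash in seconds.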
I'm a vuln researcher too, and we just had an article here about another vuln researcher using o3 to find a zero-day remote Linux kernel vulnerability. And not in an especially human-directed way: they literally set up 100 runs of o3 using simonw's `llm` tool, and sifted through the results (roughly the shape sketched below).
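For anyone curious about the mechanics, a minimal sketch using the Python API of simonw's `llm` package; the "o3" model id and the prompt file are my assumptions (the original post drove the CLI, but the shape is the same):

```python
# Sketch of the "100 runs" setup, assuming `llm` is installed and an
# o3 model is configured. audit_prompt.txt is a hypothetical prompt file.
import llm

model = llm.get_model("o3")
prompt = open("audit_prompt.txt").read()

for i in range(100):
    response = model.prompt(prompt)
    with open(f"run_{i:03d}.txt", "w") as out:
        out.write(response.text())  # sift these by hand afterwards
```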
I'm having trouble reconciling what you wrote here with that result. It also conflicts with my own experience: not necessarily finding kernel vulnerabilities (I haven't had any need to do that for the last couple of years), but rapidly comprehending and analyzing kernel code (which I do need to do), and realizing how potent that ability would have been on projects ten years ago.
I might be wrong. Deepsleep also sort of found a bug, but you need to ask yourself: is it doing this better than tools we already have? Could a fuzzer have found that bug in less time? How far along did it really need to be pushed? And I have no doubt it was probably trained on certain types of bugs in certain specific codebases. Did they test its ability to find the same bug after applying a couple of transforms that trip up the LLM (the kind of thing sketched below)? Can you link me to this article about o3? I have my doubts; I'd love to see the working exploit.
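By "transforms" I mean semantics-preserving rewrites. A made-up illustration of the idea: both functions below contain the same off-by-one, but the second is refactored in ways that can defeat pattern-matching while leaving the bug intact.

```python
# Hypothetical example: the same bug before and after a rewrite.

def read_field(buf, n):
    out = []
    for i in range(n + 1):      # off-by-one: reads n+1 items, not n
        out.append(buf[i])
    return out

def extract_segment(stream, count):
    acc, idx = [], 0
    while idx <= count:         # same off-by-one, different shape
        acc.append(stream[idx])
        idx += 1
    return acc
```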
Also, if you throw these models at enough codebases, they will probably get lucky a couple of times. So far, every claim I've seen didn't stand up to rigorous scrutiny: people find one bug, then inflate their findings and write articles that make them sound far more effective than they really are. I'm tired of this hype.
curl had to stop accepting bounties after finding that nearly all of them were just AI-generated nonsense…
Also, I did say that they provide very large gains in certain areas, like writing fuzz harnesses and reversing binaries. I'm not saying they have absolutely no utility; I'm simply tired of grifters inflating their findings for clout. Shit has gotten out of control.
But that's exactly what people were saying about fuzzer farms in the mid-2000s, in the belief that artisanal audits would always be the dominant means of uncovering bugs. The truth landed somewhere in between (it's still humans, just working at a higher layer of abstraction than before), but the fuzzer people were hugely right.
If you can reliably get x% lucky finding vulnerabilities for $Y in cost, then you simply scale that up to find more vulnerabilities (back-of-envelope below).
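Back-of-envelope, with made-up numbers:

```python
# If one batch of runs costs $Y and yields a true positive with
# probability x, expected cost per bug is Y/x, and the budget scales
# linearly with how many bugs you want. Numbers below are illustrative.
x = 0.02    # assumed hit rate per batch
Y = 100.0   # assumed dollars per batch
print(f"expected cost per bug: ${Y / x:,.0f}")       # $5,000
print(f"budget for 10 bugs:    ${10 * Y / x:,.0f}")  # $50,000
```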
I don't recall anyone saying anything of the sort about fuzzing back then. You could run the most basic fuzzer and find tons of bugs! Where did you see people complaining about fuzzers?
Bro, in 2006 there wasn't any blogosphere; you had IRC. I can't find anything of the sort. And do you have a link to this discovery you say was made via LLM?
Unless the blog was Phrack, you shouldn't be surprised I never heard of it. This stuff didn't have the kind of public community back then that it does now.
I don't understand what you're trying to say here. I'm not surprised you didn't know about the blog; you appear not to believe blogs existed in 2006.
I want to stop being elliptical and just say directly: you have strange and counterfactual ideas about the security research community of the mid-aughts. 2006-2008 was the height of the security blogosphere. This stuff definitely had a huge community.