Hacker News | saithound's comments

It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like the ones available to OpenAI's Plus/Pro subscribers.

Anthropic tries to create marketing hype around Mythos using two psychological tricks.

1. Put large numbers in the headlines.

"Mythos discovered 271 vulnerabilities in Firefox" makes the model seem extremely capable to the uninitiated.

But it's actually meaningless as a measure of capability _improvement_.

Anthropic gave away $100mil specifically as Mythos credits to these projects and companies (that's $2.5mil per project). Spending the same exorbitant amount of compute analyzing the same codebases with an older model like GPT 5.x Pro would have turned up 260 of these vulnerabilities, or even more than 271.

No need to speculate, since this is exactly what we saw in the few code bases where we have such comparisons (like in the curl codebase). Supposedly weaker models, working with a much lower budget, turned up dozens of vulnerabilities. Mythos turned up only one, which ended up as a low severity CVE.

2. Do the whole "too dangerous to release" shtick. This is one of Dario Amodei's favorite moves. When he was vice president of research at OpenAI, he declared GPT-2 (which wasn't able to produce coherent text beyond 3-4 sentences at the time) too dangerous [1] as well.

Long story short, it's the GPT-4.5 situation again: a company trained a model that's too slow and expensive, but not much more capable than what came before. It therefore requires these marketing stunts.

[1] https://www.itpro.com/technology/artificial-intelligence-ai/...


I work for a company that has been using Mythos for vulnerability detection in our software. The results we're getting are revolutionary to the point that our software security teams are heavily overloaded addressing the deluge of thousands of real bugs/vulnerabilities and design flaws across our billions of lines of code.

For context, we are heavily invested in the AI space, to the point where Anthropic is one of our competitors. We were already using state-of-the-art models to find flaws in our code, but Mythos was just so much better at finding real vulnerabilities it's not even funny.


Read the comment above again. Your comment and theirs are compatible.


They are directly contradicting the claim that if you ran other models on the same codebases you would get similar results.


Yeah I’m a security researcher and my colleagues who have access say it’s insanely good… but interestingly they also work for places like nvidia which have a deep vested interest selling tokens and hardware. So of course they are pushing this narrative.


if you are invested heavily in the AI space, isn't it in your best interest for the froth around Mythos to be true and the comment you are responding to to be invalid? Even if you are competing with Anthropic, a rising tide lifts all boats.

i'd like to see more facts and data one way or another!


This is the "circumstantial" version of the ad hominem fallacy. Just because the author may benefit from the argument being true doesn't mean it is invalid.

They are clearly disputing the assertion that Mythos is an incremental gain rather than a quantum leap. Of course objective, unbiased data would be nice, but these anecdotes are all we have right now.


> billions of lines of code.

Billions as in 10^9?



>Do the whole "too dangerous to release" shtick.

One aspect that isn't really discussed much in this context is how to wrap one's head around the corporate risk with models of ever increasing capability. It might not be too dangerous to society, but it could be too dangerous to Anthropic.


I couldn't agree more. I think the recent moves to partner with xAI and Amazon are proof that they desperately need more compute and are doing everything possible to get it.


I mean everyone knows they need more compute. That’s not a secret or up for debate at all. They are maybe the fastest growing company in history.


I'm fairly certain Amodei believes the "too dangerous to release" hype himself. Even if it's just an incremental improvement, holding it back is better than getting frog-boiled by repeated 20% improvements until someone builds bioweapons in their backyard.


He's made so many statements that fall under the "boy who cried wolf" category that even if he _does_ believe these statements, he needs to be managed better. I'll never forget Anthropic's huge "Oh my God, the AI blackmailed a researcher to save itself!" moment, when the prompt effectively told the AI to do that and gave it forged emails with easy blackmail targets. As if this weren't a common trope in mystery and suspense books, television and fanfiction, all of which Claude (and others) have been trained on.


It's a common trope, all through the training data, and all the modern AIs have read it, and would probably act similarly? Is that what we should take away from your comment? so we have nothing to worry about. Makes sense. Really, it's just a common trope.


Oh of course wolves have sharp teeth, they're predators. Anyone who knows this can never be bitten.


I'm saying the existence of the trope, within the training data, and the experimental setup, negate the breathless "Oh my god it did something unexpected in order to preserve itself!" as if an LLM has any sense of identity or self.

Many, many other bad things are in the training data. For an example of how this can manifest in harmful ways that people don't seem to be discussing much, check out the recent Behind the Bastards episodes about how an AI chatbot became a cult leader. (The title is an exaggeration, which the host explains while raising some excellent points about how LLMs have ingested a lot of cult-leader material and can therefore mimic those speech patterns and affect people who are vulnerable to such things.)


Imagine you're in a car and the car is driving towards a cliff. You shout at the driver "oh my god we're about to go over a cliff!" And he says "you said that two seconds ago, but we're still alive, you're just like the boy who cried wolf. Do you know exactly when we're going to go over a cliff? No? Maybe you're imagining the cliff."

I think it's very improbable that AI is as dangerous as Yud et al fear it is. But it's too soon to say and there seems to be significant long-tail risk. Mocking or criticizing people for being concerned about that risk seems counterproductive.

It seems like the life cycle of huge tech companies like Meta, Google, Microsoft and Amazon is "do whatever's necessary to take over the world, then enshittify." I don't take it for granted that Amodei and Anthropic seem not to be quite maximally power-hungry.

Re: second half of your comment. Understanding a threat doesn't neutralize it. Anthropic didn't make that big a deal of it either; it was news articles that blew it out of proportion.


* sigh *

Three things:

* Delaying the release accomplishes nothing.

* The barrier to someone building/not-building a bioweapon in their backyard is not access to an LLM.

* Remember when GPT 3.5 was going to destroy the world? And how it was conscious? And how it was "trying to escape"? Lmao.


I think gpt 3.5 might have destroyed the world


How does delaying the release accomplish nothing? It puts everyone on notice to fix all security vulnerabilities now.


Because the only thing keeping those vulnerabilities in existence was laziness.


"laziness" is an interesting reframing of "rational cost-benefit analysis and the limits of the human mind".


You're right, it's silly for me to worry. We've never had a technology that initially appeared benign but turned into a big problem. In fact, no tech company has ever released technologies that cause problems for the rest of society AT ALL. /s

What are the other barriers? Last I checked access to CRISPR is not especially tightly regulated. Even if it is, defense in depth is a thing.


If it was as easy as "knowing how to" someone would've already done it or at least attempted to.*

Plenty of people know how to: tens of thousands of researchers. Perhaps you know someone who does.

Did you know that your local veterinary shop has enough drugs to kill 100s of people?

Why doesn't it happen?

* It's not that easy.

* There's a ton of regulation that is hard to circumvent, on purpose.

* There's a gigantic deterrent called "spend the rest of your life behind bars" that people tend to avoid.

An LLM, even the most advanced one, does not make any material change in any of these. You cannot bullshit your way into "uhh, I need Ebola samples for ... reasons".

Unironically, your Sunday movie portraying a super-villain jeopardizing a city with his "home lab" full of flasks of colored liquids and biohazard signs pushes way more people into becoming interested in this than access to an LLM does.

*: Okay, like 5 people, and way before LLMs were a thing. This has been a thing for decades, we're fine.


CRISPR has not been a thing for decades. Biotechnology is advancing and AI is lowering the bar to use it. In 2018 a PhD student was able to synthesize an infectious horsepox virus: https://journals.plos.org/plosone/article?id=10.1371/journal...

So far the overlap between people with bioengineering capabilities and murderous tendencies has been very low. As the technology becomes available to more people that overlap may increase. Even if it never comes within reach of one person, what about North Korea, or Iran?

AI can be jailbroken. The LLM safeguards your argument relies on were put in place by the people you're criticizing for being too safety-conscious. Security through obscurity is no guarantee.


>So far the overlap between people with bioengineering capabilities and murderous tendencies has been very low.

Source for that?

>Even if it never comes within reach of one person, what about North Korea, or Iran?

Oh great, the xenophobic argument, we were missing that one in the conversation.

>Security through obscurity is no guarantee.

Exactly my point! I'm glad we can agree on that :).


You're not arguing in good faith, if you ever were. You're just trying to score points rather than actually address the core of my argument. I hope that's because I've made you think a little, but it doesn't really matter.


Also, the definition of each term gets stretched slightly, one after another, so the compounded meaning ends up really far from the truth. For example, the 271 "vulnerabilities" were mostly bugs: generally incorrect states that almost never led to any exploit.


Yes, an AI making massive gains in bug finding is hugely important and good; it may even roughly offset the bugs introduced by other AI coding processes. But it's a far cry from how Mythos is portrayed most of the time: an automatic super-hacker.


But I think that's a problem with the people portraying it that way, not with Anthropic's messaging. If you've invented "just" a massively more powerful bug finder, it still seems right that you ought to let banks and critical infrastructure providers run it on their systems before it gets in the hands of people who might want to hack them.


You're not really responding to the piece at all.


It's an AI-written slop article, which is hugged to death by HN in any case.

It claims to be an evidence-based investigation, but basically invents the contents of the documents they supposedly investigated, such as the Anthropic Frontier Red Team writeup, from whole cloth.

I don't think deeper engagement with it would promote good discussion.


So you say. I actually read the piece and didn't get AI vibes from it at all, except for the graphics.


there are 31 emdashes in that piece. the domain ends with _ai_


It’s a tangent but two points:

First, LLMs learned to like em dashes because they are common in the training corpus: em dashes existed before LLMs, and LLMs learned them rather than inventing them.

Second, my work browser puts nice blue squiggles under everything I write into a text box. I dutifully click through them and accept the rephrasing suggestions. I get a lot of em dashes. My blog posts and whitepapers and stuff are full of them and other "AI tells", but I think they read better because of it.


I use emdashes all the time. They're correct punctuation as opposed to a minus sign. They're easy to type too: opt-shift-minus. If they were such a huge giveaway without ever being used by humans, models would be trained by now not to use them as much.

The blog is about AI. So yeah the TLD is .ai


I've never seen writing created before the advent of LLMs that used em dashes in the same way and with the same frequency that LLMs regularly do. There's probably some out there, but it would be a real outlier. LLMs overuse them to an absurd degree, putting them where most writers would put commas, occasionally semicolons, or nothing at all.

I count 51 em-dashes on the page, which is extreme. They're also used in places where they don't really belong. It's very obviously LLM-generated, at least in part.

That said, it puzzles me why people don't prompt LLMs to change up the writing style a bit and remove some of the tells.


I can't imagine why a system designed to reproduce the best writing styles would frequently use em dashes.


Take another look at this blog's index https://kingy.ai/category/blog/ and click through more posts, and pay attention to the post dates.

Do you really think this singular author is writing multiple excessively-long blog posts about AI per day? There are ~650 of these posts over the past 18 months. And over on LinkedIn, the author describes himself as a "Specialist in Digital Marketing, Videography / Video Editing, Search Engine Optimization, Social Media, and B2B Sales."

YMMV but this post and entire site absolutely screams "slop" to me.


Don't bother with the slop lovers, these people are anti-human in their souls and willing to follow the most evil people on Earth to the depths of hell; for what? I have zero idea but it's sad to see.


I hate slop as much as you do. Your comment makes no sense.


> It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like ChatGPT Plus/Pro.

I'm skeptical of AI takes by someone who thinks there's a model called chatgpt plus. Spend more time working with the current systems!


It seems like everybody (including you) knew precisely what I meant: the models available for ChatGPT Plus or Pro subscribers, i.e. GPT-5.5 Thinking Extended and the latest Pro. I've edited the offending sentence for clarity just in case.

If I got you to be skeptical of AI takes, though, mission accomplished. Exercise your skepticism especially when the takes come from somebody who is trying to sell something.


I find it interesting that Mythos was announced the same day that GLM overtook Opus 4.6 in capability. To me this seems like a careful attempt to cool demand for open-source models, which are about to take the overall lead.


It's remarkable how capable GLM 5.1 is. What's even more amazing is that the recent Qwen 3.6 27B comes close in real-world performance.


I don't get it. If the older / smaller models are almost as good as Mythos, that sounds like the opposite of comforting.


> an incremental improvement

I've had to reboot my systems quite a bit more than an incremental improvement would suggest this week


It's a fun, but unsurprising undergrad-level result. It got picked up and overhyped on HN [1] and /r/math [2] earlier this week.

Some of my favorites:

DoctorOetker: "I'm still reading this, but if this checks out, this is one of the most significant discoveries in years."

cryptonektor: "Given this amazing work, an efficient EML operator HW implementation could revolutionize a bunch of things."

zephen: "This is about continuous math, not ones and zeroes. Assuming peer review proves it out, this is outstanding."

[1] https://news.ycombinator.com/item?id=47746610

[2] https://www.reddit.com/r/math/comments/1sk63n5/all_elementar...


:)

I still consider the article important, as it demonstrates techniques to conduct searches, and emphasizes the very early stage of the research (establishes non-uniqueness for example), openly wonders which other binary operators exist and which would have more desirable properties, etc.

Sometimes articles are important not for their immediate result, but for the tools and techniques developed to solve (often artificial or constrained) problems. The history of mathematics is filled with mathematicians studying at-the-time-rather-useless constructions which centuries or millennia later become profound to human interaction. Think of the "value" of Euclid's greatest common divisor algorithm. What started out as a curiosity with zero immediate relevance to society is now routinely used by everyone who browses the World Wide Web without their government or others MitM'ing their pages.
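For readers who haven't seen it, here is a minimal sketch of Euclid's algorithm and its extended form. The extended version computes modular inverses, which is, for example, how a textbook RSA private exponent is derived from the public one. The function names and the toy numbers are illustrative assumptions, not code from any real TLS library.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)."""
    while b:
        a, b = b, a % b
    return abs(a)


def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def mod_inverse(e: int, m: int) -> int:
    """Inverse of e modulo m; it exists only when gcd(e, m) == 1."""
    g, x, _ = extended_gcd(e, m)
    if g != 1:
        raise ValueError("e has no inverse modulo m")
    return x % m  # normalize a possibly negative coefficient into [0, m)


# Toy RSA-style numbers (far too small to be secure): p = 61, q = 53.
phi = (61 - 1) * (53 - 1)  # 3120
e = 17
d = mod_inverse(e, phi)
print(gcd(1071, 462), d, (e * d) % phi)  # 21 2753 1
```

In modern Python, `pow(e, -1, phi)` computes the same inverse directly; the explicit loop is only there to show the algorithm.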

If the result was the main claimed importance for the article, there would be more emphasis on it than on the methodology used to find and verify candidates, but the emphasis throughout the article is on the methodology.

It is far from obvious that the tricks used would have converged at all. Before this result, a lot of people would have been skeptical that it is even possible to search for candidates this way. While the gradual early-out tightening in verification could speed up the results, many might have argued that the approach offers no assurance that the false-positive rate wouldn't be excessively high (i.e. many would have said "verifying candidates does not ensure finding a solution; in reality 99.99999999999999999% of candidates may turn out not to pass deeper inspection").

It is certainly noteworthy to publish these results as they establish the machinery for automated search of such operations.
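To make the "early-out tightening" idea concrete, here is a minimal sketch of a staged numerical filter, assuming candidate and target expressions are given as Python callables on complex numbers. This is not the paper's code: the stage sizes, tolerances, sampling region and the name `passes_staged_check` are all hypothetical. Each stage samples more random points with a tighter tolerance and rejects at the first mismatch, so most wrong candidates are discarded cheaply. Passing every stage is evidence of an identity, not a proof.

```python
import cmath
import random


def passes_staged_check(candidate, target,
                        stages=((4, 1e-3), (16, 1e-8), (64, 1e-11)), seed=0):
    """Compare two complex functions at random points, tightening tolerance per stage.

    Returns False at the first disagreement (early out). Returning True only
    means that no counterexample was found at the sampled points.
    """
    rng = random.Random(seed)
    for n_points, tol in stages:
        for _ in range(n_points):
            # Sample in the right half-plane, away from 0 (a branch point of log).
            z = complex(rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0))
            try:
                a, b = candidate(z), target(z)
            except (ValueError, ZeroDivisionError, OverflowError):
                return False
            # Relative tolerance for large values, absolute for small ones.
            if abs(a - b) > tol * max(1.0, abs(b)):
                return False
    return True


print(passes_staged_check(lambda z: cmath.exp(cmath.log(z)), lambda z: z))  # True
print(passes_staged_check(cmath.sin, lambda z: z))                         # False
```

The sampling region matters: for instance, `log(z*z)` and `2*log(z)` pass this check because all samples lie in the right half-plane, even though the two differ by a multiple of 2πi for many points in the left half-plane.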


This result itself is being described in those terms[1]:

> If this is true, then this blog post debunking EML is going to up-end all of mathematics for the next century.

This is very concerning for mathematics in general.

1: https://news.ycombinator.com/item?id=47775105


Why on earth would it upend all of mathematics? Secondly, even if it did that, why would that be concerning for mathematics?


Yes or no, it's too early to tell.


The original article explicitly acknowledged this limitation, that while in "the classical differential-algebraic setting, one often works with a broader notion of elementary function, defined relative to a chosen field of constants and allowing algebraic adjunctions, i.e., adjoining roots of polynomial equations," the author works with the less general definition.

Neither the present article, nor the original one has much mathematical originality, though: Odrzywolek's result is immediately obvious, while this blog post is a rehash of Arnold's proof of the unsolvability of the quintic.


Yes, this article is kicking in open doors, the original article was quite clear about the scope.

The present article could rather have spent time arguing why this isn't like NAND gate functional completeness.

I would have thought the differences lie in the other direction: not that trees of EML and 1 can describe too little, but that they can describe too much already. It's decidable whether two NAND circuits implement the same function, I'm pretty sure it's not decidable if two EML trees describe the same function.
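The NAND side of this is decidable for a simple reason: a feed-forward circuit with n inputs has only 2^n input assignments, so you can compare outputs on all of them. A minimal sketch, assuming a hypothetical encoding where each gate is a pair of wire indices (inputs are wires 0..n-1, gate k writes wire n+k, and the last wire is the output):

```python
from itertools import product


def eval_nand_circuit(gates, inputs):
    """Evaluate a feed-forward NAND circuit on a tuple of booleans.

    Gate k reads wires i and j and appends NAND(wire_i, wire_j) as a new wire.
    The circuit's output is the last wire.
    """
    wires = list(inputs)
    for i, j in gates:
        wires.append(not (wires[i] and wires[j]))
    return wires[-1]


def equivalent(gates_a, gates_b, n_inputs):
    """Decide equivalence by checking all 2**n_inputs input assignments."""
    for bits in product((False, True), repeat=n_inputs):
        if eval_nand_circuit(gates_a, bits) != eval_nand_circuit(gates_b, bits):
            return False
    return True


and_v1 = [(0, 1), (2, 2)]                  # NOT(NAND(x0, x1)) = AND
and_v2 = [(0, 1), (2, 2), (3, 3), (4, 4)]  # two extra NOTs cancel out
or_gate = [(0, 0), (1, 1), (2, 3)]         # NAND(NOT x0, NOT x1) = OR
print(equivalent(and_v1, and_v2, 2), equivalent(and_v1, or_gate, 2))  # True False
```

This takes time exponential in the number of inputs, but it always terminates, which is all decidability requires. It also relies on the circuit having no loops.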


You are correct, it is undecidable by Richardson's theorem [1].

[1] https://en.wikipedia.org/wiki/Richardson%27s_theorem


That result does not apply to EML: EML doesn't have the absolute value function |·|, a prerequisite for Richardson's theorem.


Yes it does; you can build the absolute value as sqrt(x²), and sqrt(x) and x² are both constructible using eml.
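A minimal numerical sketch of how such constructions start, assuming the definition eml(x, y) = exp(x) - ln(y) with the constant 1 from Odrzywolek's paper (the thread does not restate it, so treat it as an assumption). The helper names `exp_eml` and `ln_eml` are hypothetical. Only exp and ln are checked here; the full constructions of x² and sqrt(x) also need arithmetic built out of EML and are not reproduced.

```python
import cmath
import math


def eml(x, y):
    """Assumed EML operator: exp(x) - ln(y), using principal complex branches."""
    return cmath.exp(x) - cmath.log(y)


def exp_eml(x):
    # eml(x, 1) = exp(x) - ln(1) = exp(x)
    return eml(x, 1)


def ln_eml(x):
    # eml(1, x) = e - ln x
    # eml(e - ln x, 1) = exp(e - ln x) = e**e / x
    # eml(1, e**e / x) = e - (e - ln x) = ln x
    return eml(1, eml(eml(1, x), 1))


for x in (0.5, 2.0, 10.0):
    print(x, ln_eml(x).real, math.log(x))
```

Given ln and exp, the identity |x| = sqrt(x²) = exp(ln(x²)/2) then only needs squaring and halving to be expressible as well.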


If I understand the page correctly, the extension by Miklós Laczkovich should be enough to show that it's undecidable.


You wrote:

> It's decidable whether two NAND circuits implement the same function, I'm pretty sure it's not decidable if two EML trees describe the same function.

Perhaps, perhaps not. Asking whether they are the same function basically means asking whether this question is solvable:

A(x[,y,...]) = f(x[,y,...])-g(x[,y,...]) == 0 everywhere?

If a user brings EML functions f and g, given as binary EML trees, can we decide whether they represent the same function? The question form is then

A(x)=0 EVERYWHERE?

(Like: given two fractions a/b and c/d, do they represent the same fraction?)

From Wikipedia link reikonomusha gave:

> Miklós Laczkovich removed also the need for π and reduced the use of composition.[5] In particular, given an expression A(x) in the ring generated by the integers, x, sin xn, and sin(x sin xn) (for n ranging over positive integers), both the question of whether A(x) > 0 for some x and whether A(x) = 0 for some x are unsolvable.

Here the question forms are

1) exist x such that A(x) > 0 (does there exist an x where A(x) becomes positive?)

2) exist x such that A(x) = 0 (does there exist a value such that A(x) becomes 0? Basically, does A have a real root?)

So at least the forms on Wikipedia don't give the results both of you claim they do.

it does present undecidability results, but not straightforwardly in the context of this EML work.

Second, Richardson's theorem is about functions on the reals, not complex functions (I mean, the roots must lie somewhere).


> an expression in the ring generated by the integers, x, sin xn, and sin(x sin xn)

We can always write EML trees for expressions generated by the integers, x, sin xn, and sin(x sin xn), right?

So we should be able to write EML trees for any two such expressions, A and B. If they're equal everywhere, then A - B = 0 everywhere. A - B is also in the aforementioned ring.

If there was a decision procedure always to determine if EML trees represent the same function, then that contradicts Miklós Laczkovich's extension, right?


No. Miklós Laczkovich's extension, as described on Wikipedia, only says that both of the following questions are proven undecidable:

1) is there some value x such that some function F(x)=A(x)-B(x)=0?

2) is there some value x such that F(x)>0?

while you asked:

> I'm pretty sure it's not decidable if two EML trees describe the same function.

that would be

3) is for every x F(x)=A(x)-B(x)==0?

which Miklós Laczkovich's extension does not provide.

And you ignore the fact that Miklós Laczkovich's extension applies to real numbers and functions...


If it's undecidable whether it's 0 at even ONE point, clearly you can't prove that it's 0 everywhere.

Likewise, if it's not decidable for real-valued functions, clearly it's not decidable for complex valued functions.


That's not how this works.

Decidability does not distribute over pointwise questions on sets; if you believe it does, show us the proof.

Telling if an EML(x,y),1 constructed expression is identically 0 is in the gray zone, as far as I can tell, it has neither been proven decidable nor been proven undecidable.

Nevertheless regardless of decidability the authors clearly show the multipoint sampling/testing is a decent filter, and the shorter resulting expressions have been proven correct in the results for the construction at least.


It has more problems. See my other comment in this thread.


> It's decidable whether two NAND circuits implement the same function

Well, sure. At least, until you have a loop that starts clocking for you, and now you've got the halting problem.


> Neither the present article, nor the original one has much mathematical originality, though: Odrzywolek's result is immediately obvious, [...]

Maybe. But I found it a nice piece of recreational mathematics nevertheless.


Arnold (as reported by Goldmakher [1]) does prove the unsolvability of the quintic in finite terms of arithmetic and single-valued continuous functions (which does not include the complex logarithm). TFA's result is stronger, which is something about the solvability of the monodromy groups of all EML-derived functions. So it doesn't seem to be a "rehash", even if their specific counterexample could have been achieved either in fewer steps or with less machinery.

[1] https://web.williams.edu/Mathematics/lg5/394/ArnoldQuintic.p...


Arnold's proof can be used to show that certain classes of functions are insufficient to express a quintic formula.

These classes can always safely include all single-valued continuous functions (you cannot even write the _quadratic_ formula in terms of arithmetic and single-valued continuous functions!), but also plenty of non-single-valued functions (e.g. the +-sqrt function which appears in the well-known quadratic formula).

Applying Arnold's proof to the class given by arithmetic and all complex nth root functions (also multivalued) gives the usual Abel-Ruffini theorem. But Arnold's proof applies to the class "all EML-expressible functions" without modification.


> Odrzywolek's result is immediately obvious

Many things that in retrospect seem immediately obvious weren't obvious before, let alone immediately obvious.


And it's depressing that when rare actual progress is made, a collection of jealous practitioners comes to party-poop all over it, even though it was the work itself that brought the insights which make the result immediately obvious from then on.


> Odrzywolek's result is immediately obvious

This may or may not be true; but the burden of proof should not lay with the reader.

Please provide a reference (in the absence of which every reader can draw their own conclusions) which simultaneously:

1) predates Odrzywolek's result

2) and demonstrates that the other unary and binary operations typically tacitly assumed can be expressed in terms of a single binary operation and a constant.

(in other news: I can spontaneously levitate, I just don't feel like demonstrating it to you right now...)


Questions which have never been asked or answered before, but to which practitioners have immediately obvious answers, are a dime a dozen in mathematics.

You can find thousands of such questions on Math StackExchange. Take e.g. [1]: never been asked anywhere else, interesting enough, yet answered pretty much immediately by two separate mathematicians.

"Is there a single constant and function with connected domain that can express all of $\log, \exp, \sin, \dots$?" would have made a fine question there too, the type that gets a thorough answer very quickly if anyone bothers to ask it.

> the burden of proof should not lay with the reader

You were the one who made the claim that "this is one of the most significant discoveries in years". Feel free to substantiate that claim first, according to the same standards. Are there any authors who ask this question, and/or suggest that they don't know an answer?

[1] https://math.stackexchange.com/questions/2308587/is-the-set-...


His derivations are genuinely pretty, but it isn’t a result when everything he claims is actually just wrong.


Upvoted back to not-greyed-out. You must have struck a nerve.


> Repeating myself, when we speak of bugs in a verified software system, I think it's fair to consider the entire binary a fair target.

Yes, and that would be relevant if this was a verified software system. But it wasn't: the system consisted of a verified X and unverified Y, and there were issues in the unverified Y.

The article explicitly acknowledges this: "The two bugs that were found both sat outside the boundary of what the proofs cover."


The good news, I guess, is:

1/ lean-zip is open source, so it's much easier to have more Claudes' eyes looking at it.

2/ I don't think Claude could prove anything substantial about the zip algorithm. That's what lean is for. On the other side, lean could not prove much about what's around the zip algorithm but Claude can be useful there.

So in the end lean-zip is now stronger!


MS Flight Simulator with VATSIM [1] has this, in the sense that you can participate as a pilot or a controller, although you are not assigned these roles at game start.

Anti-griefing works by keeping the barriers to entry very high, so chances are you won't try VATSIM, even though MSFS is technically available on Steam.

[1] https://vatsim.net/docs/basics/becoming-a-controller


The thread started out off the rails. Contrary to the claims of youre-wrong3, garbage collection is not a particularly high paying job and has no real trouble getting new hires.


They already had that exact strategy between 2012 and 2020.


> When asking people to write code in a language, these restrictions could be onerous. But LLMs don't care, and the less expressivity you trust them with, the better.

But LLMs very much do care. They are measurably worse when writing code in languages with non-standard or non-existent operator precedence. This is not surprising given how they learn programming.


This is irrelevant when the question is whether marketing your CPU with "AI" will help sales.

Toilets also changed everything we do and are helpful in unobtrusive ways, but that won't make the "Ryzen Crapper" a customer favorite.


> I have noticed that only white people commit to living in the UK without becoming citizens.

Alas, you've not discovered a hidden pattern, except maybe a hidden pattern in the kinds of people you socialize with. Chinese nationals cannot hold dual citizenship, and renouncing their Chinese citizenship creates very serious complications, including around property and inheritance when parents die, which you would be aware of if you knew any Chinese person well enough to have had this conversation with them.

Based on gov.uk immigration system statistics data and tables, among those with indefinite leave to remain, the most likely to seek citizenship are British Overseas Citizens, Austrians and Lithuanians. The least likely are Moroccans and Venezuelans.


> The least likely are Moroccans and Venezuelans

Weird in the Venezuelan case, as there are no restrictions on dual nationality and having only Venezuelan citizenship doesn't offer many advantages. I would guess it's because most Venezuelans living there already have a European passport through a parent or grandparent, so they don't need, or can't get, a third.


They can't get a third only if the other European country does not allow dual nationality.

If they are permanently settled in the UK surely it would be better to have British citizenship rather than that of a country they do not live in?

The nationalities listed are all very small groups in the UK. Maybe they are not really permanently settled? Someone who moves somewhere for work might end up living there for a decade (which in the UK would mean getting indefinite leave to remain) and then return home.


I admit not knowing a lot of Chinese nationals, but I do know a very wide range of people. Of course, issues with property and inheritance only apply to people who have enough for it to be a major issue.

Could you link to the stats showing that? What about all the other countries? What is the position of BNOs?

Indefinite leave to remain is not the same as permanently settled - there is a difference between long term and the rest of your life.

