stri8ted · 2026-02-24T00:25:22 1771892722

How is this content related to HN? Are there any submission criteria?

stri8ted · 2026-02-19T17:18:09 1771521489

Exactly. As far as I'm concerned, the benchmark is useless. It's way too easy and rewarding to train on it.

bonoboTP · 2026-02-19T19:33:31 1771529611

It's just an in-joke, he doesn't intend it as a serious benchmark anymore. I think it's funny.

Legend2440 · 2026-02-19T17:44:33 1771523073

Y'all are way too skeptical, no matter what cool thing AI does you'll make up an excuse for how they must somehow be cheating.

toraway · 2026-02-19T19:14:56 1771528496

Jeff Dean literally featured it in a tweet announcing the model. Personally it feels absurd to believe they've put absolutely no thought into optimizing this type of SVG output given the disproportionate amount of attention devoted to a specific test for 1 yr+.

I wouldn't really even call it "cheating" since it has improved models' ability to generate artistic SVG imagery more broadly but the days of this being an effective way to evaluate a model's "interdisciplinary" visual reasoning abilities have long since passed, IMO.

It's become yet another example in the ever growing list of benchmaxxed targets whose original purpose was defeated by teaching to the test.

https://x.com/jeffdean/status/2024525132266688757?s=46&t=ZjF...

arcatech · 2026-02-19T18:49:56 1771526996

Or maybe you’re too trusting of companies who have already proven to not be trustworthy?

pixl97 · 2026-02-19T17:46:09 1771523169

I mean if you want to make your own benchmark, simply don't make it public and don't do it often. If your salamander on skis or whatever gets better with time it likely has nothing to do with being benchmaxxed.

stri8ted · 2026-02-05T20:11:31 1770322291

That is where the money is.

ghosty141 · 2026-02-05T20:41:18 1770324078

This. I think software development is the best use case for AI yet. I use it almost daily at work and it's a huge help.

Enterprise customers will happily pay even $100/mo subscriptions and it has a clear value proposition that can be decently verified.

OutOfHere · 2026-02-06T16:18:12 1770394692

Revenue should not be confused with profit. The large AI companies must easily be spending more on compute than they're making from a $20-200/mo subscription. In the best case it might break even for the AI companies. There is no way that they're actually earning a profit from these subscriptions at this time.

OutOfHere · 2026-02-05T22:47:19 1770331639

It's where the revenue is, but it isn't going to be where the profit is. Developers will easily use absurdly large amounts of compute, costing the AI provider a lot more than they receive in revenue.

stri8ted · 2025-12-20T19:03:46 1766257426

What languages does it support? I can't find this info anywhere on the page.

stri8ted · 2025-10-06T14:03:51 1759759431

For video use cases, which will become increasingly popular, we are a long ways away.

echelon · 2025-10-06T16:54:07 1759769647

Wan runs on local GPUs and looks amazing.

Sora 2 takes a lot of visual shortcuts. The innovation is how it does the story planning, vocals, music, and lipsync.

We'll have that locally in 6 months.

stri8ted · 2025-08-15T21:18:18 1755292698

Exactly. Or use the interpretability work to disable the distress neuron.

stri8ted · 2025-06-12T19:13:27 1749755607

The same is true for Google.

stri8ted · on Jan 26, 2025

It seems a strong base model is what enabled this. The model needs to be smart enough to get it right at least sometimes.

stri8ted · on Nov 4, 2024

Not for coding, it seems. https://aider.chat/docs/leaderboards/

stri8ted · on Sept 17, 2024

Given Israel's successful precision targeting of various senior Hezb members in recent months, I wonder if the pagers were initially used as such, but as suspicion mounted, and chances of an overhaul increased, they decided to hit the kill switch while they still could.

Although, as per a WSJ article: "The affected pagers were from a new shipment that the group received in recent days"

LegitShady · on Sept 17, 2024

The pagers were likely one way with a codebook for the purpose of minimizing tracking and information exposure.