More

Topfi · 2025-10-22T13:31:51 1761139911

As a former AVP owner, this seems much more suitable for what I personally tend to use these devices for, with the headstrap appearing far closer to my personal comfort preference (halo straps) and a lower weight due to reduced power and no superfluous outer display along lighter materials. I get that Apple included these as apparently they have/had the vision of establishing the AVP as a device which users may keep on when talking with others, but like with the Watch, I'd hope having seen how people use the device makes them change direction.

As there is still no support for MacOS apps like XCode on the AVP M5 and gaming is hardly viable currently without controllers, the performance advantage of the M5/M2 over the XR2 Gen2 isn't really a factor for me. Doesn't matter what the face computer can theoretically do if for many tasks I need to connect to my MBP anyways. As a display replacement, the Galaxy looks far more appealing because of this, though I'll wait for independent reviews on wireless latency, eye tracking, stability of AR "widgets" in space, etc. Also am interested, will Meta port their games (Beatsaber, mainly) to Android XR or will they continue trying to establish themselves as the platform owners. Remember a lot of talk about Meta Horizon being used by partners like Lenovo and Asus, has yet to materialize and have a hard time seeing these going for Meta over Google now that XR has arrived.

Also wish Samsung had distributed at least part of the battery into the back of their head strap. Prefer the Meta Pro style over external batteries any day, though get the size limitations. A small backup battery for hot swapping and added weight distribution would have been a decent compromise though in my eyes.

Topfi · 2025-10-19T14:37:15 1760884635

> AIME is saturated (with tool use) [...]

But isn't tool use kinda the crux here?

Correct me if I'm mistaken, but wasn't the argument back then on whether LLMs could solve maths problems without e.g. writing python to solve? Cause when "Sparks of AGI" came out in March, prompting gpt-3.5-turbo to code solutions to assist solving maths problems over just solving them directly was already established and seemed like the path forward. Heck, it is still the way to go, despite major advancements.

Given that, was he truly mistaken on his assertions regarding LLMs solving maths? Same for "planning".

NitpickLawyer · 2025-10-19T14:49:33 1760885373

AIME was saturated with tool use (i.e. 99%) for SotA models, but pure NL, no tool still perform "unreasonably well" on the task. Not 100% but still within 90%. And with lots of compute it can reach 99% as well, apparently [1] (@512 rollouts, but still)

[1] - https://arxiv.org/pdf/2508.15260

Topfi · 2025-10-19T14:15:15 1760883315

Guess even the CTO of OpenAI relies on Anthropic models in a pinch...

Topfi · 2025-10-19T14:14:18 1760883258

Reminds me of an ARG [0] I made in the early days of LLM hype. Honestly had three mails asking whether they could invest. Likely scams if we are honest, just some automated crawlers, but found it funny nonetheless.

[0] https://ethical-ai.eu

Topfi · 2025-10-19T14:07:43 1760882863

If holding the CTO of OpenAI accountable for his wildly inaccurate statement constitutes "dunking on OpenAI", then I'd say dunk away.

He, more than anyone else, should be able to for one parse the original statements correctly and for another maybe realize that if one of their models had accomplished what he seemed to think GPT-5 had, that may require some more scrutiny and research before posting it. That would have, after all, been a clear and incredibly massive development for the space, something the CTO of OpenAI should recognize instantly.

The amount of people that told me this is clear and indisputable proof that AGI/ASI/whatever is either around the corner or already here is far more than zero and arguing against their misunderstanding was made all the more challenging because "the CTO of OpenAI knows more than you" is quite a solid appeal to authority.

I'd recommend maybe a waiting period of 48h before any authority in any field can send a tweet, that might resolve some of the inaccuracies and the incredibly annoying need to just jump on wild bandwagons...

Topfi · 2025-10-19T13:26:33 1760880393

> What good is your open problem set if really its a trivial "google search" away from being solved. Why are they not catching any blame here?

They are a community run database, not the sole arbiter and source of this information. We learned the most basic research back in highschool, I'd hope researchers from top institutions now working for one of the biggest frontier labs can do the same prior to making a claim, but microblogging has and continues to be a blight on any accurate information so nothing new there.

> GPT-5 was still doing some cognitive lifting to piece it together.

Cognitive lifting? It's a model, not a person, but besides that fact, this was already published literature. Handy that a LLM can be a slightly better search, but calling claims of "solving maths problems" out as irresponsible and inaccurate is the only right choice in this case.

> If a human would have done this by hand it would have made news [...]

"Researcher does basic literature review" isn't news in this or any other scenario. If we did a press release every journal club, there wouldn't be enough time to print a single page advert.

> [...] how many other solutions are out there that just need pieced together from pre-existing research [...]

I am not certain you actually looked into the model output or why this was such an embarrassment.

> But, you know, AI Bad.

AI hype very bad. AI anthropomorphism even worse.

Topfi · 2025-10-19T13:16:28 1760879788

Isn't the dot com bubble a far better proxy? Notably, todays spending is both higher and more concentrated in a few companies that a large part of the population has exposure to (most dot com companies weren't publicly traded and far smaller vs MSFT, Alphabet, Meta, Oracle, NVDA making up most investment today) by way of pension funds, ETFs, etc.?

the_duke · 2025-10-19T13:27:50 1760880470

Sure, but all of the above have solid businesses that rake in lots of money, revenue based on AI is a small percentage for them.

An AI bust would take the stock price down a good deal, but the stock gains have been relatively moderate. Year on year: Microsoft +14%, Meta +24%, Google +40, Oracle +60%, ... And a notable chunk of those gains have indirectly come from the dollar devaluing.

Nvidia would be hit much harder of course.

There is a good amount of smaller AI startups, but a lot of the AI development is concentrated on the big dogs, it's not nearly as systemic as in dot com, where a lot of businesses went under completely.

And even with an AI freeze, there is plenty of value and usage there already that will not go away, but will keep expanding (AI chat, AI coding, etc) which will mitigate things.

rhetocj23 · 2025-10-19T21:54:21 1760910861

The problem is do you know much 14% etc is? We are talking about valuations in the trillions my friend!

Topfi · 2025-10-18T11:46:57 1760788017

Accutrons and tuning fork watches are amazing. They have an incredibly unique sound/hum due to the tuning fork oscillating at 360 hz and the most smooth glide you'll ever see in a watch. Recommend a ESA 9162 or ESA 9164 over a pure Accutron for beginners though, a bit more resilient and far more affordable, though they don't have the exposed dial.

briansm · 2025-10-18T13:09:13 1760792953

I believe this is why all modern digital watches use a 32768.0Hz crystal resonator, it's a power-of-2 frequency above the 20Khz top end of the range of human audio perception, to avoid the whole 'tinnitus on your wrist' thing.

namibj · 2025-10-18T17:57:30 1760810250

Also a tuning fork cut for a lower power-of-two would be a bit bulky for a compact wrist watch.

aidenn0 · 2025-10-18T16:42:17 1760805737

I have an Accutron 214 and I swear it sounds higher pitched than 360Hz (sounds to my ear higher than A440, which I'm very familiar with). Maybe I'm hearing an overtone?

goopypoop · 2025-10-18T20:09:00 1760818140

you could measure it.

using an app with a Fast Fourier Transform (e.g. https://github.com/woheller69/audio-analyzer-for-android ), you can visually compare the sounds of your watches

aidenn0 · 2025-10-18T23:36:05 1760830565

I know it is at 360Hz, since it keeps time well.

goopypoop · 2025-10-19T20:21:20 1760905280

what do you want?

Topfi · 2025-10-17T10:48:29 1760698109

It goes back to Sparks of AGI [0] unless I am mistaken. Can recommend the talk, one that has stayed in the back of my mind since I first saw it two years ago. Personally, still have major reservations about throwing claims of intelligence or understanding around, but I do agree that SVG code generation can be a very effective source to get a quick and easy to present understanding of a models ability to output code with a rather open ended prompt that needs a high degree of coherence and were a lot of layers depend/build on each other.

Helps that these are eye catching (literally as the output is visual) and easy to grasp. Same reason a lot of hype is created around the web desktops.

[0] https://youtu.be/qbIk7-JPB2c?si=_TNRrxN-_5FOlfy5&t=1342

Topfi · 2025-10-16T17:25:33 1760635533

Has been ongoing for roughly a month now, with a variety of checkpoints along the usual speculation. As it stands, I'd just wait for the official announcement, prior to making any judgement. What their release plans are, whether a checkpoint is a possible replacement for Pro, Flash, Flash Lite, a new category of model, won't be released at all, etc. we cannot know.

More importantly, because of the way AIStudio does A/B testing, the only output we can get is for a single prompt and I personally maintain that outside of getting some basic understanding on speed, latency and prompt adherence, output from one single prompt is not a good measure for performance in the day-to-day. It also, naturally, cannot tell us a thing about handling multi file ingest and tool calls, but hype will be hype.

That there are people who are ranking alleged performance solely by one-prompt A/B testing output says a lot about how unprofessionally some evaluate model performance.

Not saying the Gemini 3.0 models couldn't be competitive, I just want to caution against getting caught up in over-excitement and possible disappointment. Same reason I dislike speculative content in general, it rarely is put into the proper context cause that isn't as eyecatching.

tuesdaynight · 2025-10-16T21:02:29 1760648549

I understand that hyping is the career of a lot of people, but it's a little annoying how every Twitter link posted here is full of "IT'S A GAME CHANGER!!! NOTHING IS THE SAME ANYMORE!!! BRACE FOR IMPACT!!!" energy. The examples look great, but it's hard to ignore the unprofessional evaluation that you described.

cactusplant7374 · 2025-10-17T04:02:21 1760673741

The example in this case is an SVG of a video game controller.