
That's not true at all.

The ones who ace their careers are, for the most part, people who are fun, driven, or psychos - all social traits that make you good at the political game.

Spending lots of time with other socially awkward types talking about hard math problems or whatever will get you nowhere outside of some SF fantasy startup movie.

I'd say it's especially important for the more nerdy (myself included) to be more outgoing, and do other stuff like sales or presentations, design/marketing or workshops - that will make you exceptional, because you then have the "whole package" and understand the process and other people.


Does anyone know if Stephen Lemay replacing Dye will potentially "save" the increasing mess that is OSX, at least UX wise, or is it more of a meaningless figurehead swap in a big org?

Tahoe is tragically bad by almost every UX measure, and following various Apple subreddits I wonder if they just don't care anymore, since the majority of people are shocked by the amateurishness of both the bugs and the design choices in the latest update. This comes on top of literally every major bug being ignored from the alpha through release, with feedback continuing to be ignored afterwards.


I worked on Finder/TimeMachine/Spotlight/iOS at Apple from 2000-2007. I worked closely with Bas Ording, Stephen Lemay, Marcel van Os, Imran Chaudry, Don Lindsay and Greg Christie. I have no experience with any of the designers who arrived in the post-Steve era. During my time, Jony Ive didn't figure prominently in the UI design, although echoes of his industrial design appeared in various ways in the graphic design of the widgets. Kevin Tiene and Scott Forstall had more influence, for better or worse - extreme skeuomorphism, for example.

The UX group would present work to Steve J. every Thursday, and Steve quickly passed judgement, often harshly and without a lot of feedback, leading to even longer meetings afterward to try and determine course corrections. Steve J. and Bas were on the same wavelength, and a lot of what Bas would show had been worked on directly with Steve beforehand. Other things would be presented for the first time, and Steve could be pretty harsh. Don, Greg, Scott, and Kevin would push back and get abused, but they took the abuse and could make inroads.

Here is my snapshot of Stephen from the time. He presented the UI ideas for the initial tabbed window interface in Safari. He had multiple design ideas and Steve dismissed them quickly and harshly. My recollection is that Steve said something like "No, next, worse, next, even worse, next, no. Why don't you come back next week with something better." Stephen didn't push back or say much, just went "ok" and that was that. I think Greg was the team manager at the time and pushed Steve for more input and maybe got some. This was my general observation of how Stephen was over 20 years ago.

I am skeptical and doubtful about Stephen's ability to make a change unless he is facilitated greatly by someone else or has somehow changed drastically. The fact that he has been on the team while the general opinion of Apple UX quality has degraded to the current point of the Tahoe disaster is telling. Several team members paid dearly in emotional abuse under Steve and decided to leave rather than deal with the environment post Steve's death. Stephen is an SJ-era original and should have been able to push hard against what many of us perceive as very poor decisions. He either agreed with those decisions, or did not and chose to go with the flow and enjoy the benefits of working at Apple. This is fine, I guess. Many people are just fine going with the flow and not rocking the boat. It may be even easier when you have Apple-level comp and benefits.

My opinion: unless Stephen gets a very strong push from other forces, I don't see that he has the will or fortitude to reverse decisions that he himself has approved in one way or another. Who will push him? Tim Cook, Craig Federighi, Eddy Cue, Phil Schiller? The perceived mess of Tahoe happened on the watch of all of these Apple leaders.


Thanks for this interesting read.

I’m asking you to judge people’s state of mind here, which is near impossible, but please bear with me…

> Several team members paid dearly in emotional abuse under Steve and decided to leave rather than deal with the environment post Steve's death.

Normally an event like this brings a change in culture as well, which I think we have seen under Cook. So why did they assume that the abusive situation would continue? Jobs was generally known to be harsh to the point of being abusive, but if the situation did not change with his death, maybe the abuse was equal parts cultural rather than just coming from the CEO - so why not leave earlier?


This question is forcing me to do some deep thinking about my time there, which I haven't done in quite a while.

Some people left early, like Don Lindsay. Don was instrumental in bringing Aqua to life, along with Bas of course, and led the team up to and through the release of Cheetah and more. This task wasn't easy at all. To me it seemed like he was finally going to receive some reward for those hard years of work. But instead he chose to leave for Microsoft. This boggled my mind, as leaving for Microsoft seemed incomprehensible to me. Maybe Don had had enough of the abuse? Maybe he was sick of the increasingly crowded commute? The daily visits from Steve pointing out every detail of the UI that bothered him? Did you know the UX team designed many of the big banners and posters for the WWDC events? Steve didn't want any old graphic designer to do those, so Bas, Imran and others would work on them. Don had to deal with that too.

When Steve left to receive cancer treatment in 2004, he still had influence; Bertrand Serlet was running engineering, and Jony Ive was focused on industrial design. We were working on Tiger with the brushed metal interface and there was a lot of activity on that. Tim Cook was running the business, but Bas and others were keeping the ball rolling on the UX with remote input from Steve.

I wasn't around for the next two leaves of absence, the last one being final, but I heard that things were becoming increasingly fractious, with camps emerging around Tony Fadell, Scott Forstall and Jony Ive, and general political unpleasantness as Tim Cook was given various ultimatums along the lines of "I won't work with this or that person." Everyone was trying to say that they represented the vision of Steve and somehow knew what Steve would do in any situation. Geez, if we had known what Steve would do or wanted, a lot of really distressing confrontations could have been avoided over the previous years.

This type of internal sniping didn't happen with Steve around, or if it did, it wasn't very effective. I think it would have gotten you fired. Tony Fadell pushed it to the limit with Steve and Scott. I remember someone once asking Steve about getting free lunch at Apple, like you could get at Google and they were told "If all you want is free lunch, then you should be working at Google."

For me, there was a certain amount of clarity that came from Steve's abusive behavior. It could wear you down on one level, but it also brought focus and drive to getting things done. I think it was very unhealthy on one level and very exciting on another. There weren't endless meetings on calendars discussing minutiae. It also meant that the obvious horrors of Tahoe wouldn't happen. Steve himself would have grabbed the windows with different corner radii, stacked them up and excoriated whoever was responsible. Some of my work was called "real bottom of the barrel shit", "the worst he has ever seen", and I was told "this is not the way we do things at Apple." I assure you, what he was complaining about was nothing remotely close to what we are seeing in Tahoe.


I'd love to hear more! Have you collected stories on a blog or contributed to places like Folklore.org?


The extent of my writing is here in Hacker News comments. I don't have the time or discipline of Andy to be able to sit down and write like he does. Maybe someday, but for now I am using the free time I have outside of work to make music and ride bikes as fast as I can.


Maybe he would consider collecting, proofreading, and editing them if you pointed him to your comments here?


The mess of Tahoe didn't just happen on the watch of Tim Cook, Craig Federighi, Eddy Cue, and Phil Schiller - it happened because of them.

Tim Cook has no taste and no sense of quality. He merely counts beans really well. Craig Federighi is responsible for the most precipitous drop in Apple's software quality since the late 80s and early 90s. Eddy Cue is responsible for some of Apple's worst software (music, iCloud, services), and Phil Schiller… what exactly does he do again?


Thanks for the first-hand insights. Do you know if much has changed in the 18 years since your tenure there?


I still have friends who work there. Some of them came to Apple from Be or Eazel, and are still working on Finder, Safari, Dock, etc. A lot has changed, and in my opinion not for the best. Compared to them, my time there was a flash in the pan. When I look at Safari, Finder and the general state of the UI, I am deeply saddened. I see a bizarre combination of stagnancy, gratuitous change and general aimlessness across the desktop and mobile. I also have a deep distrust of anyone who works at a big company, let alone at a big company on one component for a long amount of time. To me, it leads to a focus away from external customers and toward becoming an expert at internal politics. I probably need counseling, but I loved the dictatorship of the Steve era. Yes, we can point to flaws like the Mac Cube or the hockey puck mouse, but I really appreciated someone just maniacally fixated on getting things done and cutting through the BS that I saw later on in jobs in big tech.

It would be nice if veterans of the post-Steve era would post on here. Maybe they are scared, bound by NDAs or couldn't care less. Like I said, I need some mental health treatment about my time(s) at Apple. I was there working on Final Cut Pro after Be, went to Eazel, and then rejoined Apple as part of Steve's mass hiring of Eazel employees at the behest of Andy Hertzfeld.


He will prevent it from getting much worse than it would have under another decade of Dye, but I don't think he can totally reverse the trend.

I think this is just what happens to companies as they get older. Most of the people who pioneered the Human Interface Guidelines aren't at the company anymore, and management doesn't see much financial growth in Mac sales compared to AI and services.


> compared to AI and services

It's probably the services (Care, iCloud, Music, and even TV); Apple's AI isn't on the overall map at all compared to the competition.


A lot of Apple's services revenue is App Store mobile games, AIUI.


Lemay's appointment was widely celebrated, but he'd been at Apple since 1999 and never got the gig. My guess is that there are valid reasons for that which may not be design-related.


Based on Tao's description of how the proof came about - a human is taking results back and forth between two separate AI tools and using an AI tool to fill in gaps the human found?

I don’t think it can really be said to have occurred autonomously then?

Looks more like a 50/50 partnership with a super-expert human on the one side, which makes this way more vague in my opinion - and in line with my own AI tests, i.e. they are pretty stupid, even Opus 4.5 or whatever, unless you're already an expert and are doing boilerplate.

EDIT: I can see the title has been fixed now from solved to "more or less solved", which I still think is a big stretch.


You're understanding correctly: this is back and forth between Aristotle, ChatGPT and a (very smart) user.


I'm not sure I understand the wild hype here in this thread then.

Seems exactly like the tests at my company, where even frontier models are revealed to be very expensive rubber ducks, but completely fail with non-experts or anything novel or math-heavy.

I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.


Yes, the contributions of the people prompting the AI should be considered, as well as the people who designed the Lean libraries used in-the-loop while the AI was writing the solution. Any talk of "AGI" is, as always, ridiculous.

But speaking as a specialist in theorem proving, this result is pretty impressive! It would have likely taken me a lot longer to formalize this result even if it was in my area of specialty.


> Any talk of "AGI" is, as always, ridiculous.

How did you arrive at "ridiculous"? What we're seeing here is incredible progress over what we had a year ago. Even ARC-AGI-2 is now at over 50%. Given that this sort of process is also being applied to AI development itself, it's really not clear to me that humans would be a valuable component in knowledge work for much longer.


It requires constant feedback, critical evaluation, and checks. This is not AGI, it's cognitive augmentation. One that is collective, one that will accelerate human abilities far beyond what the academic establishment is currently capable of, but that is still fundamentally organic. I don't see a problem with this--AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful. This is what we tell children so that they won't shit their beds at night, otherwise they get a spanking. The real world is not composed of rewards and punishments.


It does seem that the Venn diagram of "Roko's basilisk" believers and "AGI is coming within our lifetimes" believers is nearly a circle. Would be nice if there were some less... religious... arguments for AGI's imminence.


I think the "Roko's Basilisk" thing is mostly a way for readers of Nick Land to explain part of his philosophical perspective without the need for, say, an actual background in philosophy. But the simplicity reduces his nuanced thought into a call for a sheeplike herd - they don't even need a shepherd! Or perhaps there is one, but he is always yet to come... best to stay in line anyway, he might be just around the corner.


> It requires constant feedback, critical evaluation, and checks. This is not AGI, it's cognitive augmentation.

To me that doesn't sound qualitatively different from a PhD student. Are they just cognitive augmentation for their mentor?

In any case, I wasn't trying to argue that this system as-is is AGI, just that it's no longer "ridiculous", and that this to me looks like a herald of AGI, as the portion being done by humans gets smaller and smaller.


People would say the same thing about a calculator, or computation in general. Just like any machine it must be constructed purposefully to be useful, and once we require something which exceeds that purpose it must be constructed once again. Only time will tell the limits of human intelligence, now that AI is integrating into society and industry.


>AGI advocates treat machine intelligence like some sort of God that will smite non-believers and reward the faithful.

>The real world is not composed of rewards and punishments.

Most "AGI advocates" say that AGI is coming, sooner rather than later, and it will fundamentally reshape our world. On its own that's purely descriptive. In my experience, most of the alleged "smiting" comes from the skeptics simply being wrong about this. Rarely there's talk of explicit rewards and punishments.


You should look into "Roko's Basilisk"; it's a genuine belief that often goes alongside belief in AGI.


I should be the target audience for this stuff, but I honestly can't name a single person who believes in this "Roko's basilisk" thing. To my knowledge, even the original author abandoned it. There probably are a small handful out there, but I've never seen 'em myself.


> it's really not clear to me that humans would be a valuable component in knowledge work for much longer.

To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars by the end of the '80s.

> Even ARC-AGI-2 is now at over 50%.

Any measure of "are we close to AGI" is as scientifically meaningful as "are we close to a warp drive" because all anyone has to go on at this point is pure speculation. In my opinion, we should all strive to be better scientists and think more carefully about what an observation is supposed to mean before we tout it as evidence. Despite the name, there is no evidence that ARC-AGI tests for AGI.


> To me, this sounds like when we first went to the moon, and people were sure we'd be on Mars by the end of the '80s.

Unlike space colonisation, there are immediate economic rewards from producing even modest improvements in AI models. As such, we should expect much faster progress in AI than space colonisation.

But it could still turn out the same way, for all we know. I just think that's unlikely.


The minerals in the asteroid belt are estimated to be worth in the $100s of quintillions. I would say that’s a decent economic incentive to develop space exploration (not necessarily colonization, but it may make it easier).


You either have a case of human-augmented AI here or AI-augmented human. Neither by itself would have made the step.


If I were to place my money, it would be on Terence Tao.


Excellent! Humans can then spend their time on other activities, rather than get bogged down in the mundane.


Other activities, such as the sublime pursuit of truth and beauty . . . aka mathematics ;-)


Not going to happen as long as the society we live in has this big of a hard on for capitalism and working yourself to the bone is seen as a virtue. Every time there’s a productivity boost, the newly gained free time is immediately consumed by more work. It’s a sick version of Parkinson’s law where work is infinite.

https://en.wikipedia.org/wiki/Parkinson%27s_law


“Much longer” is doing a lot of heavy lifting there.


Let me put it like this: I expect AI to replace much of human wage labor over the next 20 years and push many of us, myself almost certainly included, into premature retirement. I'm personally concerned that in a few years, I'll find my software proficiency to be about as useful as my chess proficiency is to Stockfish today. I am afraid of a massive social upheaval, both for myself and my family, and for society at large.


Here “much of” is doing the heavy lifting. Are you willing to commit to a percentage or a range?

I work at an insurance company and I can’t see AI replacing even 10% of the employees here. Too much of what we do is locked up in decades-old proprietary databases that cannot be replaced for legal reasons. We still rely on paper mail for a huge amount of communication with policyholders. The decisions we make on a daily basis can’t be trusted to AI for legal reasons. If AI caused even a 1% increase in false rejections of claims it would be an enormous liability issue.


Yes, absolutely willing to commit. I can't find a single reliable source, but from what I gather, over 70% of people in the West do "pure knowledge work", which doesn't include any embodied activities. I am happy to bet that these jobs will start being fully taken over by AI soon (if they aren't already), and that by 2035, less than 50% of us will have a job that doesn't require "being there".

And regarding your example of an insurance company, I'm not sure about that industry, but seeing the transformation of banking over the last decade to fully digital providers like Revolut, I would expect similar disruption there.


I would easily take the other side of this bet. It just reminds me of when everyone was sure back in 2010 that we'd have self-driving cars within 10 years and human drivers would be obsolete. Today, fully replacing human drivers is still about 10 years away.


Yes, getting the timelines right is near impossible, but the trajectory is clear to me, both on AI taking over pure knowledge work and on self-driving cars replacing human drivers. For the latter, there's a lot of inertia and legalities to overcome, and scaling physical things is hard in general, but Waymo alone crossed 450,000 weekly paid rides last month [0], and now that it's self-driving on highways too, and is slated to launch in London and Tokyo this year, it seems to me that there's no serious remaining technical barrier to it replacing human drivers.

As for a bet, yes, I'd really be happy to put my money where my mouth is, if you're familiar with any long bets platform that accepts pseudonymous users.

[0] https://www.cnbc.com/2025/12/08/waymo-paid-rides-robotaxi-te...


There are other bounds here at play that are often not talked about.

AI runs on computers. Consider the undecidability implied by Rice's theorem: whether compiled code satisfies a non-trivial property such as being error-free cannot be decided in general. Even an AI can't guarantee its compiled code is error-free - not because it couldn't write sufficient code to solve a problem, but because the code it writes is bounded by other externalities. Undecidability in general makes the dream of generative AI considerably more challenging than how it's being sold.


> massive social upheaval

You don’t even need AGI for that though, just unbounded investor enthusiasm and a regulatory environment that favors AI providers at the expense of everyone else.

My point is that there are a number of things that can cause large-scale unemployment in the next 20 years, and it doesn't make sense to worry about AGI specifically while ignoring all of the other equally likely root causes (like a Western descent into oligarchy and crony capitalism, just to name one).


As is "even if it was in my area of specialty". I would not be able to do this proof, I can tell you that much.


This accurately mirrors my experience. It has never - so far - happened that the AI brought any novel insight at the level that I would consider an original idea. Presumably the case in TFA is different, but the normal interaction is that the solution to whatever you are trying to solve is a millimeter away from your understanding, and the AI won't bridge that gap until you do it yourself - and then it will usually prove to you that it was obvious. If it was so obvious then it probably should have made the suggestion...

Recent case:

I have a bar with a number of weights supported on either end:

|---+-+-//-+-+---|

What order and/or arrangement or of removing the weights would cause the least shift in center-of-mass? There is a non-obvious trick that you can pull here to reduce the shift considerably, and I was curious whether the AI would spot it or not, but even after lots of prompting it just circled around the obvious solutions rather than make a leap outside of that box and come up with a solution that is better in every case.

I wonder what the cause of that kind of blindness is.


The problem is unclear. I think you have a labelled graph G=(V, E) with labels c:V->R, such that each node in V consists of a triple (L, R, S) where L is a sequence of weights that are on the left, R is a sequence of weights that are on the right, and S is a set of weights that have been taken off. Define c(L, R, S) to be the centre of mass. Introduce an undirected edge e={(L, R, S), (L', R', S')} between (L, R, S) and (L', R', S') either if (i) (L', R', S') results from taking the first weight off L and adding it to S, or (ii) (L', R', S') results from taking the first weight off R and adding it to S, or (iii) (L', R', S') results from taking a weight from W and adding it to L, or (iv) (L', R', S') results from taking a weight from W and adding it to R.

There is a starting node (L_0, R_0, {}) and an ending node ({}, {}, W), the latter having L = R = {}.

I think you're trying to find the path (L_n, R_n, S_n) from the starting node to the ending node that minimises the maximum absolute value of c(L_n, R_n, S_n).

I won't post a solution, as requested.


You are overthinking it.


You are underspecifying it.


That problem is not clearly stated, so if you’re pasting that into an AI verbatim you won’t get the answer you’re looking for.

My guess is: first move the weights to the middle, and only then remove them.

However, "weights" and "bar" might confuse both machines and people into thinking that this is related to weight lifting, where there are two stops on the bar preventing the weights from being moved to the middle.


The problem is stated clearly enough that humans we ask the question of will sooner or later see that there is an optimum and that that optimum relies on understanding.

And no, the problem is not 'not clearly stated'. It is complete as it is and you are wrong about your guess.

And if machines and people think this is related to weight lifting then they're free to ask follow up questions. But even in the weight lifting case the answer is the same.


Illusion of transparency. You are imagining yourself asking this question, while standing in the gym and looking at the bar (or something like this). I, for example, have no idea how the weights are attached and which removal actions are allowed.

Yeah, LLMs have a tendency to run with some interpretation of a question without asking follow-up questions. Probably, it's a consequence of RLHFing them in that way.


And none of those details matter for solving the problem correctly. I'm purposefully not putting any answers here because I want to see if future generations of these tools suddenly see the non-obvious solution. But you are right about the fact that the details matter; one detail that holds the key is mentioned very explicitly.

If you do solve it don't post the answer.


Sure they do, the problem makes no sense as stated. The solution to the stated problem is to remove all the weights at once - solved. Or even two at a time, opposite the centre of gravity. Solved, but not what you're asking, I assume?

You didn't even label your ASCII art, so I've no clue what you mean: are the bars at the end the supports or the weights? Can I only remove one weight at a time? Initially I assumed you meant a weightlifting bar whose weights can only be removed from its ends. Is that the case or what? What's the double slash in the middle?

Also: "what order and/or arrangement or of removing the weights" - this isn't even correct English. Arrangement of removing the weights? State the problem clearly, from first principles, like you were talking to a 5-year-old.

The sibling comment is correct, you're clearly picturing something in your mind that you're failing to properly describe. It seems obvious to you, but it's not.


And yet, two people have solved it independently, so apparently it is adequately specified for some.


“Luck is not a strategy.”

I can successfully interpret total gibberish sometimes, but that’s not a robust approach even with humans let alone machines.

People have wildly different experiences utilising AI because of their own idiosyncrasies more than issues with the tools themselves.

It was pointed out by multiple groups (such as Anthropic) that their tools do a lot better with well organised codebases that are liberally commented.

I’ve worked on codebases where the AIs are just… lost. So are people!

Sure, some people can navigate the spaghetti… sometimes… but the success rate of changes is much lower.

Occasional success is not proof of correctness of approach. Consistent success is.


Tokenizationnnnnnn


In other words, LLMs work best when "you are absolutely right" and "this is a very insightful question" are actually true.


> I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

This hits so close to home. Just today in my field, a manager without expertise in a topic gave me an AI solution to something I am an expert in. The AI was very plainly and painfully wrong, but it came down to the user prompting really poorly. When I gave a well-formulated prompt on the same topic, I got the correct answer on the first go.


Lots of users seem to think LLMs think and reason, so this sounds wonderful. A mechanical process isn't thinking, and it certainly does NOT mirror human thinking; the processes are altogether different.


Do you have any idea how many people here have paychecks that depend on the hype, or hope to be in that position? They were the same way for Crypto until it stopped being part of the get-rich-quick dream.


"the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument." From the Tao thread. The ability to quickly iterate on research is a big change because "This is sharp contrast to existing practice where....large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors."


The proof is AI-generated?


Eh? The text reads:

"Aristotle integrates three main components: a Lean proof search system, an informal reasoning system that generates and formalizes lemmas, and a dedicated geometry solver"

Not saying it's not an amazing setup, I just don't understand the word "AI" being used like this when it's the setup/system that's brilliant, in conjunction with absolute experts.


That's literally AI though. AI has been around formally since 1956.

https://en.wikipedia.org/wiki/Dartmouth_workshop

AI != AGI != neural networks != LLMs

But Tao did mention ChatGPT, so I believe LLMs were involved at least partially.


Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.


https://www.erdosproblems.com/forum/thread/728#post-2808

> There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.

so it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]"

I'd say that counts


I had the impression Tao/community weren't even finding the gaps, since they mentioned using an automatic proof verifier. And that the main back and forth involved re-reading Erdos' paper to find out the right problem Erdos intended. So more like 90/10 LLM/human. Maybe I misread it.


This is what I got from Tao's post as well.


There's a lot more detail in this reddit post from the author - https://www.reddit.com/r/OpenAI/comments/1q6yw5g/how_we_used...


I strongly think you should go read the thread to get a sense of the level of expertise and the amount of effort put in by the humans involved: https://www.erdosproblems.com/forum/thread/728#post-2852


> EDIT: I can see the title has been fixed now from solved to "more or less solved", which I still think is a big stretch.

"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.


"This website was made by Thomas Bloom, a mathematician who likes to think about the problems Erdős posed. Technical assistance with setting up the code for the website was provided by ChatGPT" - from the FAQ.


Do you need to be a super expert to find gaps in proofs? Debatable


It's a good economic decision to hype up the importance of the LLM$ a bit.


"Built out products" like you're earning money on this? Having actual users, working through edge cases, browser quirks, race conditions, marketing, communication - the real battle testing 5% that's actually 95% of the work that in my view is impossible for the LLM? Because yeah the easy part is to create a big boilerplate app and have it sit somewhere with 2 users.

The hard part is day to day operations for years with thousands of edge cases, actual human feedback and errors, knocking on 1000 doors etc.

Otherwise you're just doing slot machine coding on crack, where you work and work and work on some amazing thing and then it goes nowhere - and now you haven't even learned anything because you didn't code so the sideproject isn't even education anymore.

What's the point of such a project?


> "Built out products" like you're earning money on this?

No, I'm not interested in monetizing stuff, I make enough money from $dayjob.

> Having actual users, working through edge cases, browser quirks, race conditions, marketing, communication - the real battle testing 5% that's actually 95% of the work that in my view is impossible for the LLM?

Yes, all of those. Obviously an LLM won't make a TikTok ad for me, but it can help with all the other stuff. For example, you mentioned browser quirks. I ran into a bug in Safari's OPFS implementation that an LLM was able to help me track down and work around. I also ran into the Chrome issue where backdrop effects don't work if any of the element's parents have nonzero transparency, and Claude helped me find all the cases where that happened and fix them. Both of these are from working on the app in my bio. It's a language app too, so however many edge cases you think there are, there's more :D
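
To make that Chrome quirk concrete, here's a rough sketch (my own illustration, not code from the app) of how one might hunt for the offending ancestor, assuming "nonzero transparency" means a computed opacity below 1 and using only standard DOM APIs:

    // Hypothetical helper: walk up from an element that uses backdrop-filter
    // and collect any ancestor whose computed opacity is below 1, since (in
    // the Chrome behaviour described above) such an ancestor keeps the
    // backdrop effect from sampling the page behind it.
    function findOpacityBreakingAncestors(el: Element): Element[] {
      const offenders: Element[] = [];
      let node: Element | null = el.parentElement;
      while (node) {
        const opacity = parseFloat(getComputedStyle(node).opacity);
        if (!Number.isNaN(opacity) && opacity < 1) {
          offenders.push(node);
        }
        node = node.parentElement;
      }
      return offenders;
    }

    // Usage sketch: log the culprits for a hypothetical frosted-glass panel.
    // console.log(findOpacityBreakingAncestors(document.querySelector('.glass')!));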

I don't want to give the impression that it was not a lot of work. It was an enormous amount of work. It's just that each step is significantly faster now.

> and now you haven't even learned anything because you didn't code so the sideproject isn't even education anymore.

I read every line. You could pull up the GitHub right now and point to any line of code, and I could tell you what it does, why it's there, and what will break if you remove or change it.

> What's the point of such a project?

I originally made it because I wanted a tool to help me learn French. It has succeeded in helping me enormously, to the point where I can have short conversations with my French family members now. Others seem to find it useful too.


How's this different from the good old Doodle that everyone uses, in Europe at least?

https://doodle.com/en/


Hacker News seems completely astroturfed in the last few weeks.

It's not the Hacker News I knew even 3 years ago anymore, and I'm seriously close to just ditching the site after 15+ years of use.

I use AI heavily, but every day there are crazily optimistic, almost manic posts about how AI is going to take over various sectors that are completely ludicrous - and they are all filled with comments from bizarrely optimistic people who have seemingly no knowledge of how software is actually run or built, i.e. that it's the human organisational, research and management elements that are the hard parts, something AI can't do in any shape or form at the moment for any complex or even small company.


I highly doubt "pumping out bespoke apps all day" is possible yet beyond 100% boilerplate, and where it is possible it's no good for any purpose other than enshittifying the web - and at that point it's not profitable, because everyone can do it.

I use AI daily as a senior coder for search and docs, and when used for prototyping you still need to be a senior coder to go from say 60% boilerplate to 100% finished app/site/whatever unless it's incredibly simple.


> I use AI daily as a senior coder for search and docs, and when used for prototyping you still need to be a senior coder to go from say 60% boilerplate to 100% finished app/site/whatever unless it's incredibly simple.

I know you would like to believe that, but with the tools available NOW, that's not necessarily the case. For example, by using the Playwright or Chrome DevTools MCPs, models can see the web app as it's being created, and it's pretty easy to prompt them to fix something they can see.

These models know the current frameworks and coding practices but they do need some guidance; they're not mindreaders.


I still don't believe that. Again, yes, a boilerplate calculator or recipe app, probably, but anything advanced and real-world with latency issues, scaling, race conditions, CSS quirks, design weirdness, optimisation - in other words the things that actually require domain knowledge - I still don't get much help with, even with Claude Code. Pointers, yes, but they completely fumble actual production code in real-world scenarios.

Again, it's the last 5% that takes 95% of the time, and that 5% I haven't seen fixed with Claude or Gemini, because it's essentially quirks, browser errors, race conditions, visual alignment, etc. etc. - all stuff that currently goes way over any LLM's head from what I've seen.

They can definitely bullshit a 95%-working app though, but that's still 95% of the work from being done ;)


Often the problem with tech people is that they think software only exists for tech, or for being sold to others by tech.

Nothing I do is in the tech industry. It's all manufacturing and all the software is for in-house processes.

Believe it or not, software is useful to everyone and no longer needs to originate from someone who only knows software.


I'm saying you can't do what you're saying without knowing code at the moment.

You didn't give any examples of the valuable bespoke apps that you are creating by the hour.

I simply don't believe you, and the arrogant salesy tone doesn't help.


LLMs can pretty reliably write 5-7k LOC.

If your needs fit in a program that size, you are pretty much good to go.

It will not rewrite PCB_CAD 2025, but it will happily create a PCB hole alignment and conversion app, eliminating the need for the full PCB_CAD software if all you need is that one toolset from it.

Very, very few pieces of software need to be full-package enterprise productivity suites. If you just make photos black and white and resize them, you don't need Photoshop to do it. Or even MS Paint. Any LLM will make a simple free program with no ads to do it. Average people generally do very simple, dumb stuff with the expensive software they buy.


This is the same as the discussion about using Excel. Excel has its limitations, but it has enabled millions of people to do pretty sophisticated stuff without the help of "professionals". Most of the stuff we tech people do is also basically repetitive boilerplate. We just like to make things more complex than they need to be. I am always a little baffled as to why seemingly every little CRUD site that has at most 100 users needs to be run on Kubernetes with several microservices, CI/CD pipelines, and whatever.

As far as enshittification goes, this was happening long before AI. It probably started with SEO and just kept going from there.


The reality, too, is that even if "what is acceptable" has not yet caught up with that guy working at Atlassian, polishing off a new field in Jira, people are using AI + Excel to manage their tasks EXACTLY the way their head works, not the way Jira works.

Yet we fail to see AI as a good thing and see it only as a jobs destroyer. Are we "better than" the people who used to fill toothpaste tubes manually until a machine was invented to replace them? They were just as mad when they got the pink slip.


I have told people that we techies have proudly killed the jobs of millions of people and were arrogant about it. Now we are mad that it's our turn. Feels almost like justice :-)


"Why senior developers just don't get how blockchain is going to change the world economy"

Yeah no, this angle is offensive.

I use LLMs daily as a search engine and for syntax help - but this bizarre meta-meta-meta "if you just abstract even more it'll work" thing, maybe invent an entire universe, or hey, why not invent a cluster of universes that'll turn off the lights in Eastern Europe so you can vibecode... no, no, no.

I don't think this crazy energy-wasting hell is getting us anywhere useful when someone's still going to have to wade through the bullshit, and the energy needs will go exponential on an already strained energy infrastructure, not to mention the state of the climate.

I still believe LLMs are helpful, but in the other direction: more focused, smaller-scale, and with less abstraction and magic happening.


Same when you ask Western LLMs about the Israel/Palestine conflict, which is much, much worse; they will always downplay Palestinian suffering.

But yeah, both are very bad.


I don't think that in particular is the LLM manufacturer downplaying it, but rather the mix of sources the LLM was trained on.

vs. in the case of the Chinese ones, it's more targeted censoring.


Which "Western LLMs" are you thinking about, specifically? I just tried with GPT-OSS-120b MXFP4 loaded via vLLM, and it seemed to handle it fine, with no downplaying of the widespread destruction of Gaza with civilian casualties back in 2009: https://gist.github.com/embedding-shapes/78719664df5d299938c...

Maybe I'm not asking the question the right way?


I asked several LLMs about Western crimes, massacres or war crimes, actually to compare the suspected censorship, but I failed to find one example.

Which LLMs, then? I'd be glad to hear about similarly egregious censorship.


How would you know? What world knowledge do you have access to that they do not?


eyes plus access to non-US media


I follow experts from Doctors Without Borders, the Red Cross, the ICC, Amnesty and lots of other top NGOs who actually visit the place - and they all more or less call this an ethnic cleansing, the most horrific thing they have seen, and a genocide actively perpetrated by the West; it's beyond disturbing.

There's talk of upwards of 500 thousand deaths now - half a million - most of them civilians, women and children.

It's not in any way controversial anymore and the info is out there and has been for a long time.

That people like you call this into question - I'm truly shocked at the heartlessness. It's a slaughterhouse, just like the original Holocaust, and industrial in its scale and efficiency, which makes it all the more frightening.


The number one priority must surely be fixing the keyboard, besides the horrible UX?

Millions have been having problems for years, so it's not just me. I honestly thought I was "getting old", but no - there's an incredible number of threads, and now this on YT with 1 million views:

https://www.youtube.com/watch?v=hksVvXONrIo

Wild typing on a 3210 was less stressful for me.


I thought I was just getting stupider and stupider with each day. I even started resetting my iOS keyboard autocorrect dictionary, or whatever magic learning they do, to fix this, but eventually I still mistype words.

Thank you for this video, it really made me feel that I'm not alone in this typing struggle.


Installing the Google keyboard has been step #1 on every new iOS device of mine for the last decade. Sometimes I'll accidentally switch back to the default one while typing and immediately notice how broken the experience is. And yeah, I have definitely run into the exact issue shown in this video.

As a side note, I can't believe not a single device manufacturer has been able to make a BlackBerry-style keyboard work with a modern phone. Texting is by far the #1 activity on smartphones, and yet when it comes to typing we have somehow gone backwards since the mid-2000s.


Per your last point, there's at least one. My updated Q20 (BlackBerry Classic) just shipped, with the same keyboard, trackpad, and body - but new cameras, battery, and mainboard, meaning a relatively up-to-date Android.

The Q27 (a ground-up new design) with a similar keyboard is in the works as well, with the Q20 being used to raise funds for the newly designed Q27.


I just downloaded Google Keyboard (Gboard) and tried typing "thumbs up". It still screws it up.

Not sure if it's better than the iOS keyboard.


For some reason I have English & German and English & French keyboards, and it always screws up my writing; I just want a German or a French keyboard when writing in that language. It's driving me crazy sometimes.


Geez lord I can’t believe this ever made it out of testing.

Even all the text selection stuff stinks.

Honestly I’m about to disable Apple Intelligence. I don’t know what’s going on there.

What is everyone working on over at Apple?

Anyone checked if this still happens (it typed that as “halles” which isn’t even a word!) even after disabling Apple Intelligence?


Most text selection has also completely broken over the last couple of iOS versions.


macOS too though


I've been driving myself crazy over this. This video is a smoking gun proof that it's just broken.


I thought I was just incapable of learning. I find it so difficult to write on my iOS keyboard.


iOS repeatedly "learns" new words that I use that are misspellings of real words (because I mistype things in a predictable way). It becomes so convinced that it will autocorrect the real word into the typo. And knowing how the keyboard works, I wouldn't be surprised if it enlarges the touch targets for the typo once it thinks I meant to use the typo.

The cause is obvious: Apple is training on what I type, not what I send. Apple does not consider that I actually care about the accuracy of what I send and will fix errors; perhaps they optimize for people who are careless enough to send typoed messages, yet niche enough to commonly use words not in the default dictionary.

It is infuriating that I have ~50 manual corrections telling Apple to leave words alone and correct certain typos to the real words.


I thought it was just iOS not being responsive enough to my thumb-typing speed.

But this was very validating to watch.

I love my iPhone but hate the iOS typing experience. It's so bad that I bought an external foldable Bluetooth keyboard that I keep in my bag just so that I could type longer emails.

I miss my BlackBerry keyboard.


Oh wow!

I thought it was just me!


Finally, confirmation that I'm not going mad! I remember getting my first iPhone, back in the day, and demoing to a friend how it was almost impossible to misspell something when typing fast. That is just not the case now. Typing performance has got worse and worse.

Just typing out this comment has been infuriating.


This is what a 400k salary, 12 rounds of leetcode, system design and circus-dancing interviews get you, apparently.

