Respectfully, I don’t think the author appreciates that the configurability of Claude Code is its performance advantage. I would much rather just tell it what to do and have it go do it, but I am much more able to do that with a highly configured Claude Code than with Codex, which is pretty much fixed at its out-of-the-box quality level.
I spend most of my engineering time these days not on writing code or even thinking about my product, but on Claude Code configuration (which is portable so should another solution arise I can move it). Whenever Claude Code doesn’t oneshot something, that is an opportunity for improvement.
Heya, I'm the author of the post and I just wanted to say I do appreciate the configurability! As I mentioned in the post, I have been that kind of developer in the past.
> This is a perfect match for engineers who love configuring their environments. I can’t tell you how many full days of my life I’ve lost trying out new Xcode features or researching VS Code extensions that in practice make me 0.05% more productive.
And I tried to be pretty explicit about the idea that this is a very personal choice.
> Personally — and I do emphasize this is a personal decision — I’d rather write a well-spec’d plan and go do something else for 15 minutes. Claude’s Plan Mode is exceptional, and that’s why so many people fall in love with Claude once they try it.
For every person who feels like me today, there's someone who feels like you out there. And for every person who feels like you, there's someone like me (today) who finds it not as valuable to their workflow. That's the reason my conclusion was all about getting folks to try out both and see what works for them — because people change, and it's worth finding out who you really are at this moment in time.
Anyhow, I do think that Codex is also very configurable — I was just trying to emphasize that it's really great out of the box, while Claude Code requires more tuning. But that tuning makes it more personal, which as you mention is a huge plus! As I've touched on in a few posts [^1] [^2], Skills are a big deal to me, because they allow people to achieve high levels of customization without having to be the kind of developer that devotes a lot of time to creating their perfect setup. (Now supported in both Claude Code and Codex.)
I don't want this to turn into a ramble, so I'll just say that I agree with you — but also there's a lot of nuance here, because we're all having very personal coding experiences with AI — so it may not entirely sound like I agree with you. :)
Would love to hear more about your specific customizations, to make sure that I'm not missing out on anything valuable. :D
To be quite clear, I hate configuring my environment. I hate it. The farther I get from creating things that people can use, the less I like it. I spend most of my time on Claude config not because I enjoy the experience per se, but because it's SO USEFUL to do so.
To be honest that's most of my pitch for Codex in the blog post. Codex works great without any configuration, and amazingly with. If you want to spend less time configuring then maybe Codex is the right agentic system for you.
I don't want to restate my thesis too much — but I really do believe it's worth experimenting with these tools every couple of months to see if the latest updates better match your preferences.
Skills, MCPs, /commands, agents, hooks, plugins, etc. I package https://charleswiltgen.github.io/Axiom/ as an easily-installable Claude Code plugin, and AFAICT I'm not able to do that for any other AI coding environment.
That hasn't been my experience, although I'm happy to accept that I'm the problem. Apparently they've released their skills support (?), so I should try again. https://developers.openai.com/codex/skills
Candidly, the accusation of short-sightedness doesn't really make sense when it comes to enthusiasm for a technology that often falls short in practice today, but that is worth tremendous business value in certain cases now and will be in more cases tomorrow.
If anything, you should accuse them of foolhardy recklessness. They are not the sticks in the mud.
Can a company like OpenAI be worth an estimated 1/5th of Alphabet, which offers a similar product but also has an operating system, a browser, the biggest video platform, the most used mail client, its own silicon for running that product, the 3rd most popular cloud platform, ... ?
I think that is the recklessness in question. Throw in that there is no profit for OpenAI & co. and that everything is fueled by debt, and the picture is grim (IMHO).
> and in more cases tomorrow than today is worth tremendous business value
That's a nice crystal ball you have there. From where I'm standing, model performance improvements have been slowing down for a while now, and without some sort of fundamental breakthrough, I don't see where the business value is going to come from.
The prerequisite for me to be wrong is that the technology needs to stop getting better entirely *right now* AND we need to discover ZERO new uses for what exists today.
So if the plateau is unanimously declared to have been reached tomorrow, OR just one more tiny use case exists tomorrow and all others dwindle away to nothing, then you consider yourself to be correct? What a wild assertion!
If the plateau is reached at some higher level of capability, I will remain correct, yes. If use cases are discovered that do not exist today, I will also be correct. You said it in a silly way but you're directionally correct.
No. You state that this is all it would take to be considered tremendous business value. You are moving the goalposts on your point. My point is that you are taking an absolute position that there is tremendous business value in its current form (a minuscule improvement and one insignificant new use case do not equate to tremendous business value in themselves), and so that remains to be seen.
Rushing to get on board something that looks like it might be the next big thing is often short-sighted. Some recent examples include Windows XP Tablet PC Edition and Google Glass.
That's like saying that gambling is shortsighted. It depends entirely on the odds as to whether or not it's wise, but "shortsighted" implies that making the bet precludes some future course of action.
Maybe if you have near-infinite wealth like Google or Microsoft you aren't precluding future choices. For most economic actors, making some bets means not making others.
Companies that are hastily shoehorning AI into their customer support systems could instead devote resources to improving the core product to reduce the need for support.
If Google bears no role in fixing the issues it finds, and nobody else is being paid to do it either, it is functionally just providing free security vulnerability research for malicious actors, because almost nobody can take over ffmpeg or switch away from it.
I don’t think vulnerability researchers are having trouble finding exploitable bugs in FFmpeg, so I don’t know how much this actually holds. Much of the cost center of vulnerability research is weaponization and making an exploit reliable against a specific set of targets.
(The argument also seems backwards to me: Google appears to use a lot of not-inexpensive human talent to produce high quality reports to projects, instead of dumping an ASan log and calling it a day. If all they cared about was shoveling labor onto OSS maintainers, they could make things a lot easier for themselves than they currently do!)
Internally, Google maintains their own completely separate FFmpeg fork as well as a hardened sandbox for running that fork. Since they keep pace with releases to receive security fixes, there’s potentially lots of upstreamable work (with some effort on both sides…)
My understanding from adjacent threads in this discussion is that Google does in fact make significant upstream contributions to FFmpeg. Per policy those are often made with personal emails, but multiple people have said that Google’s investment in FFmpeg’s security and codec support have been significant.
(But also, while this is great, it doesn’t make an expectation of a patch with a security report reasonable! Most security reports don’t come with patches.)
Yeah it's more effort, but I'd argue that security through obscurity is a super naive approach. I'm not on Google's side here, but so much infrastructure is "secured" by gatekeeping knowledge.
I don't think you should invoke the idea of naivete while failing to address the unhappy but perfectly simple reality: the ideal option doesn't exist, it's a fantasy that isn't actually available, and among the available options, even though none are good, one is worse than another.
"obscurity isn't security" is true enough, as far as it goes, but is just not that far.
And "put the bugs that won't be fixed soon on a billboard" is worse.
The super naive approach is ignoring that and thinking that "fix the bugs" is a thing that exists.
More fantasy. This presumes the bug only exists in some part of ffmpeg that can be disabled at all, that you don't need that part, and that you are even in control of your use of ffmpeg in the first place.
Sure, in maybe 1 special lucky case you might be empowered. And in 99 other cases you are subject to a bug without the remotest control over it, since it's buried away within something you use; you don't even have the option not to use the surface service or app, let alone control its subcomponents.
It's a heck of a lot better than being unaware of it.
(To put this in context: I assume that, on average, a published security vulnerability is already known to at least some malicious actors before it's published. If it's published, it's me finding out about it, not the bad actors suddenly getting a new tool)
It's only better if you can act on it on equal footing with the bad guys. If the bad guys get to act on it before you do, or before some other good guys do on your behalf, then no, it's not better.
Remember, we're not talking about keeping a bug secret; we're talking about using a power tool to generate a fire hose of bugs and only doing that, not fixing them.
"The bug" in question refers to the one found by the bug-finding tool the article claims triggered the latest episode of debate. Nobody is claiming it's the only bug, just that this triggering bug highlighted was a clear example of where there is actually such a clear cut line.
Google does contribute some patches for codecs they actually consume e.g. https://github.com/FFmpeg/FFmpeg/commit/b1febda061955c6f4bfb..., the bug in question was just an example of one the bug finding tool found that they didn't consume - which leads to this conversation.
Given that Google is both the company generating the bug reports and one of the companies using the buggy library, while most of the ffmpeg maintainers presumably aren't using their libraries to run companies with a $3.52 trillion market cap, would you argue that going public with vulnerabilities that affect your own product before you've fixed them is also a naive approach?
Sorry, but this states a lot of assumptions as fact to ask a question that only makes sense if they're all true. I feel Google should assist the project more financially given how much they use it, but I don't think it's a reasonable guess that Google ships products using every codec their open source fuzzer project finds bugs in. I certainly doubt YouTube lets you upload this LucasArts format, or that Chrome compiles ffmpeg with it, as an example. For security issues relevant to their usage via Chrome CVEs etc., they seem to contribute fixes as needed. E.g. here is one, found via fuzzing, for a codec they use and work on internally: https://github.com/FFmpeg/FFmpeg/commit/b1febda061955c6f4bfb...
As to whether it's a bad idea to publicly document security concerns you find regardless of whether you plan on fixing them, it often depends on whether you ask the product manager what they want for their product or the security-minded folks what they want for every product :).
> it functionally is just providing free security vulnerability research for malicious actors because almost nobody can take over or switch off of ffmpeg
At least, if this information is public, someone can act on it and sandbox ffmpeg for their use case, if they think it's worth it.
I personally prefer to have this information be accessible to all users.
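To make "act on it" a bit more concrete, here is a minimal sketch of one mitigation, assuming you control the code that invokes ffmpeg (this is my own illustration with placeholder limits, not anything ffmpeg or Google recommends): run the decode in a child process with resource limits and a timeout, so a decoder bug is more likely to kill the child than the host service. Real isolation would still want seccomp/namespaces (bwrap, firejail, or a hardened fork like the one mentioned elsewhere in this thread).

    import resource
    import subprocess

    def _limits():
        # Applied in the child just before exec (Linux/POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (60, 60))        # 60s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (2**31, 2**31))   # ~2 GB address space
        resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))     # few open file descriptors

    def transcode(src, dst):
        # -nostdin keeps ffmpeg from reading the parent's stdin.
        subprocess.run(
            ["ffmpeg", "-nostdin", "-i", src, dst],
            preexec_fn=_limits,  # note: no seccomp or namespace isolation here
            timeout=300,         # wall-clock cap
            check=True,
        )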
This is a weird argument. Basically condoning security through obscurity: If nobody reports the bug then we just pretend it doesn’t exist, right?
There are many groups searching for security vulnerabilities in popular open source software who deliberately do not disclose them. They do this to save them for their own use or even to sell them to bad actors.
It’s starting to feel silly to demonize Google for doing security research at this point.
The timeline is industry standard at this point. The point is to make sure folks take security more seriously. If you start deviating from the script, others will expect the same exceptions, and the deadline would lose that ability. Sometimes it's good to let something fail loudly to show there is a problem. If ffmpeg doesn't have enough maintainers, then they should fail and let downstream customers know, so there is more pressure to contribute resources. Playing Superman and trying to prevent them from seeing the problem will just lead to burnout.
Is it industry standard to run automatic AI tools and spam the upstream with bug reports? To then expect the bugs to be fixed within 90 days is a bit much.
It's not some lone report of an important bug; it's AI spam that puts forth security issues faster than they have the resources to fix them.
I guess the question for a person at Google who discovers a bug they don’t personally have time to fix is: should they report the bug at all? They don’t necessarily know if someone else will be able to pick it up. So the current “always report” rule makes sense, since you don’t have to figure out whether someone can fix it.
The same question applies if they have time to fix it in six months, since that presumably still gives attackers a large window of time.
In this case the bug was so obscure it’s kind of silly.
It’s possible that this is a more efficient use of their time when it comes to open source security as a whole; most projects do not have a problem with reports like this.
If not pumping out patches allows them to get more security issues fixed, that’s fine!
From the perspective of Google maybe, but from the perspective of open source projects, how much does this drain them?
Making open source code more secure and at the same time less prevalent seems like a net loss for society. And if those researchers could spare some time to write patches for open source projects, that might benefit society more than dropping disclosure deadlines on volunteers.
Except users can act accordingly to work around the vulnerability.
For one, it lets people understand where ffmpeg is at so they can treat it more carefully (e.g. run it in a sandbox).
FFmpeg is also open source. After public disclosure, distros can choose to turn off said codec downstream to not expose this attack vector. There are a lot of things users can do to protect themselves, but they need to be aware of the problem first.
This is comical because we used to have something called the Turing test, which we considered our test of human-level intelligence. We never talk about it now because we obviously blew past it years ago.
There are some interesting ways in which AI remains inferior to human intelligence, but it is also obviously superior in many ways already.
It remains remarkable to me how common denial is when it comes to what AI can or cannot actually do.
There are also some interesting ways in which bicycles remain inferior to human locomotion, but they are also obviously superior in many ways already.
Still doesn't mean we should gamble the economies of whole continents on bike factories.
But common patterns of LLMs today will be adopted by humans as we are influenced linguistically by our interactions - which then makes it harder to detect LLM output.
I think it's that the issues are still so prevalent that people will justify poor arguments and reasons for being skeptical, because it matches their feelings, and articulating the actual problem is harder.
It's exactly the same as the literal Luddites, synthesizers, cameras, etc. The actual concern is economic: people don't want to be replaced.
But the arguments are couched in moral or quality terms for sympathy. Machine-knitted textiles are inferior to hand-made textiles. Synthesizers are inferior to live orchestras. Daguerreotypes are inferior to hand-painted portraits.
It's a form of intellectual insincerity, but it happens predictably with every major technological advance because people are scared.
I think it would ease some of my concerns, but it wouldn't put me in the camp that believes it should be raced toward without thinking about how to control it and without plans in place to both identify and react to its risks.
There are two doomsdays. The dramatic one, where they control the military and we end up living in the Matrix. And the less dramatic one, where we as humans forget how to do things for ourselves and then slowly watch the AIs become less and less capable of keeping us happy and alive. Maybe the end of both scenarios is similar, but one would take decades while the other could happen overnight.
Accuracy alone doesn't fix either doomsday scenario. But it would slow some of the issues I see forming already with people replacing research skills and informational reporting with AIs that can lie or be very misleading.
> We never talk about it now because we obviously blew past it years ago.
It's shocking to me that (as far as I know) no one has actually bothered to do a real Turing test with the best and newest LLMs. The Turing test is not whether a casual user can be momentarily confused about whether they are talking to a real person, or if a model can generate real-looking pieces of text. It's about a person seriously trying, for a fair amount of time, to distinguish between a chat they are having with another real person and an AI.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1.
It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
Tbf, a machine is more likely to be versed in this ancient descriptive notation than a human who is maybe just playing casually. R1 and K1 haven't been in common use since the 80s.
Try reading Turing's thesis before making that assertion, because the imitation game wasn't meant to measure a tipping point of any kind.
It's just a thought experiment to show that machines achieving human capabilities isn't proof that machines "think". He then argues against multiple interpretations of what machine "thinking" could even mean, to conclude that whether machines think or not is not worth discussing and that their capabilities are what matter.
That is, the test has nothing to do with whether machines can reach human capabilities in the first place. Turing took for granted they eventually would.
> We never talk about it now because we obviously blew past it years ago.
My Turing test has been the same since about when I learned it existed. I told myself I'd always use the same one.
What I do is, after saying hi, repeat the same sentence forever.
A human still reacts very differently than any machine to this test. Current AIs could maybe be adversarially prompted to bypass this, but so far it's still obvious it's a machine replying.
"Hi there! I understand you're planning to repeat the same sentence. I'm here whenever you'd like to have a conversation about something else or if you change your mind. Feel free to share whatever's on your mind!"
I don't think I've ever imagined a human saying "I understand you're planning to repeat the same sentence". If you thought this was some kind of killer rebuke, I don't think it worked out the way you imagined. Do you actually think that's a human-sounding response? To me it's got that same telltale sycophancy of a robot butler that I've come to expect from these consumer-grade LLMs.
That's mostly because of the system prompt asking Claude to be a helpful assistant.
If you try with a human who works in a call center with that system prompt as instructions on how to answer calls, you will likely get a similar response.
But honestly, believe in whatever you wanna believe. I'm so sick of arguing with people online. Not gonna waste my time here anymore.
Maybe don't take such a maximalist interpretation of other people's comments. My point that it doesn't pass that test doesn't mean it isn't extremely useful for many things. It's just that the test is undefined, so I find it funny when people say they truly cannot tell it's not a real person. I could've been more crass and said it also doesn't reply to insults like a real person. There are so many ways in which it doesn't behave like a human, but it's still pretty useful.
What I read from your reply is that you adjoin the above statement with "and therefore they are useless" but there's no need to read it like that.
> This is comical because we used to have something called the Turing test
It didn't go anywhere.
> which we considered our test of human-level intelligence.
No, this is a strawman. Turing explicitly posits that the question "can machines think?" is ill-posed in the first place, and proposes the "imitation game" as something that can be studied meaningfully — without ascribing to it the sort of meaning commonly described in these arguments.
More precisely:
> The original question, "Can machines think?" I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.
----
> We never talk about it now because we obviously blew past it years ago.
No. We talk about it constantly, because AI proponents keep bringing it up fallaciously. Nothing like "obviously blowing past it years ago" actually happened; cited examples look nothing like the test actually described in Turing's paper. But this is still beside the point.
> There are some interesting ways in which AI remains inferior to human intelligence, but it is also obviously superior in many ways already.
Computers were already obviously superior to humans in, for example, arithmetic, decades ago.
> It remains remarkable to me how common denial is when it comes to what AI can or cannot actually do.
It is not "denial" to point out your factual inaccuracies.
Yes it does - it has to be meaningful or rigorous for the comparative ranking to be meaningful or rigorous, or else wtf are you doing? Say I have all the information on my side, but you only show the user these questions? Who cares about that comparison?
Deletion of data is the most permanent thing most people will ever do. The burning of the library of Alexandria and the razing of Baghdad left a long, long shadow on history.
Most of the time when I see this snark and look it up, it turns out that the "original" inventor only took the most basic step or laid vague foundations, and never refined it further or explored any potential applications.
Most often it happens with China, since they spend a lot on propaganda to present themselves as the true inventor of everything.
you jest but it is wild how often people declare “if it was ‘advanced’ and outside of Europe, it was probably aliens. not the people, it was aliens, obviously.”
"AI effect" is long known and pretty well documented.
When AI beat humans at chess, it didn't result in humans revising their idea of the capabilities of machine intelligence upwards. It resulted in humans revising their notion of how much intelligence is required to play chess at world champion level downwards, and by a lot.
Clearly, there's some sort of psychological defense mechanism in play. First, we see "AI could never do X". Then an AI does X, and the sentiment flips to "X has never required any intelligence in the first place".
I think it's fairly safe to say that "X can be modeled as a math problem and does not require _general_ intelligence to solve" for any X that general intelligence can solve. Some math problems are just more complicated than others.
> Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone. https://www.newyorker.com/news/news-desk/is-deep-learning-a-...
> Quite a wide variety of people find AI deeply ego threatening to the point of being brainwormed into spouting absolute nonsense, but why?
He is not brainwashed; this just happens to be his business. What happens to Gary Marcus if Gary Marcus stops talking about how LLMs are worthless? He just disappears. No one ever interviews him for his general thoughts on ML, or to discuss his (nonexistent) research. His only claim to fame is being the loudest contrarian in the LLM world, so he has to keep doing that or accept becoming irrelevant.
Slight tangent, but this is a recurring pattern in fringe belief, e.g. the prominent flat earther who long ago accepted the earth is not flat but can’t stop the act, as all their friends and income are tied to that belief.
Not to say that believing LLMs won’t lead to AGI is fringe, but it does show the danger (and benefits, I guess) of tying your entire identity to a specific belief.
> Quite a wide variety of people find AI deeply ego threatening to the point of being brainwormed into spouting absolute nonsense, but why?
It makes sense when you look at this as a wider conversation. Every time Sam Altman, Elon Musk and co. predict that AGI is just around the corner, that their products will be smarter than all of humanity combined, and that they are like having an expert in everything in your pocket, people like Gary Marcus are going to respond in just as extreme a way in the opposite direction. Maybe if the AI billionaires with the planet-wide megaphones weren't so bombastic about their claims, certain other people wouldn't be so bombastic in their pushback.
Gary Marcus said that Deep Learning was hitting a wall 1 month before the release of DALL-E 2, 6 months before the release of ChatGPT, and 1 year before GPT-4, arguably 3 of the biggest milestones in Deep Learning.
There are some base models available to the public today. Not on "end of 2025 frontier run" scale, but a few of them are definitely larger and better than GPT-3. There are some uses for things like that.
Not that the limits of GPT-3 were well understood at the time.
We really had no good grasp of how dangerous or safe something like that would be - and whether there was some subtle tipping point that could propel something like GPT-3 all the way to AGI and beyond.
Knowing what we know now? Yeah, they could have released GPT-3 base model and nothing bad would have happened. But they didn't know that back then.
I use self-driving every single day in Boston and I haven’t needed to intervene in about 8 months. Most interventions are due to me wanting to go a different route.
Based on the rate of progress alone I would expect functional vision-only self-driving to be very close. I expect people will continue to say LIDAR is required right up until the moment that Tesla is shipping level 4/5 self-driving.
Same experience in a mix of city/suburban/rural driving, on a HW3 car. Seeing my car drive itself through complex scenarios without intervention, and then reading smart people saying it can’t without hardware it doesn’t have, gives major mental whiplash.
I would like to get my experience more in line with yours. I can go a few miles without intervention, but that's about it, before it does something that will result in damage if I don't take over. I'm envious that some people can go months when I can't go a full day.
Where are you driving?! If the person you're replying to has gone 8 months in Boston without having to intervene, I'm impressed. Boston is the craziest place to drive that I've ever driven.
Pro tip if you get stuck in a warren of tiny little back streets in the area: latch on to the back of a cab; they're generally on their way to a major road to get their fare where they're going, and they usually know a good way to get to one. I've pulled this trick multiple times around City Hall, Government Center, the Old State House, etc.
Or when. Driving during peak commute hours really makes you a sardine in a box, and it's harder for intervene-worthy events to occur just by the nature of dense traffic.
> Based on the rate of progress alone I would expect functional vision-only self-driving to be very close.
So close yet so far, which is ironically the problem vision-based self-driving has. No concrete information, just a guess based on the simplest surface data.
On a scale from "student driver" to "Safelite guy (or any other professional who drives around as part of their job) running late", how does it handle Storrow and similar?
Like, does it get naively caught in stopped traffic for turns it could lane-change out of, or does it fucking send it?
I don't drive in Boston, but there is some impatience factor and it will make human-like moves out of correct-but-stopped lanes into moving ones. It'll merge into gaps that feel very small when it doesn't have other options.
Is it fair for the company to have received three months of payments from a customer but the salesperson doesn’t get commission at all? How will you retain sales staff when word gets out? What’s the period length over which if the deal dies the salesperson doesn’t get their commission? Do you roll back commission payments later when the customer stops paying?
These are all great questions which people have answered, and clawbacks over a defined period are the standard solution to the problem of misaligned incentives between the company receiving recurring revenue and the salesperson receiving an upfront commission.
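To make the mechanics concrete, here is a tiny sketch of one hypothetical clawback scheme (the 10% rate, the 6-month window, and the pro-rating are my assumptions for illustration, not a claim about any particular company's plan): the commission is paid up front on first-year contract value, and a pro-rated share is returned if the customer churns inside the window.

    # Hypothetical clawback policy, purely illustrative.
    COMMISSION_RATE = 0.10   # 10% of first-year contract value, paid up front
    CLAWBACK_MONTHS = 6      # churn inside this window triggers a clawback

    def clawback(first_year_value, months_paid):
        """Amount the salesperson returns if the customer churns after months_paid."""
        commission = first_year_value * COMMISSION_RATE
        if months_paid >= CLAWBACK_MONTHS:
            return 0.0  # outside the window: commission is kept in full
        unearned = (CLAWBACK_MONTHS - months_paid) / CLAWBACK_MONTHS
        return round(commission * unearned, 2)

    # Customer pays 3 months of a $12,000/yr deal, then cancels:
    # commission was $1,200; the rep returns $600 and keeps $600.
    print(clawback(12_000, 3))  # 600.0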