Respectfully, I don’t think the author appreciates that the configurability of Claude Code is its performance advantage. I would much rather just tell it what to do and have it go do it, but I am much more able to do that with a highly configured Claude Code than with Codex, which is pretty much fixed at its out-of-the-box quality level.
I spend most of my engineering time these days not on writing code or even thinking about my product, but on Claude Code configuration (which is portable so should another solution arise I can move it). Whenever Claude Code doesn’t oneshot something, that is an opportunity for improvement.
Heya, I'm the author of the post and I just wanted to say I do appreciate the configurability! As I mentioned in the post, I have been that kind of developer in the past.
> This is a perfect match for engineers who love configuring their environments. I can’t tell you how many full days of my life I’ve lost trying out new Xcode features or researching VS Code extensions that in practice make me 0.05% more productive.
And I tried to be pretty explicit about the idea that this is a very personal choice.
> Personally — and I do emphasize this is a personal decision — I’d rather write a well-spec’d plan and go do something else for 15 minutes. Claude’s Plan Mode is exceptional, and that’s why so many people fall in love with Claude once they try it.
For every person who feels like me today, there's someone who feels like you out there. And for every person who feels like you, there's someone like me (today) who finds it not as valuable to their workflow. That's the reason my conclusion was all about getting folks to try out both and see what works for them — because people change, and it's worth finding out who you really are at this moment in time.
Anyhow, I do think that Codex is also very configurable — I was just trying to emphasize that it's really great out of the box, while Claude Code requires more tuning. But that tuning makes it more personal, which as you mention is a huge plus! As I've touched on in a few posts [^1] [^2], Skills are a big deal to me, because they allow people to achieve high levels of customization without having to be the kind of developer that devotes a lot of time to creating their perfect setup. (Now supported in both Claude Code and Codex.)
I don't want this to turn into a ramble, so I'll just say that I agree with you — but also there's a lot of nuance here, because we're all having very personal coding experiences with AI — so it may not entirely sound like I agree with you. :)
Would love to hear more about your specific customizations, to make sure that I'm not missing out on anything valuable. :D
To be quite clear, I hate configuring my environment. I hate it. The farther I get from creating things that people can use, the less I like it. I spend most of my time on Claude config not because I enjoy the experience per se, but because it's SO USEFUL to do so.
To be honest that's most of my pitch for Codex in the blog post. Codex works great without any configuration, and amazingly with. If you want to spend less time configuring then maybe Codex is the right agentic system for you.
I don't want to restate my thesis too much — but I really do believe it's worth experimenting with these tools every couple of months to see if the latest updates better match your preferences.
Skills, MCPs, /commands, agents, hooks, plugins, etc. I package https://charleswiltgen.github.io/Axiom/ as an easily-installable Claude Code plugin, and AFAICT I'm not able to do that for any other AI coding environment.
That hasn't been my experience, although I'm happy to accept that I'm the problem. Apparently they've released their skills support (?), so I should try again. https://developers.openai.com/codex/skills
Candidly, the accusation of short-sightedness doesn't really make sense when it comes to enthusiasm for a technology that often falls short in practice today, but that is worth tremendous business value in certain cases now and will be in more cases tomorrow.
If anything, you should accuse them of foolhardy recklessness. They are not the sticks in the mud.
Can a company like OpenAI be worth an estimated 1/5th of Alphabet, which offers a similar product but also has an operating system, a browser, the biggest video platform, the most used mail client, its own silicon for running that product, the 3rd most popular cloud platform, ... ?
I think that is the recklessness in question. Throw in that there is no profit for OpenAI & co. and that everything is fueled by debt, and the picture is grim (IMHO).
> and in more cases tomorrow than today is worth tremendous business value
That's a nice crystal ball you have there. From where I'm standing, model performance improvements have been slowing down for a while now, and without some sort of fundamental breakthrough, I don't see where the business value is going to come from.
The prerequisite for me to be wrong is that the technology needs to stop getting better entirely *right now* AND we need to discover ZERO new uses for what exists today.
So if the plateau is unanimously declared to have been reached tomorrow, OR just one more tiny use case exists tomorrow and all others dwindle away to nothing, then you consider yourself to be correct? What a wild assertion!
If the plateau is reached at some higher level of capability, I will remain correct, yes. If use cases are discovered that do not exist today, I will also be correct. You said it in a silly way but you're directionally correct.
No. You state that this is all it would take to be considered tremendous business value. You are moving the goalposts on your point. My point is that you are taking an absolute position that there is tremendous business value in its current form (a minuscule improvement and one insignificant new use case do not equate to tremendous business value in themselves), and so that remains to be seen.
Rushing to get on board something that looks like it might be the next big thing is often short-sighted. Some recent examples include Windows XP Tablet PC Edition and Google Glass.
That's like saying that gambling is shortsighted. It depends entirely on the odds as to whether or not it's wise, but "shortsighted" implies that making the bet precludes some future course of action.
Maybe if you have near-infinite wealth like Google or Microsoft you aren't precluding future choices. For most economic actors, making some bets means not making others.
Companies that are hastily shoehorning AI into their customer support systems could instead devote resources to improving the core product to reduce the need for support.
If Google bears no role in fixing the issues it finds, and nobody else is being paid to do it either, it is functionally just providing free security vulnerability research for malicious actors, because almost nobody can take over ffmpeg or switch away from it.
I don’t think vulnerability researchers are having trouble finding exploitable bugs in FFmpeg, so I don’t know how much this actually holds. Much of the cost center of vulnerability research is weaponization and making an exploit reliable against a specific set of targets.
(The argument also seems backwards to me: Google appears to use a lot of not-inexpensive human talent to produce high quality reports to projects, instead of dumping an ASan log and calling it a day. If all they cared about was shoveling labor onto OSS maintainers, they could make things a lot easier for themselves than they currently do!)
Internally, Google maintains their own completely separate FFmpeg fork as well as a hardened sandbox for running that fork. Since they keep pace with releases to receive security fixes, there’s potentially lots of upstreamable work (with some effort on both sides…)
My understanding from adjacent threads in this discussion is that Google does in fact make significant upstream contributions to FFmpeg. Per policy those are often made with personal emails, but multiple people have said that Google’s investment in FFmpeg’s security and codec support have been significant.
(But also, while this is great, it doesn’t make an expectation of a patch with a security report reasonable! Most security reports don’t come with patches.)
Yeah it's more effort, but I'd argue that security through obscurity is a super naive approach. I'm not on Google's side here, but so much infrastructure is "secured" by gatekeeping knowledge.
I don't think you should invoke the idea of naivete while failing to address the unhappy but perfectly simple reality: the ideal option doesn't exist, it's a fantasy that isn't actually available, and among the available options, even though none are good, one is worse than another.
"obscurity isn't security" is true enough, as far as it goes, but is just not that far.
And "put the bugs that won't be fixed soon on a billboard" is worse.
The super naive approach is ignoring that and thinking that "fix the bugs" is a thing that exists.
More fantasy. This presumes the bug only exists in some part of ffmpeg that can be disabled at all, that you don't need that part, and that you are even in control of your use of ffmpeg in the first place.
Sure, in maybe 1 special lucky case you might be empowered. And in 99 other cases you are subject to a bug without the remotest control over it, since it's buried away within something you use; you don't even have the option not to use the surface service or app, let alone control its subcomponents.
It's a heck of a lot better than being unaware of it.
(To put this in context: I assume that, on average, a published security vulnerability is already known to at least some malicious actors before it's published. If it's published, it's me finding out about it, not the bad actors suddenly getting a new tool)
It's only better if you can act on it on equal footing with the bad guys. If the bad guys get to act on it before you do, or before some other good guys do on your behalf, then no, it's not better.
Remember, we're not talking about keeping a bug secret; we're talking about using a power tool to generate a fire hose of bugs and only doing that, not fixing them.
"The bug" in question refers to the one found by the bug-finding tool the article claims triggered the latest episode of debate. Nobody is claiming it's the only bug, just that this triggering bug highlighted was a clear example of where there is actually such a clear cut line.
Google does contribute some patches for codecs they actually consume e.g. https://github.com/FFmpeg/FFmpeg/commit/b1febda061955c6f4bfb..., the bug in question was just an example of one the bug finding tool found that they didn't consume - which leads to this conversation.
Given that Google is both the company generating the bug reports and one of the companies using the buggy library, while most of the ffmpeg maintainers presumably aren't using their libraries to run companies with a $3.52 trillion market cap, would you argue that going public with vulnerabilities that affect your own product before you've fixed them is also a naive approach?
Sorry, but this states a lot of assumptions as fact to ask a question that only makes sense if they're all true. I feel Google should assist the project more financially given how much they use it, but I don't think it's a reasonable guess that Google ships products using every codec their open source fuzzer project finds bugs in. I certainly doubt YouTube lets you upload this LucasArts format, or that Chrome compiles ffmpeg with it, as an example. For security issues relevant to their usage via Chrome CVEs etc., they seem to contribute fixes as needed. E.g. here is one, found via fuzzing, for a codec they use and work on internally: https://github.com/FFmpeg/FFmpeg/commit/b1febda061955c6f4bfb...
As to whether it's a bad idea to publicly document security concerns you find regardless of whether you plan on fixing them, it often depends on whether you ask the product manager what they want for their product or the security-minded folks what they want for every product :).
> it functionally is just providing free security vulnerability research for malicious actors because almost nobody can take over or switch off of ffmpeg
At least, if this information is public, someone can act on it and sandbox ffmpeg for their use case, if they think it's worth it.
I personally prefer to have this information be accessible to all users.
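To make "act on it" a bit more concrete, here is a minimal sketch of one mitigation, assuming you control the code that invokes ffmpeg (this is my own illustration with placeholder limits, not anything ffmpeg or Google recommends): run the decode in a child process with resource limits and a timeout, so a decoder bug is more likely to kill the child than the host service. Real isolation would still want seccomp/namespaces (bwrap, firejail, or a hardened fork like the one mentioned elsewhere in this thread).

    import resource
    import subprocess

    def _limits():
        # Applied in the child just before exec (Linux/POSIX only).
        resource.setrlimit(resource.RLIMIT_CPU, (60, 60))        # 60s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (2**31, 2**31))   # ~2 GB address space
        resource.setrlimit(resource.RLIMIT_NOFILE, (32, 32))     # few open file descriptors

    def transcode(src, dst):
        # -nostdin keeps ffmpeg from reading the parent's stdin.
        subprocess.run(
            ["ffmpeg", "-nostdin", "-i", src, dst],
            preexec_fn=_limits,  # note: no seccomp or namespace isolation here
            timeout=300,         # wall-clock cap
            check=True,
        )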
This is a weird argument. Basically condoning security through obscurity: If nobody reports the bug then we just pretend it doesn’t exist, right?
There are many groups searching for security vulnerabilities in popular open source software who deliberately do not disclose them. They do this to save them for their own use or even to sell them to bad actors.
It’s starting to feel silly to demonize Google for doing security research at this point.
The timeline is industry standard at this point. The point is to make sure folks take security more seriously. If you start deviating from the script, others will expect the same exceptions, and the deadline would lose that ability. Sometimes it's good to let something fail loudly to show there is a problem. If ffmpeg doesn't have enough maintainers, then they should fail and let downstream customers know, so there is more pressure to contribute resources. Playing Superman and trying to prevent them from seeing the problem will just lead to burnout.
Is it industry standard to run automatic AI tools and spam the upstream with bug reports? To then expect the bugs to be fixed within 90 days is a bit much.
It's not some lone report of an important bug; it's AI spam that puts forth security issues faster than they have the resources to fix them.
I guess the question for a person at Google who discovers a bug they don’t personally have time to fix is: should they report the bug at all? They don’t necessarily know if someone else will be able to pick it up. So the current “always report” rule makes sense, since you don’t have to figure out whether someone can fix it.
The same question applies if they have time to fix it in six months, since that presumably still gives attackers a large window of time.
In this case the bug was so obscure it’s kind of silly.
It’s possible that this is a more efficient use of their time when it comes to open source security as a whole; most projects do not have a problem with reports like this.
If not pumping out patches allows them to get more security issues fixed, that’s fine!
From the perspective of Google maybe, but from the perspective of open source projects, how much does this drain them?
Making open source code more secure and at the same time less prevalent seems like a net loss for society. And if those researchers could spare some time to write patches for open source projects, that might benefit society more than dropping disclosure deadlines on volunteers.
Except users can act accordingly to work around the vulnerability.
For one, it lets people understand where ffmpeg is at so they can treat it more carefully (e.g. run it in a sandbox).
FFmpeg is also open source. After public disclosure, distros can choose to turn off said codec downstream to not expose this attack vector. There are a lot of things users can do to protect themselves, but they need to be aware of the problem first.
This is comical because we used to have something called the Turing test, which we considered our test of human-level intelligence. We never talk about it now because we obviously blew past it years ago.
There are some interesting ways in which AI remains inferior to human intelligence, but it is also obviously superior in many ways already.
It remains remarkable to me how common denial is when it comes to what AI can or cannot actually do.
There are also some interesting ways in which bicycles remain inferior to human locomotion, but they are also obviously superior in many ways already.
Still doesn't mean we should gamble the economies of whole continents on bike factories.
But common patterns of LLMs today will be adopted by humans as we are influenced linguistically by our interactions - which then makes it harder to detect LLM output.
I think it's that the issues are still so prevalent that people will justify poor arguments and reasons for being skeptical, because it matches their feelings, and articulating the actual problem is harder.
It's exactly the same as the literal Luddites, synthesizers, cameras, etc. The actual concern is economic: people don't want to be replaced.
But the arguments are couched in moral or quality terms for sympathy. Machine-knitted textiles are inferior to hand-made textiles. Synthesizers are inferior to live orchestras. Daguerreotypes are inferior to hand-painted portraits.
It's a form of intellectual insincerity, but it happens predictably with every major technological advance because people are scared.
I think it would ease some of my concerns, but it wouldn't put me in the camp that believes it should be raced toward without thinking about how to control it and without plans in place to both identify and react to its risks.
There are two doomsdays. The dramatic one, where they control the military and we end up living in the Matrix. And the less dramatic one, where we as humans forget how to do things for ourselves and then slowly watch the AIs become less and less capable of keeping us happy and alive. Maybe the end of both scenarios is similar, but one would take decades while the other could happen overnight.
Accuracy alone doesn't fix either doomsday scenario. But it would slow some of the issues I see forming already with people replacing research skills and informational reporting with AIs that can lie or be very misleading.
> We never talk about it now because we obviously blew past it years ago.
It's shocking to me that (as far as I know) no one has actually bothered to do a real Turing test with the best and newest LLMs. The Turing test is not whether a casual user can be momentarily confused about whether they are talking to a real person, or if a model can generate real-looking pieces of text. It's about a person seriously trying, for a fair amount of time, to distinguish between a chat they are having with another real person and an AI.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1.
It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
Tbf, a machine is more likely to be versed in this ancient descriptive notation than a human who is maybe just playing casually. R1 and K1 haven't been in common use since the 80s.
Try reading Turing's thesis before making that assertion, because the imitation game wasn't meant to measure a tipping point of any kind.
It's just a thought experiment to show that machines achieving human capabilities isn't proof that machines "think". He then argues against multiple interpretations of what machine "thinking" could even mean, to conclude that whether machines think or not is not worth discussing and that their capabilities are what matter.
That is, the test has nothing to do with whether machines can reach human capabilities in the first place. Turing took for granted they eventually would.
> We never talk about it now because we obviously blew past it years ago.
My Turing test has been the same since about when I learned it existed. I told myself I'd always use the same one.
What I do is, after saying hi, repeat the same sentence forever.
A human still reacts very differently than any machine to this test. Current AIs could maybe be adversarially prompted to bypass this, but so far it's still obvious it's a machine replying.
"Hi there! I understand you're planning to repeat the same sentence. I'm here whenever you'd like to have a conversation about something else or if you change your mind. Feel free to share whatever's on your mind!"
I don't think I've ever imagined a human saying "I understand you're planning to repeat the same sentence". If you thought this was some kind of killer rebuke, I don't think it worked out the way you imagined. Do you actually think that's a human-sounding response? To me it's got that same telltale sycophancy of a robot butler that I've come to expect from these consumer-grade LLMs.
That's mostly because of the system prompt asking Claude to be a helpful assistant.
If you try with a human who works in a call center with that system prompt as instructions on how to answer calls, you will likely get a similar response.
But honestly, believe in whatever you wanna believe. I'm so sick of arguing with people online. Not gonna waste my time here anymore.
Maybe don't take such a maximalist interpretation of other people's comments. My point that it doesn't pass that test doesn't mean it isn't extremely useful for many things. It's just that the test is undefined, so I find it funny when people say they truly cannot tell it's not a real person. I could've been more crass and said it also doesn't reply to insults like a real person. There are so many ways in which it doesn't behave like a human, but it's still pretty useful.
What I read from your reply is that you adjoin the above statement with "and therefore they are useless" but there's no need to read it like that.
> This is comical because we used to have something called the Turing test
It didn't go anywhere.
> which we considered our test of human-level intelligence.
No, this is a strawman. Turing explicitly posits that the question "can machines think?" is ill-posed in the first place, and proposes the "imitation game" as something that can be studied meaningfully — without ascribing to it the sort of meaning commonly described in these arguments.
More precisely:
> The original question, "Can machines think?" I believe to be too meaningless to deserve discussion. Nevertheless I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.
----
> We never talk about it now because we obviously blew past it years ago.
No. We talk about it constantly, because AI proponents keep bringing it up fallaciously. Nothing like "obviously blowing past it years ago" actually happened; cited examples look nothing like the test actually described in Turing's paper. But this is still beside the point.
> There are some interesting ways in which AI remains inferior to human intelligence, but it is also obviously superior in many ways already.
Computers were already obviously superior to humans in, for example, arithmetic, decades ago.
> It remains remarkable to me how common denial is when it comes to what AI can or cannot actually do.
It is not "denial" to point out your factual inaccuracies.
Yes it does - it has to be meaningful or rigorous for the comparative ranking to be meaningful or rigorous, or else wtf are you doing? Say I have all the information on my side, but you only show the user these questions? Who cares about that comparison?
Deletion of data is the most permanent thing most people will ever do. The burning of the library of Alexandria and the razing of Baghdad left a long, long shadow on history.
Most of the time when I see this snark and look it up, it turns out that the "original" inventor only took the most basic step or laid vague foundations, and never refined it further or explored any potential applications.
Most often it happens with China, since they spend a lot on propaganda to present themselves as the true inventor of everything.
you jest but it is wild how often people declare “if it was ‘advanced’ and outside of Europe, it was probably aliens. not the people, it was aliens, obviously.”
"AI effect" is long known and pretty well documented.
When AI beat humans at chess, it didn't result in humans revising their idea of the capabilities of machine intelligence upwards. It resulted in humans revising their notion of how much intelligence is required to play chess at world champion level downwards, and by a lot.
Clearly, there's some sort of psychological defense mechanism in play. First, we see "AI could never do X". Then an AI does X, and the sentiment flips to "X has never required any intelligence in the first place".
I think it's fairly safe to say that "X can be modeled as a math problem and does not require _general_ intelligence to solve" for any X that general intelligence can solve. Some math problems are just more complicated than others.
> Norvig is clearly very interested in seeing what Hinton could come up with. But even Norvig didn’t see how you could build a machine that could understand stories using deep learning alone. https://www.newyorker.com/news/news-desk/is-deep-learning-a-...
> Quite a wide variety of people find AI deeply ego threatening to the point of being brainwormed into spouting absolute nonsense, but why?
He is not brainwashed; this just happens to be his business. What happens to Gary Marcus if Gary Marcus stops talking about how LLMs are worthless? He just disappears. No one ever interviews him for his general thoughts on ML, or to discuss his (nonexistent) research. His only claim to fame is being the loudest contrarian in the LLM world, so he has to keep doing that or accept becoming irrelevant.
Slight tangent, but this is a recurring pattern in fringe belief, e.g. the prominent flat earther who long ago accepted the earth is not flat but can’t stop the act, as all their friends and income are tied to that belief.
Not to say that believing LLMs won’t lead to AGI is fringe, but it does show the danger (and benefits, I guess) of tying your entire identity to a specific belief.
> Quite a wide variety of people find AI deeply ego threatening to the point of being brainwormed into spouting absolute nonsense, but why?
It makes sense when you look at this as a wider conversation. Every time Sam Altman, Elon Musk and co. predict that AGI is just around the corner, that their products will be smarter than all of humanity combined, and that they are like having an expert in everything in your pocket, people like Gary Marcus are going to respond in just as extreme a way in the opposite direction. Maybe if the AI billionaires with the planet-wide megaphones weren't so bombastic about their claims, certain other people wouldn't be so bombastic in their pushback.
Gary Marcus said that Deep Learning was hitting a wall 1 month before the release of DALL-E 2, 6 months before the release of ChatGPT, and 1 year before GPT-4, arguably 3 of the biggest milestones in Deep Learning.
There are some base models available to the public today. Not on "end of 2025 frontier run" scale, but a few of them are definitely larger and better than GPT-3. There are some uses for things like that.
Not that the limits of GPT-3 were well understood at the time.
We really had no good grasp of how dangerous or safe something like that would be - and whether there was some subtle tipping point that could propel something like GPT-3 all the way to AGI and beyond.
Knowing what we know now? Yeah, they could have released GPT-3 base model and nothing bad would have happened. But they didn't know that back then.
I use self-driving every single day in Boston and I haven’t needed to intervene in about 8 months. Most interventions are due to me wanting to go a different route.
Based on the rate of progress alone I would expect functional vision-only self-driving to be very close. I expect people will continue to say LIDAR is required right up until the moment that Tesla is shipping level 4/5 self-driving.
Same experience in a mix of city/suburban/rural driving, on a HW3 car. Seeing my car drive itself through complex scenarios without intervention, and then reading smart people saying it can’t without hardware it doesn’t have, gives major mental whiplash.
I would like to get my experience more in line with yours. I can go a few miles without intervention, but that's about it, before it does something that will result in damage if I don't take over. I'm envious that some people can go months when I can't go a full day.
Where are you driving?! If the person you're replying to has gone 8 months in Boston without having to intervene, I'm impressed. Boston is the craziest place to drive that I've ever driven.
Pro tip if you get stuck in a warren of tiny little back streets in the area: latch on to the back of a cab; they're generally on their way to a major road to get their fare where they're going, and they usually know a good way to get to one. I've pulled this trick multiple times around City Hall, Government Center, the Old State House, etc.
Or when. Driving during peak commute hours really makes you a sardine in a box, and it's harder for intervene-worthy events to occur just by the nature of dense traffic.
> Based on the rate of progress alone I would expect functional vision-only self-driving to be very close.
So close yet so far, which is ironically the problem vision-based self-driving has. No concrete information, just a guess based on the simplest surface data.
On a scale from "student driver" to "Safelite guy (or any other professional who drives around as part of their job) running late", how does it handle Storrow and similar?
Like, does it get naively caught in stopped traffic for turns it could lane-change out of, or does it fucking send it?
I don't drive in Boston, but there is some impatience factor and it will make human-like moves out of correct-but-stopped lanes into moving ones. It'll merge into gaps that feel very small when it doesn't have other options.
Is it fair for the company to have received three months of payments from a customer but the salesperson doesn’t get commission at all? How will you retain sales staff when word gets out? What’s the period length over which if the deal dies the salesperson doesn’t get their commission? Do you roll back commission payments later when the customer stops paying?
These are all great questions which people have answered, and clawbacks over a defined period are the standard solution to the problem of misaligned incentives between the company receiving recurring revenue and the salesperson receiving an upfront commission.
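To make the mechanics concrete, here is a tiny sketch of one hypothetical clawback scheme (the 10% rate, the 6-month window, and the pro-rating are my assumptions for illustration, not a claim about any particular company's plan): the commission is paid up front on first-year contract value, and a pro-rated share is returned if the customer churns inside the window.

    # Hypothetical clawback policy, purely illustrative.
    COMMISSION_RATE = 0.10   # 10% of first-year contract value, paid up front
    CLAWBACK_MONTHS = 6      # churn inside this window triggers a clawback

    def clawback(first_year_value, months_paid):
        """Amount the salesperson returns if the customer churns after months_paid."""
        commission = first_year_value * COMMISSION_RATE
        if months_paid >= CLAWBACK_MONTHS:
            return 0.0  # outside the window: commission is kept in full
        unearned = (CLAWBACK_MONTHS - months_paid) / CLAWBACK_MONTHS
        return round(commission * unearned, 2)

    # Customer pays 3 months of a $12,000/yr deal, then cancels:
    # commission was $1,200; the rep returns $600 and keeps $600.
    print(clawback(12_000, 3))  # 600.0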