A Claude Code setup that implements ML papers from arXiv. Give it a paper, and it orchestrates a team of AI agents to read the paper, plan the implementation, write the code, verify correctness, optimize performance, train, and compare results against the paper's claims.
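A minimal sketch of what that orchestration loop might look like. All names here (`run_agent`, the stage list, the arXiv ID) are hypothetical stand-ins, not the actual setup:

```python
# Hypothetical paper-to-code agent pipeline; stage names and run_agent()
# are illustrative only, not the actual Claude Code configuration.

def run_agent(role: str, context: dict) -> dict:
    # Placeholder: a real setup would call an LLM here with a
    # role-specific system prompt and tool access.
    return {**context, "log": context.get("log", []) + [role]}

STAGES = ["read_paper", "plan", "implement", "verify",
          "optimize", "train", "compare_to_paper"]

def implement_paper(arxiv_id: str) -> dict:
    context = {"paper": arxiv_id}
    for stage in STAGES:          # each stage sees the prior stage's output
        context = run_agent(stage, context)
    return context

result = implement_paper("2403.00000")  # made-up ID for illustration
print(result["log"])
```

The point of the pattern is just that each agent's output becomes the next agent's input, so the stages can be specialized independently.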
> Fast mode is not a different model. It uses the same Opus 4.6 with a different API configuration that prioritizes speed over cost efficiency. You get identical quality and capabilities, just faster responses.
They failed to grasp the fundamental point of batching, which is sharing model weights between requests. For context, this wasn't just one person's mistake: several AI Twitter personalities proposed this 'Claude Opus fast = small batching' hypothesis. What I find funny is how confident these influencers were, while the people who actually work on LLM serving at frontier labs, the ones who genuinely understand this, said nothing. The rest is simply noise.
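A toy illustration of that weight-sharing point (NumPy stand-in for a model layer; sizes are arbitrary): the same weight matrix serves every request in the batch via a single matmul, so weights are read from memory once per step rather than once per request.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))   # model weights, shared by all requests

def forward_one(x):
    return x @ W                  # one full weight read per request

def forward_batched(X):
    return X @ W                  # one weight read for the whole batch

X = rng.normal(size=(8, 512))     # 8 concurrent requests
single = np.stack([forward_one(x) for x in X])
batched = forward_batched(X)
assert np.allclose(single, batched)  # identical outputs, amortized weight traffic
```

Since decoding is memory-bandwidth bound, amortizing the weight reads across the batch is where the throughput comes from; shrinking the batch gives each request more bandwidth, it doesn't duplicate the weights.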
If you ask someone knowledgeable at r/LocalLLaMA about an inference configuration that can increase TG (token generation speed) by *up to* 2.5x, particularly for a sample prompt like "*Refactor* this module to use dependency injection", then the answer is of course speculative decoding.
You don't have to work for a frontier lab to know that. You just have to be GPU poor.
I do think Claude Code as a tool gave Anthropic some advantages over others. They have plan mode, a todo list, askUserQuestion tools, hooks, etc., which greatly extend Opus's capabilities. I agree that others (Codex, Cursor) also quickly copy these features, but that's the nature of the race, and Anthropic has to keep innovating to maintain its edge.
The biggest advantage by far is the data they collect along the way. That data can be bucketed by real devs, and the signals extracted from it can be top tier. All that data + signals + whatever else they cook up can be added back into the training corpus and the models retrained / version++ on the new set. Rinse and repeat.
(this is also why all the labs, including some Chinese ones, are subsidising / me-too-ing coding agents)
(I work at Cursor) We have all these! Plan mode with a GUI + ability to edit plans inline. Todos. A tool for asking the user questions, which will be automatically called or you can manually ask for it. Hooks. And you can use Opus or any other models with these.
I did something similar in Python, in case people want to see a slightly different perspective (I was aiming for a minimal agent library with built-in tools, similar to the Claude Agent SDK):
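A hedged sketch of what such a minimal agent loop can look like. The fake model and tool registry below are illustrative only, not the commenter's actual library and not the Claude Agent SDK:

```python
import json

# Minimal agent loop: the "model" emits either a tool call or a final
# answer; the loop executes tools and feeds results back as messages.

TOOLS = {
    "add": lambda args: args["a"] + args["b"],
}

def fake_model(messages):
    # Stand-in for a real LLM API call: request the add tool once, then finish.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"answer": f"The sum is {messages[-1]['content']}"}

def agent_loop(user_prompt, model=fake_model, max_steps=5):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["args"])   # execute the tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish")

print(agent_loop("What is 2 + 3?"))
```

Everything else in an agent library (built-in tools, retries, streaming) is layered on top of this one loop.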
For one, these models should be able to understand the physical world via images, audio, and video. I do agree that current models are quite good at coding, but that's mainly because coding is entirely text-based and easily verifiable. It's not obvious that this capability will transfer to other domains that aren't text-based and aren't as easily verifiable.
I'm not familiar with these open-source models. My bias is that they're heavily benchmaxxing and not really helpful in practice. Can someone with a lot of experience using these, as well as Claude Opus 4.5 or Codex 5.2 models, confirm whether they're actually on the same level? Or are they not that useful in practice?
P.S. I realize Qwen3-Max-Thinking isn't actually an open-weight model (only accessible via API), but I'm still curious how it compares.
I don't know where your impression about benchmaxxing comes from. Why would you assume closed models are not benchmaxxing? Being closed and commercial, they have more incentive to fake it than the open models.
You're not familiar with them, yet you claim a bias. Bias based on what? I've used pretty much only open-source models for the last two years. I occasionally give OpenAI and Anthropic a try to see how good they are, but I stopped supporting them when they started calling for regulation of open models. I haven't seen folks with closed models get ahead of me; I'm keeping up just fine with these free open models.
Yeah, I get that there's nuance between all of them. I ranked Minimax higher for its agentic capabilities; in my own usage, Minimax's tool calling is stronger than Deepseek's and GLM's.
My observation is that vibe-coded applications are significantly lower quality than traditional software. Anthropic's software (which they claim is 90% vibe coded) is extremely buggy, especially the UI.
That's a misunderstanding based on a loose definition of "vibe coding". When companies threw around the "90% of code is written by AI" claims, they were counting characters of autocomplete based on users actually typing code (most of which was equivalent to the "AI generated" code of Eclipse tab-completion a decade ago), and sometimes writing hyperlocal prompts for a single method.
We can identify 3 levels of "vibe coding":
1. GenAI Autocomplete
2. Hyperlocal prompting about a specific function (Copilot's original pitch)
3. Developing the app without looking at code.
Level 3 is hardly considered "vibe" coding, and Level 2 is iffy.
"90% of code written by AI" in some non-trivial contexts only very recently reached level 3.
I don't think it ever reached Level 2, because that's just a painfully tedious way of writing code.
They have not said that. They've only said that most of their code is written by Claude, which is different from "vibe coding". If competent engineers review the code, then it's little different from any other coding.
IIRC, the Claude Code creator mentioned that all the PRs are reviewed by humans, just like normal human PRs. So yes, humans still look at the code at the review stage. I still consider this to be level 3, but anyway, this is just a matter of definition.
I mostly work at level 2, and I call it "power coding", like power armor, or power tools. Your will and your hand still guides the process continuously. But now your force is greatly multiplied.