More

lairv · 2025-09-20T19:42:48 1758397368

If you refresh you get simpler ones, like the couple kissing

lairv · 2025-09-17T12:57:53 1758113873

We know that they correctly implement their specification*

meithecatte · 2025-09-17T19:06:07 1758135967

No, they are correct, because the deciders themselves are just a cog in the proof of the overall theorem. The specification of the deciders is not part of the TCB, so to speak.

lairv · 2025-09-10T22:34:42 1757543682

on the contrary this makes me bullish about their team, it shows that people here care about the craft

sudohalt · 2025-09-11T05:29:26 1757568566

The team is good, and I enjoyed the read. But this is just an engineering blog post. They're promoting this like it's ground breaking research and it's on their front-page. Ultimately this paper is not very meaningful and just a fun debugging session.

I've seen this play out dozens of times. So many startups that have come and go in the bay area were composed of extremely talented individuals, but almost all of them failed.

lairv · 2025-09-07T18:50:46 1757271046

> I don't remember the last time I was surprised by a car approaching from behind.

With a headwind I often don’t hear cars behind me at all, so I can see the use case

lairv · 2025-09-06T01:08:17 1757120897

I tried to statically link DuckDB to one of my C++ project earlier this year and it took me 3 days to have something working on Windows/Linux/MacOS (just to be able to use the dependency)

While I'm not a C++ expert, doing the same in Python is just one pip install away, so yeah both "richness" and "ease of use" of the ecosystem matters

j2kun · 2025-09-12T17:55:34 1757699734

You made my point for me: DuckDB is already in C++ (so C++ is rich enough), but hard to glue with other pieces, and you don't want to put in that effort, so you use Python.

lairv · 2025-09-06T00:52:44 1757119964

It's the backend for torch.compile, pytorch eager mode will still use cuBLAS/cuDNN/custom CUDA kernels, not sure what's the usage of torch.compile

almostgotcaught · 2025-09-06T02:10:06 1757124606

> not sure what's the usage of torch.compile

consider that at minimum both FB and OAI themselves definitely make heavy use of the Triton backend in PyTorch.

lairv · 2025-08-11T13:56:34 1754920594

TBH most people I know who regularly drive there still take the Millau valley route, since the viaduct toll is quite expensive at 13€ in the summer (just to cross the bridge)

Sammi · 2025-08-11T14:24:19 1754922259

Doing a bit of googling it seems people report saving anything from 20 min to 1 hour by taking the bridge. But during some particular holidays, where there is lots of traffic, the saving can become 4 hours.

lucianbr · 2025-08-11T14:54:05 1754924045

I suppose the 4 hours saving comes from a lot of people being on the non-bridge route, meaning a lot of people choose to not take the bridge. Is there any other possible reason for the 4 hours saved?

bobthepanda · 2025-08-11T15:49:07 1754927347

It's a substantially flatter, straighter line, and much higher capacity. The valley route is only a single lane in each direction with no grade separation at intersections and you are comparing that to a four lane freeway.

lucianbr · 2025-08-13T12:24:29 1755087869

> people report saving anything from 20 min to 1 hour by taking the bridge. But during some particular holidays, where there is lots of traffic, the saving can become 4 hours

You thing during particular holidays the single lane somehow has even less lanes, less grade separation and such? That would be quite a phenomenon.

bobthepanda · 2025-08-13T20:41:21 1755117681

All those things get saturated much more quickly during high traffic times, whereas the freeway has substantially higher capacity to work with.

In particular most intersections on the now D809 are roundabouts, continuing on the D809 often requires making a turn on the roundabout, and roundabouts are notorious for gridlocking with high turn volumes. Let that gridlock cascade across multiple intersections and you now have rapidly deteriorating travel times.

At other times, traffic is less high so this gridlocking is less likely to occur.

I_complete_me · 2025-08-12T09:33:32 1754991212

I'd say that gephyrophobia is a legitimate one. I mean, I for one would be terrified to have to cross it.

lairv · 2025-08-01T01:27:06 1754011626

> if you can’t answer those questions, if you don’t even find them interesting

I think you missed the second part of the sentence, it's one thing to not know the answers, but you should have a personal interest in finding them

lairv · 2025-07-17T20:23:24 1752783804

In general most of the previous AI "breakthrough" in the last decade were backed by proper scientific research and ideas:

- AlphaGo/AlphaZero (MCTS)

- OpenAI Five (PPO)

- GPT 1/2/3 (Transformers)

- Dall-e 1/2, Stable Diffusion (CLIP, Diffusion)

- ChatGPT (RLHF)

- SORA (Diffusion Transformers)

"Agents" is a marketing term and isn't backed by anything. There is little data available, so it's hard to have generally capable agents in the sense that LLMs are generally capable

chaos_emergent · 2025-07-17T22:39:19 1752791959

I disagree that there isn't an innovation.

The technology for reasoning models is the ability to do RL on verifiable tasks, with the some (as-of-yet unpublished, but well-known) search over reasoning chains, with a (presumably neural) reasoning fragment proposal machine, and a (presumably neural) scoring machine for those reasoning fragments.

The technology for agents is effectively the same, with some currently-in-R&D way to scale the training architecture for longer-horizon tasks. ChatGPT agent or o3/o4-mini are likely the first published models that take advantage of this research.

It's fairly obvious that this is the direction that all the AI labs are going if you go to SF house parties or listen to AI insiders like Dwarkesh Patel.

lairv · 2025-07-17T23:53:20 1752796400

Fair enough I guess, even though the concept of agent/agentic task popped before reasoning models were really a thing

chaos_emergent · 2025-07-18T01:08:19 1752800899

The idea of chatbots existed before ChatGPT, does that mean it's purely marketing hype?

ashwindharne · 2025-07-18T13:38:30 1752845910

'Agents' are just a design pattern for applications that leverage recent proper scientific breakthroughs. We now have models that are increasingly capable of reading arbitrary text and outputting valid json/xml. It seems like if we're careful about what text we feed them and what json/xml we ask for, we can get them to string together meaningful workflows and operations.

Obviously, this is working better in some problem spaces than others; seems to mainly depend on how in-distribution the data domain is to the LLM's training set. Choices about context selection and the API surface exposed in function calls seem to have a large effect on how well these models can do useful work as well.

mumbisChungo · 2025-07-17T20:27:26 1752784046

My personal framing of "Agents" is that they're more like software robots than they are an atomic unit of technology. Composed of many individual breakthroughs, but ultimately a feat of design and engineering to make them useful for a particular task.

paradite · 2025-07-18T12:48:50 1752842930

Agents have been a field in AI long since 1990s.

MDP, Q learning, TD, RL, PPO are basically all about agent.

What we have today is still very much the same field as it was.

lossolo · 2025-07-17T21:09:25 1752786565

Yep. Agents are only powered by clever use of training data, nothing more. There hasn't been a real breakthrough in a long time.

sothatsit · 2025-07-17T23:28:35 1752794915

"Long time" as in, 7 months since o1 and reasoning models were released? That was a pretty big breakthrough.

lossolo · 2025-07-17T23:41:04 1752795664

In the context of our conversation and what OP wrote, there has been no breakthrough since around 2018. What you're seeing is the harvesting of all low-hanging fruit from a tree that was discovered years ago. But fruit is almost gone. All top models perform at almost the same level. All the "agents" and "reasoning models" are just products of training data.

I wrote more about it here:

https://news.ycombinator.com/item?id=44426993

You may also be interested in this article, that goes into details even more:

https://blog.jxmo.io/p/there-are-no-new-ideas-in-ai-only

sothatsit · 2025-07-18T00:48:26 1752799706

This "all breakthroughs are old" argument is very unsatisfying. It reminds me of when people would describe LLMs as being "just big math functions". It is technically correct, but it misses the point.

AI researchers spent years figuring out how to apply RL to LLMs without degrading their general capabilities. That's the breakthrough. Not the existence of RL, but making it work for LLMs specifically. Saying "it's just RL, we've known about that for ages" does not acknowledge the work that went into this.

Similarly, using the fact that new breakthroughs look like old research ideas is not particularly good evidence that we are going to head into a winter. First, what are the limits of RL, really? Will we just get models that are highly performant at narrow tasks? Or will the skills we train LLMs for generalise? What's the limit? This is still an open question. RL for narrow domains like Chess yielded superhuman results, and I am interested to see how far we will get with it for LLMs.

This also ignores active research that has been yielding great results, such as AlphaEvolve. This isn't a new idea either, but does that really matter? They figured out how to apply evolutionary algorithms with LLMs to improve code. So, there's another idea to add to your list of old ideas. What's to say there aren't more old ideas that will pop up when people figure out how to apply them?

Maybe we will add a search layer with MCTS on top of LLMs to allow progress on really large math problems by breaking them down into a graph of sub-problems. That wouldn't be a new idea either. Or we'll figure out how to train better reranking algorithms to sort our training data, to get better performance. That wouldn't be new either! Or we'll just develop more and better tools for LLMs to call. There's going to be a limit at some point, but I am not convinced by your argument that we have reached peak LLM.

lossolo · 2025-07-18T04:37:52 1752813472

I understand your argument. The recipe that finally let RLHF + SFT work without strip mining base knowledge was real R&D, and GPT 4 class models wouldn’t feel so "chatty but competent" without it. I just still see ceiling effects that make the whole effort look more like climbing a very tall tree than building a Saturn V.

GPT 4.1 is marketed as a "major improvement" but under the hood it’s still the KL-regularised PPO loop OpenAI first stabilized in 2022 only with a longer context window and a lot more GPUs for reward model inference.

They retired GPT 4.5 after five months and told developers to fall back to 4.1. The public story is "cost to serve” not breakthroughs left on the table. When you sunset your latest flagship because the economics don’t close, that’s not a moon shot trajectory, it’s weight shaving on a treehouse.

Stanford’s 2025 AI-Index shows that model to model spreads on MMLU, HumanEval, and GSM8K have collapsed to low single digits, performance curves are flattening exactly where compute curves are exploding. A fresh MIT-CSAIL paper modelling "Bayes slowdown" makes the same point mathematically: every extra order of magnitude of FLOPs is buying less accuracy than the one before.[1]

A survey published last week[2] catalogs the 2025 state of RLHF/RLAIF: reward hacking, preference data scarcity, and training instability remain open problems, just mitigated by ever heavier regularisation and bigger human in the loop funnels. If our alignment patch still needs a small army of labelers and a KL muzzle to keep the model from self lobotomising calling it "solved" feels optimistic.

Scale, fancy sampling tricks, and patched up RL got us to the leafy top so chatbots that can code and debate decently. But the same reports above show the branches bending under compute cost, data saturation, and alignment tax. Until we swap out the propulsion system so new architectures, richer memory, or learning paradigms that add information instead of reweighting it we’re in danger of planting a flag on a treetop and mistaking it for Mare Tranquillitatis.

Happy to climb higher together friend but I’m still packing a parachute, not a space suit.

1. https://arxiv.org/html/2507.07931v1

2. https://arxiv.org/html/2507.04136v1

lairv · 2025-06-28T01:30:46 1751074246

In ocaml you would rather do something like this: let x_power_8 = (let a = x*x in let b = a*a in b*b);

a, b variables are just used for computing x_power_8, you don't need them outside of this scope. I think the point of the exercise is to use variable binding, though I agree the website doesn't explain much