heavymemory's comments

This still seems like gradient descent wrapped in new terminology. If all learning happens through weight updates, it's just rearranging where the forgetting happens.


The idea is interesting, but I still don’t understand how this is supposed to solve continual learning in practice.

You’ve got a frozen transformer and a second module still trained with SGD, so how exactly does that solve forgetting instead of just relocating it?


Do you have a source for the NVIDIA “diffusion plus autoregression 6x faster” claim? I can’t find anything credible on that.


Ha! Found it: https://arxiv.org/abs/2511.08923

Thanks to AI search :)


Me neither; that's why I wrote that someone claimed they did.

The idea is simple, in a way: with diffusion, several sentences / words get predicted at once, but they are usually not of great quality. Autoregression then selects the correct words.

That increases quality and speed. Sounds a bit like conscious and subconscious to me.
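I haven't read the paper closely, but a minimal sketch of what I mean by "draft with diffusion, confirm with autoregression" might look like this — the Drafter/Verifier names and their methods are my own assumptions, not the paper's API:

    // Sketch of a draft-then-verify loop: a diffusion drafter proposes a block of
    // tokens, an autoregressive verifier keeps the longest prefix it agrees with.
    type Token = number;

    interface Drafter { draft(context: Token[], blockSize: number): Token[]; }
    interface Verifier { nextToken(context: Token[]): Token; }

    function generate(drafter: Drafter, verifier: Verifier, prompt: Token[], maxLen: number): Token[] {
      const out = [...prompt];
      while (out.length < maxLen) {
        const draft = drafter.draft(out, 8);        // diffusion proposes several tokens at once
        let accepted = 0;
        for (const t of draft) {
          const expected = verifier.nextToken([...out, ...draft.slice(0, accepted)]);
          if (expected !== t) break;                // reject at the first disagreement
          accepted += 1;
        }
        out.push(...draft.slice(0, accepted));
        out.push(verifier.nextToken(out));          // one verified token guarantees progress
      }
      return out.slice(0, maxLen);
    }

In practice the verifier would score the whole drafted block in a single pass rather than token by token, which is where the speed-up would come from.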


This started as a small experiment in structural rewriting. The basic idea is that you give the system two or three before and after examples, press Teach, and it learns that transformation and applies it to new inputs.

It is not an LLM and it is not a template system. There is a small learned component that decides when a rule applies, and the rest is a deterministic structural rewrite engine.

There are a few demo modes:

TEACH: learn a structural rule from examples
COMPOSE: apply several learnt rules together
TRANSFER: use the same rule in different symbolic domains
SIMPLIFY: multi-step rewriting
CODEMOD: for example, you can teach lodash.get → optional chaining from two examples

Once a rule is learnt it generalises to inputs that are deeper or shaped differently from the examples you gave. Everything runs on CPU and learning happens in real time.
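To make the CODEMOD mode concrete, the two taught examples could look roughly like this — the pairs and the teach/apply names are placeholders, not the demo's actual API (the real demo is driven from the UI):

    // Two hypothetical before/after pairs for the "lodash.get → optional chaining" rule.
    const examples: Array<[before: string, after: string]> = [
      ["_.get(user, 'profile.name')", "user?.profile?.name"],
      ["_.get(order, 'total.gross')", "order?.total?.gross"],
    ];

    // Placeholder calls, for illustration only:
    // const rule = teach(examples);
    // apply(rule, "_.get(cart, 'items.length')");  // → "cart?.items?.length"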

Demo: https://re.heavyweather.io


If anyone saw odd behaviour just now in the demo, that was my fault. One of the codemod rules was leaking into the shared rule registry instead of being scoped to the current user. I have isolated that and it should be fixed.

The core engine was not affected. The issue was simply that a user-taught rule was visible to other demo modes, which made that rule fire everywhere.

If anyone notices anything else strange, let me know. It should behave normally now.


Right, associativity is the simplest case because the structure is visible directly in one example.

The system needs multiple examples when there is more than one varying part and a single example is ambiguous. A simple example is wrapping a function call. With:

    doThing(x) → log(doThing(x))
    process(y) → log(process(y))
the system learns that:
- the function name varies
- the argument varies
- the outer log(…) is constant

From that it infers the general rule and applies it to new inputs. A single example would not be enough to disambiguate that pattern.
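A rough sketch of that kind of generalisation, assuming the examples are already parsed into simple call expressions — the names and data layout here are mine, not the actual engine's:

    // Each example is a pair (input call, output call), e.g.
    // doThing(x) → log(doThing(x)) becomes ({fn:"doThing",arg:"x"}, {fn:"log",arg:"doThing(x)"}).
    interface Call { fn: string; arg: string; }
    interface Rule { wrapper: string; }   // the constant outer function, e.g. "log"

    function learnWrapRule(a: [Call, Call], b: [Call, Call]): Rule | null {
      const wrapsA = a[1].arg === `${a[0].fn}(${a[0].arg})`;
      const wrapsB = b[1].arg === `${b[0].fn}(${b[0].arg})`;
      // The outer function must agree across examples while fn and arg are free to vary;
      // with only one example there is no way to tell which parts are constant.
      if (wrapsA && wrapsB && a[1].fn === b[1].fn) return { wrapper: a[1].fn };
      return null;
    }

    function applyRule(rule: Rule, input: Call): string {
      return `${rule.wrapper}(${input.fn}(${input.arg}))`;
    }

Learning from the two examples above yields { wrapper: "log" }, and applying the rule to a new call like fetchUser(id) gives log(fetchUser(id)).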


Thanks, I'll try to publish something soon.


6th time in the last year that this was posted, apparently


I think primarily in structures, spaces, and transformations. Language tags along afterward.


If thought needed words, you'd be unable to think of anything you can't yet describe.

