I believe you're misunderstanding what the OP means about "long-term" memory. From what I can tell, it's not actively modifying the weights of the underlying model, it just "remembers" things from a high number of tokens into the past of its context. The point is that this allows it to remember something it read ~200 pages ago in a very long context window, not that it can remember something from one session into another clean session.