I've been wondering how feasible it would be to simulate long-term memory by running multiple LLMs at the same time. One of them would be tasked with storing and retrieving long-term memories from disk, so it would need to be instructed about some data structure in which memories are persisted. You'd then feed it the current context and instruct it to navigate that structure to any potentially relevant memories. Whatever it retrieved could be injected into the prompt for the next LLM, which would just respond to the given prompt.
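Something like this rough sketch, maybe, where `call_llm` is a stand-in for whatever completion API you'd use, and the flat JSON memory file is just an assumption to make the idea concrete:

```python
import json
from pathlib import Path

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM completion call."""
    raise NotImplementedError

MEMORY_FILE = Path("memories.json")  # flat list of {"key": ..., "text": ...}

def answer(user_prompt: str) -> str:
    memories = json.loads(MEMORY_FILE.read_text())
    # LLM #1: given the current context, decide which stored memories matter.
    retrieval_prompt = (
        "Stored memory keys:\n"
        + "\n".join(m["key"] for m in memories)
        + f"\n\nList, one per line, the keys relevant to:\n{user_prompt}"
    )
    keys = set(call_llm(retrieval_prompt).splitlines())
    recalled = "\n".join(m["text"] for m in memories if m["key"] in keys)
    # LLM #2: answer the prompt with the recalled memories injected.
    return call_llm(f"Relevant memories:\n{recalled}\n\nUser: {user_prompt}")
```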
No idea what sort of data structure would actually work, though. Perhaps a graph database would be feasible, with the memory prompt instructing the LLM to write a query against it.
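For the graph variant, that retrieval step might look roughly like this (entirely hypothetical: `call_llm` is the same placeholder as above, `run_cypher` would wrap e.g. the neo4j driver, and the schema is made up):

```python
def run_cypher(query: str) -> list[dict]:
    """Placeholder for a graph-DB call, e.g. wrapping neo4j session.run()."""
    raise NotImplementedError

def recall_via_graph(user_prompt: str) -> str:
    schema = "(:Memory {topic, text})-[:RELATES_TO]->(:Memory)"
    # Ask the memory LLM to translate the current context into a query.
    query = call_llm(
        f"Memory graph schema: {schema}\n"
        "Write a single Cypher query returning m.text for memories relevant "
        f"to this prompt, and nothing else:\n{user_prompt}"
    )
    return "\n".join(row["m.text"] for row in run_cypher(query))
```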
This is typically achieved using vector databases to store memories as embeddings. You can then retrieve the “memory” closest to the question in the embedding space.
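A minimal sketch of the idea, assuming some `embed` function backed by an embedding model (a toy in-memory version, not any particular vector DB's API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model."""
    raise NotImplementedError

memories: list[str] = []
vectors: list[np.ndarray] = []

def store(memory: str) -> None:
    memories.append(memory)
    vectors.append(embed(memory))

def recall(question: str, k: int = 3) -> list[str]:
    q = embed(question)
    # cosine similarity between the question and every stored memory
    sims = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in vectors]
    best = np.argsort(sims)[-k:][::-1]  # indices of the k closest memories
    return [memories[i] for i in best]
```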
This is an active area of research. The best we currently have are vector databases and/or sparse hierarchical information storage (you retrieve a summary of a summary via vector search, find the associated summaries via vector search once more, then pluck out the actual data item and add it to the prompt).
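As a rough sketch of that hierarchy (the node layout and the `embed` stub are invented for illustration, assuming unit-normalized vectors):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model."""
    raise NotImplementedError

def nearest(q: np.ndarray, nodes: list[dict]) -> dict:
    # pick the node whose (unit-normalized) vector is closest to the query
    return nodes[int(np.argmax([n["vec"] @ q for n in nodes]))]

def recall_hierarchical(question: str, roots: list[dict]) -> str:
    q = embed(question)
    top = nearest(q, roots)            # summaries of summaries
    mid = nearest(q, top["children"])  # associated summaries under the hit
    return nearest(q, mid["children"])["text"]  # the actual data item
```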