While I'm 100% on board with RAG using associative memory, I'm not sure you need Neo4j. Associative recall is generally going to be one level deep, and you're doing a top-K cut, so even if it weren't, the second-order associations are probably not going to make the relevance cut. This could be done relationally, and then if you're using pgvector you could retrieve all your RAG contents in one query.
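To make the "one level deep plus top-K" point concrete, here's a minimal sketch in pure Python (standing in for what a relational/pgvector query would do in one round trip). The corpus, embeddings, and association edges are all invented for illustration:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy corpus: id -> (embedding, one-hop associated ids). All values made up.
docs = {
    1: ([0.9, 0.1], [2]),   # doc 1 is associated with doc 2
    2: ([0.2, 0.8], [3]),   # doc 2 -> doc 3 is second-order from doc 1
    3: ([0.1, 0.9], []),
    4: ([0.8, 0.2], []),
}

def retrieve(query_vec, k=2):
    # Rank by similarity, keep top K, then join in one level of associations.
    ranked = sorted(docs, key=lambda i: cosine(query_vec, docs[i][0]), reverse=True)
    hits = ranked[:k]
    assoc = {a for i in hits for a in docs[i][1]}
    return hits, sorted(assoc - set(hits))

hits, assoc = retrieve([1.0, 0.0], k=2)
```

With this query, docs 1 and 4 make the top-K cut and the one-hop join pulls in doc 2; doc 3 (second-order) never enters the result, which is the argument for not needing deep graph traversal.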
I think there are a lot of cases where you don't want to just RAG it. If you're going for tool-assisted, it's pretty neat to have the agent write out queries for what it needs against the knowledge graph. There was an article recently about how LLMs are bad at inferring "B is A" from "A is B". You can also do more precise math against it, which is useful for questions even people need to reason out.
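A sketch of why a queryable graph sidesteps the "A is B" problem: a triple stored once can be queried from either direction, because the reverse question is just a different variable binding. The triple store and data here are hypothetical, not the article's setup:

```python
# Minimal triple store an agent could query as a tool. Data is made up.
triples = {("Mary", "parent_of", "Tom")}

def query(s=None, p=None, o=None):
    # None acts as a wildcard, like a variable in a graph query.
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]

# "A is B" is stored once, but both directions are answerable:
forward = query(s="Mary", p="parent_of")   # whom is Mary a parent of?
reverse = query(p="parent_of", o="Tom")    # who is Tom's parent?
```

An LLM that only memorized "Mary is Tom's parent" in its weights may fail the reverse question; the structured lookup can't.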
I need to dig into their approach here more, but I think using an LLM for both producing and consuming a knowledge graph is pretty nifty, which I wrote up about a year ago here: https://friend.computer/jekyll/update/2023/04/30/wikidata-ll...
I will say that figuring out how to properly add that conversation into a large knowledge graph is a bit tricky. ML does seem slightly better at producing an ontology than humans, though (look how many times we've had to revise scientific names for creatures, or book ordering).
Yes, but this doesn't seem to be an actual knowledge graph, which is part of the issue IMHO. If you look at the Microsoft knowledge graph paper linked in the repo, it looks like they build out a real entity-relationship-based knowledge graph rather than storing responses and surface-form text directly.
I think it's relatively unlikely that having an agent write graph queries will outperform vector search against graph information serialized into text and then transformed into vectors.
The related issue that I think is being conflated in this thread is that even if your goal were to directly support graph queries, you could accomplish this with a vanilla database much more easily than by running a specialized graph DB.
Outperform in what way? There are some distinct things it already does better on, like multi-hop and aggregate reasoning, compared to a similarity context-window dump. In general, tool-assisted approaches, of which KG querying is one tool, do pretty well on the benchmarks, and many of the LLM chat products are cutting over to them as the default.
> if your goal was to directly support graph queries, you could accomplish this with a vanilla database much easier than running a specialized graph db
Postgres and MySQL do have pretty reasonable graph query extensions/features. If by easier you mean the effort to get an MVP up, I'd agree, but I'm a bit more dubious on the scale-up, where you'd probably end up with something like Facebook and TAO.
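For a concrete example of "graph queries in a vanilla database": a multi-hop reachability query with a recursive CTE. SQLite is used here so the snippet is self-contained, but Postgres and MySQL 8+ accept the same `WITH RECURSIVE` syntax; the edge data is invented:

```python
import sqlite3

# Multi-hop traversal in a plain relational DB, no graph engine required.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
con.executemany("INSERT INTO edges VALUES (?, ?)",
                [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])

rows = con.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'                       -- start node
        UNION                            -- UNION (not UNION ALL) dedupes,
        SELECT e.dst                     -- which also guards against cycles
        FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
""").fetchall()
reachable = [r[0] for r in rows]
```

Everything transitively reachable from `a` comes back in one query; `x`/`y` stay out. Whether this stays pleasant at Facebook-scale fan-out is the open question above.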
So your solution would be to fine-tune the LLM with new knowledge? How do you make sure it preserves all facts and connections/relations, and how can you verify at runtime that it actually did, and didn't introduce false memories/connections in the process?
I think RAG has a lot to say here. New content / facts go through the embedding process and are then available for query.
I don't generally disagree that a more discrete (not continuous) knowledge base will be another component to augment AI systems. The harder part is how you build this (curate, clean, ETL, query). I'm not sure a graph DB is the best first choice: relational DBs can take you pretty far, and it's unclear how many N+1 or multi-hop queries you'll need in a robust AI/agent system.
I think you are misunderstanding. An embedding places a piece of knowledge in N-dimensional space. By using vector-distance search you are already getting conceptually similar results.
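A minimal sketch of both points in this exchange: new facts become queryable the moment they're embedded (no fine-tuning), and nearest-neighbor search already surfaces conceptually similar items. The 2-d vectors are hand-placed stand-ins; a real system would get them from an embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

store = []  # (text, vector) pairs; an in-memory stand-in for a vector DB

def add(text, vec):
    store.append((text, vec))

def query(vec, k=2):
    ranked = sorted(store, key=lambda tv: cosine(vec, tv[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

add("cats are mammals", [0.9, 0.1])
add("stock prices fell", [0.1, 0.9])
add("dogs are mammals", [0.85, 0.2])  # new fact: embedded, instantly queryable

nearest = query([0.95, 0.1])
```

The two mammal facts cluster together and the finance fact stays distant, which is the "conceptually similar results for free" claim; no retraining was needed to make the dog fact retrievable.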
Semantic distance doesn't define relationships between semantically dissimilar entities in the way a structured knowledge graph lets you add a newly learned relationship. Similarly, you can't necessarily do entity resolution with purely embeddings, since you're again just comparing similarity based on the embedding model you're using rather than the domain or task you're accomplishing, which could differ a lot depending on how generalized the embedding model is versus what you're doing.
AFAICT, most of the "graph" RAG implementations being discussed, instead of fancy graph queries or a structured knowledge graph, mean:
1. Primary: an inverted index on keywords (= entities). At ingest time, extract entities and reverse-index on them. At query time, extract entities, find the related documents, and include them next to the vector results as part of the reranking set, or maybe do something fancier like a second search based on those.
2. Secondary: bidirectionally linked summaries. At index time, recursively summarize large documents and embed+link the various nested results. At retrieval time, retrieve whatever directly matches, and maybe go up the hierarchy for more.
3. Secondary: throw everything into the DB (queries, answers, text, chunks) and link it all together. As with the others, the retrieval strategy for getting good results generally doesn't leverage this heterogeneous structure and instead ends up being pretty simple and direct, e.g., any KV store would do.
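Pattern 1 above can be sketched in a few lines. Entity extraction is faked here with a fixed keyword set (real systems use NER or an LLM), and the documents are invented:

```python
from collections import defaultdict

# A crude stand-in for entity extraction: match against a known entity list.
ENTITIES = {"neo4j", "postgres", "pgvector"}

docs = {
    1: "We migrated from Neo4j to Postgres",
    2: "pgvector adds vector search to Postgres",
    3: "A recipe for sourdough bread",
}

# Ingest: inverted index from entity -> doc ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for tok in text.lower().split():
        if tok in ENTITIES:
            index[tok].add(doc_id)

def candidates(query, vector_hits):
    # Query time: entity lookups merged with vector hits into one rerank set.
    entity_hits = set()
    for tok in query.lower().split():
        entity_hits |= index.get(tok, set())
    return sorted(entity_hits | set(vector_hits))

result = candidates("why use pgvector", vector_hits=[3])
```

The entity index contributes doc 2 even if the vector search missed it, which is the whole trick: no graph traversal involved, just a reverse index feeding the reranker.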
AFAICT, KV stores are really what's being used here to augment the vector search. Scalable text keyword reverse indexing has historically been done on a KV document store like OpenSearch/Elasticsearch, as it doesn't really stress most of the power of a graph engine. Recursive summaries work fine that way too.
Multi-hop queries and large-graph reasoning are cool but aren't really what these are about. Typed knowledge graphs and even fancier reasoning engines (RDF, ...) even less so.
These retrieval tasks are so simple that almost any DB can work on them in theory: SQL, KV, graph, log, etc. However, as the size grows, their cost/maintenance/perf differences show. We do a lot of graph DB + AI work for our day job, so I'm more bullish on graph long-term, but I agree with the others that it's good to be intellectually honest to make real progress on these.
I will say that sometimes you want a very specific definition of a thing or process to get consistent output. Being able to associatively slurp those up as needed is handy.
> Llamaindex was used to add nodes into the graph store based on documents.
So it sounds like they are generating it from LLM output rather than it being user-defined. I also wonder how often you need more than the single hop that graph DBs aim to speed up. In an agent system with self-checking and reranking, you're going to be performing multiple queries anyhow.
There is also interesting research around embedding graphs that overlaps with these ideas.
I like your answer, and it's a great example of the limitations of knowledge/semantic graphs. Personally, I'd put a knowledge graph on top of the responses to expose it to the LLM as an authority and frame of reference. I think it should be an effective form of protection against hallucination, preventing outright incorrect or harmful outputs that contradict the facts known by the graph. At least in my experiments.
Topological relationships vs metric relationships. I suppose a great embedding could handle both, but a graph database might help in the tail, where the quality of the embeddings is weaker?
I agree that something more like an SQL query, where you have definitive inclusion, will be useful. The harder question is how you build something like that, and how much AI involvement there is in creating the more discrete relational knowledge base.
I don't know if you need a graph DB in particular, but there are likely explicit relationships, or entities to resolve to each other, that you'd want to add that a general model doesn't know about your use case. For example, if you are personalizing an assistant, maybe you need to represent that "John" in the contacts app is the same as "Jdubs" on Instagram and is this person's husband.
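That kind of user-specific fact is naturally an explicit edge rather than anything an embedding model could infer. A hypothetical sketch of the John/Jdubs example as a tiny personal graph:

```python
# Hypothetical personal knowledge graph: explicit same_as and relationship
# edges that no general embedding model would know about this user.
edges = [
    ("contacts:John", "same_as", "instagram:Jdubs"),
    ("contacts:John", "husband_of", "user"),
]

def aliases(node):
    # Resolve same_as links in one pass (enough for this flat example;
    # a real resolver would compute the transitive closure).
    out = {node}
    for s, p, o in edges:
        if p == "same_as":
            if s in out:
                out.add(o)
            if o in out:
                out.add(s)
    return out

# A mention of "Jdubs" resolves to the same person as the contacts entry:
same = aliases("instagram:Jdubs")
```

Once the alias set is resolved, any fact attached to either identifier (like `husband_of`) applies to both, which is exactly the entity-resolution step embeddings alone can't guarantee.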
LLMs have a limited context size, i.e. the chatbot can only recall so much of the conversation. This project builds a knowledge graph of the entire conversation (or conversations), then uses that knowledge graph as a RAG database.
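A rough sketch of that pipeline, with the extraction step hard-coded (the project itself uses an LLM via Llamaindex to produce the graph nodes) and a deliberately naive keyword match standing in for real retrieval:

```python
# Triples distilled from conversation turns; in the real system an LLM
# extracts these, so old turns survive after they fall out of context.
graph = [
    ("user", "lives_in", "Lisbon"),      # from an early turn
    ("user", "allergic_to", "peanuts"),  # from a turn long out of context
]

def recall(question):
    # Naive retrieval: return triples whose parts appear in the question.
    # A real system would embed the triples and do similarity search.
    words = set(question.lower().replace("?", "").split())
    return [t for t in graph if any(part.lower() in words for part in t)]

context = recall("What is the user allergic to?")
```

The matching triples get stuffed back into the prompt, so the model can answer about facts stated far beyond its context window.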