No, you don't need a vector database. You can get OK results by prompting "give me ten search terms that are relevant to this question", then running those searches against a regular full-text search engine and pasting those results back into the LLM as context along with the original question.
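For what it's worth, that flow is only a few lines. A rough Python sketch, assuming the OpenAI client for the "give me ten search terms" prompt and a hypothetical search_fulltext() helper backed by whatever full-text engine you already have:

    # Ask the LLM for search terms, run them through a plain full-text search
    # engine, and collect the hits to paste back in as context.
    # search_fulltext() is a hypothetical helper; the model name is an example.
    from openai import OpenAI

    client = OpenAI()

    def gather_context(question: str, per_term: int = 3) -> str:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Give me ten search terms that are relevant to this "
                           f"question, one per line:\n\n{question}",
            }],
        )
        terms = [t.strip("-* ") for t in resp.choices[0].message.content.splitlines() if t.strip()]

        passages = []
        for term in terms:
            passages.extend(search_fulltext(term, limit=per_term))

        # Paste this back into the LLM along with the original question.
        return "\n\n".join(dict.fromkeys(passages))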
You're likely to get better results from vector-based semantic search though, just because it takes you beyond needing exact matches on search terms.
Vector is better for some use cases (open-domain, more conversational data) and term-based search is better for others (closed-domain, more keyword-based).
I've found that internal enterprise projects tend to be very keyword-based, and vector search often produces weird, head-scratcher results that users hate - whereas term-based search does a better job of capturing the right terms, if you do the proper synonym/abbreviation expansions.
That said, I use them both, usually with vector search as a fallback after the initial keyword-based RAG pass.
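That keyword-first-with-vector-fallback shape is easy to sketch in Python. The SYNONYMS table and the keyword_search()/vector_search() helpers below are placeholders for whatever term expansion and indexes you actually run:

    # Keyword-first retrieval with a vector-search fallback (sketch).
    # SYNONYMS, keyword_search() and vector_search() are hypothetical stand-ins.
    SYNONYMS = {"po": ["purchase order"], "sow": ["statement of work"]}

    def expand_query(query: str) -> list[str]:
        words = query.lower().split()
        expanded = [query]
        for abbrev, fulls in SYNONYMS.items():
            if abbrev in words:
                for full in fulls:
                    expanded.append(" ".join(full if w == abbrev else w for w in words))
        return expanded

    def retrieve(query: str, k: int = 5) -> list[str]:
        # First pass: term-based search over the expanded queries.
        hits = []
        for q in expand_query(query):
            hits.extend(keyword_search(q, limit=k))
        if hits:
            return hits[:k]
        # Fallback: semantic search when keywords come up empty.
        return vector_search(query, limit=k)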
The context length is limited: for GPT-3.5 it's 4k tokens, and other offerings go up to 100k (Claude). 100k tokens is roughly one book, but each call is priced steeply. It's often wiser and cheaper to Retrieve the relevant context from your text and Augment your query to the LLM so it can Generate a more contextual answer. That's the reason for the name Retrieval Augmented Generation (RAG).
For the Retrieval step you'd need a vector database (the similarity comparison is done with semantic search over vector embeddings).
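The whole flow is short enough to sketch. Roughly, in Python with the OpenAI client, where retrieve_passages() stands in for whatever retrieval you use and the model name is just an example:

    # Sketch of the Retrieve -> Augment -> Generate loop the name comes from.
    # retrieve_passages() is a hypothetical helper for your retrieval step.
    from openai import OpenAI

    client = OpenAI()

    def rag_answer(question: str) -> str:
        passages = retrieve_passages(question, k=5)        # Retrieve
        prompt = (                                         # Augment
            "Answer the question using only the context below.\n\n"
            + "\n\n".join(passages)
            + f"\n\nQuestion: {question}"
        )
        resp = client.chat.completions.create(             # Generate
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content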
Minor note: you only need a vector database if you have so many possible inputs that linear retrieval is too slow.
Arguably, for many use cases (e.g. searching through a document with ~200 passages), loading embeddings in memory and running a simple linear search would be fast enough.
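For concreteness, the "load embeddings in memory and scan" version is about ten lines of numpy; embed() here is a placeholder for whatever embedding model you use:

    # Linear (brute-force) cosine-similarity search over a few hundred passages.
    # embed() is a hypothetical embedding function; passages is your chunk list.
    import numpy as np

    passages = ["chunk one ...", "chunk two ...", "chunk three ..."]
    emb = np.array([embed(p) for p in passages], dtype=np.float32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    def top_k(query: str, k: int = 5) -> list[str]:
        q = np.asarray(embed(query), dtype=np.float32)
        q /= np.linalg.norm(q)
        scores = emb @ q                       # one matrix-vector product
        return [passages[i] for i in np.argsort(scores)[::-1][:k]]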
Yeah, what you mentioned might be true. Currently our understanding of how LLMs really work behind the scenes is limited. For example, there was recent research[1] showing an LLM's accuracy is better when the context is added at the beginning of the prompt rather than the end. So it's mostly trial & error to figure out what works best for you. You can use FAISS or similar to keep the embeddings in memory instead of a full-fledged vector DB. But pgvector is a convenient extension if you already have a Postgres instance running.
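If you go the pgvector route, the query side is a single SQL statement. A sketch with psycopg2, where the docs table, embedding column and the rest of the schema are assumptions (<=> is pgvector's cosine-distance operator):

    # Nearest-neighbour lookup via pgvector. Table/column names are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")

    def top_k(query_embedding: list[float], k: int = 5) -> list[str]:
        vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
        with conn.cursor() as cur:
            cur.execute(
                "SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
                (vec, k),
            )
            return [row[0] for row in cur.fetchall()]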
Zilliz just published an article comparing QPS (queries per second) for pgvector vs. Milvus. The results are clear - Milvus, a database designed from the ground up for handling vector indexes, outperformed on both speed and latency. Dive into the details here: https://zilliz.com/blog/getting-started-pgvector-guide-devel...
Full disclosure, I just joined Zilliz this week as a Dev Advocate.
What I mentioned doesn't depend on how LLMs work; the end result is the same (retrieving useful inputs to pass to your LLM).
Just meant that a lot of people can do this in-memory or in ad-hoc ways if they're not too latency-constrained.
I think unless you need a vector DB, definitely don't use one.
A vector store could help reduce the time it takes to retrieve the most similar hits. I've used FAISS as a local vector store quite a bit to retrieve vectors fast, though I had 1.5 million vectors to work through.
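For reference, 1.5 million vectors at 768 dimensions in float32 is roughly 4.6 GB, so it still fits in RAM on one box. A sketch of the setup (dimensionality and nlist here are just example numbers, not my actual config):

    # FAISS as a purely local, in-memory vector store (sketch).
    import faiss
    import numpy as np

    dim = 768
    vectors = np.random.rand(1_500_000, dim).astype("float32")  # stand-in for real embeddings
    faiss.normalize_L2(vectors)

    # Exact search: inner product on normalized vectors = cosine similarity.
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)

    # If exact search gets too slow, an IVF index trades a little recall for speed:
    # quantizer = faiss.IndexFlatIP(dim)
    # index = faiss.IndexIVFFlat(quantizer, dim, 4096, faiss.METRIC_INNER_PRODUCT)
    # index.train(vectors); index.add(vectors)

    query = np.random.rand(1, dim).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 10)   # top-10 nearest neighbours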
Interesting. I thought anything >1 million would need a vector DB to scale in production. What was your machine config for running FAISS? Also, did you plan for redundancy or was it just FAISS running as a service on a single VM?
Others have chimed in as well, but I'll mention that we've been live with our product, for all users, for several months now doing RAG with OpenAI vector embeddings stored in Redis.
We then just fetch the vectors related to a customer's schema into memory (largest is ~200MB) and run cosine similarity in a few ms in Go (handwritten, ~25 lines of code), and then we've got our top N things to place in our prompt.
Primitive? You betcha. Works extremely well for our entire customer base? Yup. You definitely don't need a Vector DB unless you have an enormous amount of vectors. For us it means having to run our own Redis clusters, but we know how to do that, and so we don't need to involve another vendor.
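For the curious, the same idea sketched in Python (the key layout and raw-float32 encoding here are illustrative, not our actual schema):

    # Fetch a customer's embeddings from Redis and rank by cosine similarity.
    # Key names and the raw-float32 encoding are hypothetical.
    import numpy as np
    import redis

    r = redis.Redis()

    def top_n(schema_id: str, query_vec: np.ndarray, n: int = 5) -> list[bytes]:
        keys = list(r.smembers(f"embeddings:{schema_id}"))   # ids of stored vectors
        raw = r.mget(keys)                                    # each value: float32 bytes
        mat = np.stack([np.frombuffer(b, dtype=np.float32) for b in raw])
        mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        q = query_vec / np.linalg.norm(query_vec)
        scores = mat @ q
        return [keys[i] for i in np.argsort(scores)[::-1][:n]]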
RAG is a very useful flow but I agree the complexity is often overwhelming, especially as you move from a toy example to a real production deployment. It's not just choosing a vector DB (last time I checked there were about 50), managing it, deciding how to chunk data, etc. You also need to ensure your retrieval pipeline is accurate and fast, keep data secure and private, and manage the whole thing as it scales. That's one of the main benefits of using Vectara (https://vectara.com; FD: I work there) - it's a GenAI platform that abstracts all this complexity away, so you can focus on building your application.
You need a vector db because all the vector db companies need customers...
You definitely do need information retrieval. It just shouldn't be limited to vector dbs. Unfortunately vector db companies and the VCs that back them have flooded the internet with propaganda suggesting vector db is the only choice.
https://colinharman.substack.com/p/beware-tunnel-vision-in-a...
For most serious use cases, you'll have far too much data to fit into 1 (or several) inference contexts.
Petroni 2020 got pretty far with TF-IDF IIRC, for a related but slightly different task. Still, I've got to believe the semantic search element provided by vector DBs is going to add a lot.
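A TF-IDF retrieval baseline is only a few lines with scikit-learn, if anyone wants to compare it against embeddings on their own data (the corpus here is a placeholder):

    # TF-IDF baseline: rank passages by cosine similarity of TF-IDF vectors.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = ["passage one ...", "passage two ...", "passage three ..."]
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(corpus)

    def tfidf_top_k(query: str, k: int = 3) -> list[str]:
        scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
        return [corpus[i] for i in scores.argsort()[::-1][:k]]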