No, you don't need a vector database. You can get OK results by prompting "give me ten search terms that are relevant to this question", then running those searches against a regular full-text search engine and pasting those results back into the LLM as context along with the original question.
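For what it's worth, that flow is only a few lines. A rough Python sketch, assuming the OpenAI client for the "give me ten search terms" prompt and a hypothetical search_fulltext() helper backed by whatever full-text engine you already have:

    # Ask the LLM for search terms, run them through a plain full-text search
    # engine, and collect the hits to paste back in as context.
    # search_fulltext() is a hypothetical helper; the model name is an example.
    from openai import OpenAI

    client = OpenAI()

    def gather_context(question: str, per_term: int = 3) -> str:
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{
                "role": "user",
                "content": "Give me ten search terms that are relevant to this "
                           f"question, one per line:\n\n{question}",
            }],
        )
        terms = [t.strip("-* ") for t in resp.choices[0].message.content.splitlines() if t.strip()]

        passages = []
        for term in terms:
            passages.extend(search_fulltext(term, limit=per_term))

        # Paste this back into the LLM along with the original question.
        return "\n\n".join(dict.fromkeys(passages))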
You're likely to get better results from vector-based semantic search though, just because it takes you beyond needing exact matches on search terms.
Vector is better for some use cases (open-domain, more conversational data) and term-based search is better for others (closed-domain, more keyword-based).
I've found that internal enterprise projects tend to be very keyword-based, and vector search often produces weird, head-scratcher results that users hate - whereas term-based search does a better job of capturing the right terms, if you do the proper synonym/abbreviation expansions.
That said, I use them both, usually with vector search as a fallback after the initial keyword-based RAG pass.
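That keyword-first-with-vector-fallback shape is easy to sketch in Python. The SYNONYMS table and the keyword_search()/vector_search() helpers below are placeholders for whatever term expansion and indexes you actually run:

    # Keyword-first retrieval with a vector-search fallback (sketch).
    # SYNONYMS, keyword_search() and vector_search() are hypothetical stand-ins.
    SYNONYMS = {"po": ["purchase order"], "sow": ["statement of work"]}

    def expand_query(query: str) -> list[str]:
        words = query.lower().split()
        expanded = [query]
        for abbrev, fulls in SYNONYMS.items():
            if abbrev in words:
                for full in fulls:
                    expanded.append(" ".join(full if w == abbrev else w for w in words))
        return expanded

    def retrieve(query: str, k: int = 5) -> list[str]:
        # First pass: term-based search over the expanded queries.
        hits = []
        for q in expand_query(query):
            hits.extend(keyword_search(q, limit=k))
        if hits:
            return hits[:k]
        # Fallback: semantic search when keywords come up empty.
        return vector_search(query, limit=k)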
The context length is limited: for GPT-3.5 it's 4k tokens, and other offerings go up to 100k (Claude). 100k tokens is roughly one book, but each call is priced steeply. It's often wiser and cheaper to Retrieve the relevant context from your text and Augment your query to the LLM so it can Generate a more contextual answer. That's the reason for the name Retrieval Augmented Generation (RAG).
For the Retrieval step you'd need a vector database (the similarity comparison is done with semantic search over vector embeddings).
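The whole flow is short enough to sketch. Roughly, in Python with the OpenAI client, where retrieve_passages() stands in for whatever retrieval you use and the model name is just an example:

    # Sketch of the Retrieve -> Augment -> Generate loop the name comes from.
    # retrieve_passages() is a hypothetical helper for your retrieval step.
    from openai import OpenAI

    client = OpenAI()

    def rag_answer(question: str) -> str:
        passages = retrieve_passages(question, k=5)        # Retrieve
        prompt = (                                         # Augment
            "Answer the question using only the context below.\n\n"
            + "\n\n".join(passages)
            + f"\n\nQuestion: {question}"
        )
        resp = client.chat.completions.create(             # Generate
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content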
Minor note: you only need a vector database if you have so many possible inputs that linear retrieval is too slow.
Arguably, for many use cases (e.g. searching through a document with ~200 passages), loading embeddings in memory and running a simple linear search would be fast enough.
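For concreteness, the "load embeddings in memory and scan" version is about ten lines of numpy; embed() here is a placeholder for whatever embedding model you use:

    # Linear (brute-force) cosine-similarity search over a few hundred passages.
    # embed() is a hypothetical embedding function; passages is your chunk list.
    import numpy as np

    passages = ["chunk one ...", "chunk two ...", "chunk three ..."]
    emb = np.array([embed(p) for p in passages], dtype=np.float32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)

    def top_k(query: str, k: int = 5) -> list[str]:
        q = np.asarray(embed(query), dtype=np.float32)
        q /= np.linalg.norm(q)
        scores = emb @ q                       # one matrix-vector product
        return [passages[i] for i in np.argsort(scores)[::-1][:k]]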
Yeah, what you mentioned might be true. Currently our understanding of how LLMs really work behind the scenes is limited. For example, there was recent research[1] showing an LLM's accuracy is better when the context is added at the beginning of the prompt rather than the end. So it's mostly trial & error to figure out what works best for you. You can use FAISS or similar to keep the embeddings in memory instead of a full-fledged vector DB. But pgvector is a convenient extension if you already have a Postgres instance running.
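If you go the pgvector route, the query side is a single SQL statement. A sketch with psycopg2, where the docs table, embedding column and the rest of the schema are assumptions (<=> is pgvector's cosine-distance operator):

    # Nearest-neighbour lookup via pgvector. Table/column names are made up.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")

    def top_k(query_embedding: list[float], k: int = 5) -> list[str]:
        vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
        with conn.cursor() as cur:
            cur.execute(
                "SELECT content FROM docs ORDER BY embedding <=> %s::vector LIMIT %s",
                (vec, k),
            )
            return [row[0] for row in cur.fetchall()]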
Zilliz just published an article comparing QPS (queries per second) for pgvector vs. Milvus. The results are clear - Milvus, a database designed from the ground up for handling vector indexes, outperformed on both speed and latency. Dive into the details here: https://zilliz.com/blog/getting-started-pgvector-guide-devel...
Full disclosure, I just joined Zilliz this week as a Dev Advocate.
What I mentioned doesn't depend on how LLMs work; the end result is the same (retrieving useful inputs to pass to your LLM).
Just meant that a lot of people can do this in-memory or in ad-hoc ways if they're not too latency-constrained.
I think unless you need a vector DB, definitely don't use one.
A vector store could help reduce the time it takes to retrieve the most similar hits. I've used FAISS as a local vector store quite a bit to retrieve vectors fast, though I had 1.5 million vectors to work through.
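For reference, 1.5 million vectors at 768 dimensions in float32 is roughly 4.6 GB, so it still fits in RAM on one box. A sketch of the setup (dimensionality and nlist here are just example numbers, not my actual config):

    # FAISS as a purely local, in-memory vector store (sketch).
    import faiss
    import numpy as np

    dim = 768
    vectors = np.random.rand(1_500_000, dim).astype("float32")  # stand-in for real embeddings
    faiss.normalize_L2(vectors)

    # Exact search: inner product on normalized vectors = cosine similarity.
    index = faiss.IndexFlatIP(dim)
    index.add(vectors)

    # If exact search gets too slow, an IVF index trades a little recall for speed:
    # quantizer = faiss.IndexFlatIP(dim)
    # index = faiss.IndexIVFFlat(quantizer, dim, 4096, faiss.METRIC_INNER_PRODUCT)
    # index.train(vectors); index.add(vectors)

    query = np.random.rand(1, dim).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 10)   # top-10 nearest neighbours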
Interesting. I thought anything >1 million would need a vector DB to scale in production. What was your machine config for running FAISS? Also, did you plan for redundancy or was it just FAISS running as a service on a single VM?
Others have chimed in as well, but I'll mention that we've been live with our product, for all users, for several months now doing RAG with OpenAI vector embeddings stored in Redis.
We then just fetch the vectors related to a customer's schema into memory (largest is ~200MB) and run cosine similarity in a few ms in Go (handwritten, ~25 lines of code), and then we've got our top N things to place in our prompt.
Primitive? You betcha. Works extremely well for our entire customer base? Yup. You definitely don't need a Vector DB unless you have an enormous amount of vectors. For us it means having to run our own Redis clusters, but we know how to do that, and so we don't need to involve another vendor.
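For the curious, the same idea sketched in Python (the key layout and raw-float32 encoding here are illustrative, not our actual schema):

    # Fetch a customer's embeddings from Redis and rank by cosine similarity.
    # Key names and the raw-float32 encoding are hypothetical.
    import numpy as np
    import redis

    r = redis.Redis()

    def top_n(schema_id: str, query_vec: np.ndarray, n: int = 5) -> list[bytes]:
        keys = list(r.smembers(f"embeddings:{schema_id}"))   # ids of stored vectors
        raw = r.mget(keys)                                    # each value: float32 bytes
        mat = np.stack([np.frombuffer(b, dtype=np.float32) for b in raw])
        mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        q = query_vec / np.linalg.norm(query_vec)
        scores = mat @ q
        return [keys[i] for i in np.argsort(scores)[::-1][:n]]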
RAG is a very useful flow but I agree the complexity is often overwhelming, especially as you move from a toy example to a real production deployment. It's not just choosing a vector DB (last time I checked there were about 50), managing it, deciding how to chunk data, etc. You also need to ensure your retrieval pipeline is accurate and fast, keep data secure and private, and manage the whole thing as it scales. That's one of the main benefits of using Vectara (https://vectara.com; FD: I work there) - it's a GenAI platform that abstracts all this complexity away, so you can focus on building your application.
You need a vector db because all the vector db companies need customers...
You definitely do need information retrieval. It just shouldn't be limited to vector dbs. Unfortunately vector db companies and the VCs that back them have flooded the internet with propaganda suggesting vector db is the only choice.
https://colinharman.substack.com/p/beware-tunnel-vision-in-a...
For most serious use cases, you'll have far too much data to fit into 1 (or several) inference contexts.
Petroni 2020 got pretty far with TF-IDF IIRC, for a related but slightly different task. Still, I've got to believe the semantic search element provided by vector DBs is going to add a lot.
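A TF-IDF retrieval baseline is only a few lines with scikit-learn, if anyone wants to compare it against embeddings on their own data (the corpus here is a placeholder):

    # TF-IDF baseline: rank passages by cosine similarity of TF-IDF vectors.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = ["passage one ...", "passage two ...", "passage three ..."]
    vectorizer = TfidfVectorizer()
    doc_matrix = vectorizer.fit_transform(corpus)

    def tfidf_top_k(query: str, k: int = 3) -> list[str]:
        scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
        return [corpus[i] for i in scores.argsort()[::-1][:k]]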