I like open source software. I like what you are doing and that you're keeping development open.
I have been closely watching AI development. There are 10k+ apps using AI now, and every major company, from FAANG through Tier-2, 3, 4, and 5, has AI as a top priority. However, something real has to come out of all this wrapper software. I have not read the docs entirely yet, but I have a few questions that might give us an idea of whether this fits our use case.
1. Which models are you using for this? Can I switch models to open source?
2. When you say you connect to apps, how often are you pulling data from them? For example, you connect to Confluence, where tens of wikis get updated. How much of that ends up in your vector DB?
3. Most importantly, what separates you from the tens of other providers out there? Glean, as someone commented, is very similar to what you are doing.
4. How do you plan to convince SMBs and mid-size companies to use you over, say, in-house development?
5. OpenAI, Mistral, Claude, and other LLM developers can build this functionality natively into their offerings. Are you concerned about becoming obsolete or losing competitive ground? If not, why not?
Either way, this is a good direction. I will try it out tonight. Feel free to respond when you get a chance.
Hello, thanks for the kind words! With regards to your questions:
1. Are you referring to the local NLP models or the LLM? The local models are already open source models or ones we've trained ourselves. If you're talking about the LLM, the default is OpenAI but it's easy to configure other ones without any code changes.
2. Most sources are polled every 10 minutes. Updates are incremental, so if your Confluence has a million pages, probably only a dozen or so have changed in the last 10 minutes, and only those get re-indexed (the first sketch after this list shows the general pattern). The only exception is websites, which are crawled recursively, so we don't know which pages changed until we try; those are re-crawled once a day.
3. Glean is indeed similar. Without going into the features in detail, we are an open source Glean with more of an emphasis on LLMs and Chat.
4. There's generally not a great reason to build from scratch if an open source alternative with 75%+ alignment exists, and they can always build on top of us if they want. A lot of teams reach out to us because they're looking to switch from their in-house solution to Danswer. Generally though, these are larger teams; we haven't seen many SMBs building RAG for their own internal use, and the smaller teams that are building RAG are usually looking to productize it.
5. Currently there is no cheap and fast way to fine-tune LLMs every time a document is updated. If you want an LLM to remember a document that was just updated, you'd have to augment the training data with at least dozens of similar (but all correct) examples, so RAG is still the only viable option. Then there is the problem of security, since you can't enforce user roles at the LLM level; permissions have to be applied at retrieval time (see the second sketch below). So companies that focus on building LLMs don't really compete in this specific space, and they don't want to either, as they're trying to build AGI. There is more of a threat from teams like Microsoft and Google, who are indeed trying to build knowledge assistants for their product lines, but we think there is a world where open source ends up winning against the giants!
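To make the incremental-update behavior in answer 2 concrete, here is a rough, hypothetical sketch of the pattern: poll on an interval, ask the source only for documents changed since the last cursor, and re-index just that delta. The function names (`fetch_pages_updated_since`, `index_into_vector_db`) are illustrative stand-ins, not Danswer's actual connector API.

```python
# Hypothetical sketch of the incremental polling described in answer 2.
import time
from datetime import datetime, timezone

POLL_INTERVAL_SECONDS = 600  # most sources are polled roughly every 10 minutes


def fetch_pages_updated_since(cursor: datetime) -> list[dict]:
    """Ask the source (e.g. Confluence) only for pages modified after `cursor`."""
    raise NotImplementedError("replace with a real connector call")


def index_into_vector_db(pages: list[dict]) -> None:
    """Chunk, embed, and upsert only the changed pages; everything else is untouched."""
    raise NotImplementedError("replace with a real indexing call")


def poll_forever() -> None:
    cursor = datetime.now(timezone.utc)
    while True:
        time.sleep(POLL_INTERVAL_SECONDS)
        poll_started = datetime.now(timezone.utc)
        updated = fetch_pages_updated_since(cursor)  # usually only a handful of pages
        if updated:
            index_into_vector_db(updated)            # only the delta is re-embedded
        cursor = poll_started                        # next poll picks up from here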
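And for the retrieval-time permission point in answer 5, a similarly hypothetical sketch: ACLs are attached to chunks at index time and checked against the requesting user before anything reaches the LLM prompt. `Chunk` and `vector_search` are stand-ins, not Danswer's real types.

```python
# Hypothetical sketch of enforcing user roles at retrieval time (answer 5).
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    allowed_groups: set[str]  # ACLs copied from the source system at index time


def vector_search(query: str, top_k: int = 50) -> list[Chunk]:
    raise NotImplementedError("replace with a real vector DB query")


def retrieve_for_user(query: str, user_groups: set[str], top_k: int = 10) -> list[Chunk]:
    """Drop any chunk the requesting user isn't allowed to see before it ever
    reaches the LLM prompt; the LLM itself never enforces permissions."""
    candidates = vector_search(query, top_k=50)
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return visible[:top_k]
```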
How is it "chat over private data" if you are exposing my data to more parties like OpenAI? I thought you were using a stack of self-hosted, open-weight LLMs. If the data can be sent elsewhere, it is not private data.
So "private" refers to two things here; sorry for any confusion.
When we say "chat over private data" we mean that this data isn't publicly available and no LLM has this knowledge from its training. With our system you can now ask questions about team-specific knowledge, for example, "What features did customer X ask about in our last call?" Obviously, if you ask ChatGPT this, it will have no idea.
The other part is data privacy when using the system. The software can be plugged into most LLM providers or locally running LLMs. So if your team doesn't trust OpenAI but has a relationship with, say, Azure or GCP, you can just plug into one of those instead. Alternatively, a lot of users have recently been setting up Danswer with locally running LLMs via tools like Ollama. In that case you have a truly airgapped system where no data ever leaves your environment (a rough sketch of that setup is below).
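For the fully local case, here is a minimal sketch (not Danswer's internals) of what "plugging into a locally running LLM" looks like: a standard OpenAI-style client pointed at the OpenAI-compatible endpoint Ollama serves on localhost. The `llama3` model name is an assumption; use whatever model you've pulled.

```python
# Minimal sketch of an airgapped setup: the OpenAI-compatible endpoint that
# Ollama serves locally replaces a hosted provider, so no data leaves the machine.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="not-needed-for-local",        # the client requires a value; Ollama ignores it
)

response = client.chat.completions.create(
    model="llama3",  # assumes `ollama pull llama3` has been run locally
    messages=[{"role": "user", "content": "What features did customer X ask about in our last call?"}],
)
print(response.choices[0].message.content)
```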