Sumble has been a critical tool for me when researching organizational structure and responsibilities in a large company, as well as technology adoption, such as which organizations have adopted LLMs.
A lot of infrastructure work is needed to make the SQL experience work seamlessly for unstructured data. For the most part, we fork an open-core data warehouse and build on top of it.
There is a big UI part here, because for multimodal data analytics, we think it's crucial for people to see and hear data.
For RAG search, many DBs have built-in vector search, but chunking, indexing, and maintaining the index are largely left to you. This may not be a problem for technical people, but it's a hassle for data people who own hundreds of data products within a company. That's why we have a semantic search index builder that lets you build an auto-refreshing semantic search index with no code, without ever having to produce your own vectors.
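To make the "chunking, indexing, and maintaining the index" part concrete, here is a rough sketch of the work a hand-rolled setup has to do. It is not how our index builder is implemented. `chunk_text`, `embed`, and `SemanticIndex` are made-up names, and the hashed bag-of-words `embed` is only a toy stand-in for a real embedding model.

```python
import hashlib
import math
from collections import Counter

DIM = 256  # toy embedding size (assumption, not a real model's dimension)


def chunk_text(text, size=8, overlap=2):
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class SemanticIndex:
    """Hypothetical in-memory index that re-embeds only documents that changed."""

    def __init__(self):
        self.doc_hashes = {}  # doc_id -> sha256 of the text last indexed
        self.entries = {}     # doc_id -> list of (chunk, vector)

    def refresh(self, docs):
        """Sync with `docs` ({doc_id: text}); return the ids that were (re)embedded."""
        changed = []
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode()).hexdigest()
            if self.doc_hashes.get(doc_id) != digest:
                self.entries[doc_id] = [(c, embed(c)) for c in chunk_text(text)]
                self.doc_hashes[doc_id] = digest
                changed.append(doc_id)
        # Drop documents that no longer exist in the source.
        for doc_id in set(self.doc_hashes) - set(docs):
            del self.doc_hashes[doc_id]
            del self.entries[doc_id]
        return changed

    def search(self, query, k=3):
        """Return the top-k (score, doc_id, chunk) tuples by cosine similarity."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, chunk)
            for doc_id, chunks in self.entries.items()
            for chunk, vec in chunks
        ]
        return sorted(scored, reverse=True)[:k]
```

Running `refresh` on a schedule is the "maintaining" part: unchanged documents are skipped, edited ones are re-chunked and re-embedded, and deleted ones are dropped. Multiply that by hundreds of data products and it adds up quickly.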
In addition, data analysis often needs to interrogate the search results further. For example, let's say we have used pgvector to find all the photos related to the Golden Gate Bridge. But then we want to ask follow-up questions, such as which of these images show someone wearing a blue shirt. For that we have to apply another model, and that is outside a normal DB's responsibility.
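As a rough illustration of that two-stage pattern (a sketch, not our engine): stage 1 is an ordinary pgvector nearest-neighbour query, and stage 2 runs a second model over the hits in application code. The table and column names, `refine`, `has_blue_shirt`, and the canned labels are all hypothetical. The only real pgvector piece is the `<=>` cosine-distance operator, and the SQL is shown but not executed.

```python
from typing import Callable, Iterable, Iterator

# Stage 1 (not executed here): a typical pgvector nearest-neighbour query.
# Table and column names are hypothetical; <=> is pgvector's cosine-distance operator.
STAGE1_SQL = """
SELECT id, path
FROM photos
ORDER BY embedding <=> %(query_vec)s
LIMIT 50
"""


def refine(rows: Iterable[dict], model: Callable[[str], bool]) -> Iterator[dict]:
    """Stage 2: run a second model over each stage-1 hit and keep the matches."""
    for row in rows:
        if model(row["path"]):
            yield row


# Canned answers standing in for a vision model that would answer
# "is someone wearing a blue shirt in this image?"
FAKE_LABELS = {"ggb_001.jpg": True, "ggb_002.jpg": False, "ggb_003.jpg": True}


def has_blue_shirt(path: str) -> bool:
    """Hypothetical stand-in for a call to a vision model."""
    return FAKE_LABELS.get(path, False)


# Pretend these rows came back from STAGE1_SQL.
stage1_hits = [{"id": i, "path": p} for i, p in enumerate(FAKE_LABELS, start=1)]
matches = list(refine(stage1_hits, has_blue_shirt))
```

The second stage lives entirely outside the database, so the analyst ends up stitching SQL and model calls together by hand.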
I guess to add to Jason's point, it depends on how the data engineer and data analyst roles are perceived within the company. At some companies, we see a data analyst taking end-to-end responsibility from data engineering through BI. At others, we see a clear separation: data engineers do the data pipelining and data modeling, while data analysts are, in fact, business analysts.
Regardless, we think that SQL is the common interface for both parties, and we're excited to see who will be the power users.
This does not work with Redshift. This is a query engine for unstructured data like documents, images, and videos. That kind of data does not quite fit into a data warehouse like Redshift or BigQuery.