I’ve been excited about LanceDB and its ability to support vector indexes and efficient row-level lookups. I wonder if this approach would work for their design goals and still allow broader backwards compatibility with the Parquet ecosystem. I’ve also been intrigued by DuckLake, which has leaned into Parquet. Perhaps this approach will allow more flexible indexing while keeping support for the broader Parquet ecosystem, which is significant.
Marimo is really special and solves most of the problems that you have with Jupyter. For the Marimo-curious, I strongly recommend checking out their YouTube channel. So much effort has gone into making these videos really great. https://youtube.com/@marimo-team?si=ZGaf8Zgq5WN3LKRg
For another library that has great performance and features like full-text indexing and the ability to version changes, I’d recommend LanceDB: https://lancedb.github.io/lancedb/
Yes, it’s a vector database and has more complexity. But you can use it without creating indexes, and it also has excellent zero-copy Arrow support for Polars and pandas.
I’ve been using https://github.com/jasonwhite/rudolfs, which is written in Rust. It’s high performance but doesn’t have all the features (auth) that you might need.