Using a semantic layer is the cleanest way to have a human in the loop. A human ...

zerop · 2025-03-18T15:40:59 1742312459

Currently exploring cube for a "natural language to SQL" solution.

My schema is - 90+ Tables, 2500+ Columns, well documented

From your experience, does Cube look a fit? My use cases will definitely have JOINS.

mritchie712 · 2025-03-18T18:12:19 1742321539

yes, that shouldn't be a problem.

with that many tables, you might want to use Views: https://cube.dev/docs/reference/data-model/view

zerop · 2025-03-18T18:24:08 1742322248

Thanks. sorry, asking more question - Do we need human in the loop with Cube to define the views for all kinds of queries.

In my use case, it's going to be exposed to various kind of stakeholders and there will be versatility of user queries. I can't pre-create views/aggregations for all scenarios.

efromvt · 2025-03-18T14:09:41 1742306981

JSON generation against a semantic layer and validation loops is definitely the easiest way to get high 9s success for going directly to a correct query from text. For the human in the loop cases, going directly to SQL can be fun - I’ve toyed with a SQL-like semantic layer that removes the need for direct table access and joins, which removes two of the risk points for LLMs going off the rails while still leveraging a lot of the baked in knowledge about SQL syntax (window functions, transformations, etc) that can be hard to exhaustively bake into the semantic layer. (It’s annoying when the semantic layer doesn’t quite have what you need.)

mritchie712 · 2025-03-18T14:31:20 1742308280

duckdb has a `json_serialize_sql` function that we've been messing with. It could be an interesting middle ground. It lets you write SQL as JSON.

tillvz · 2025-03-18T12:55:00 1742302500

Completely agree! A semantic layer is essential for scaling analytics to enterprise complexity.

Another alternative here is Veezoo [0], which combines the semantic layer (a Knowledge Graph) and self-service analytics into one integrated solution.

We built it specifically for the analytics use-case for both the "data persona" to manage the semantic layer, as well as for the "business persona" to analyze the data.

If you’re looking for a semantic-layer + (embedded) BI solution right out of the box. This could be a fit.

0 - https://www.veezoo.com

pbronez · 2025-03-18T15:36:53 1742312213

How well is that working for you?

We use a pattern where we ETL things into tables that model the upstream source closely, then use SQL Views to tighten up the model and integrate across data sources where needed. Keeping this all inside one DB allows us to use tools that understand the schema for autocomplete, etc.

I expect the developer experience would be significantly worse if we started writing views in YAML instead of SQL… but you’ve found the opposite?

mritchie712 · 2025-03-18T18:15:48 1742321748

Do you care about self-serve analytics (i.e. allowing people that don't know SQL to explore your data)?

A semantic layer is the best way to enable self-serve analytics, but if you don't care about it, it's probably not worth the hassle.

We also use the semantic layer for other nice things like setting goals, column descriptions and other metadata.