Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Using a semantic layer is the cleanest way to have a human in the loop. A human can validate and create all important metrics (e.g. what does "monthly active users" really mean) then an LLM can use that metric definition whenever asked for MAU.

With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLM's are much more consistent at writing a small JSON vs. hundreds of lines of SQL.

We[0] use cube[1] for this. It's the best open source semantic layer, but there's a couple closed source options too.

0 - https://www.definite.app/

1 - https://cube.dev/



Currently exploring cube for a "natural language to SQL" solution.

My schema is - 90+ Tables, 2500+ Columns, well documented

From your experience, does Cube look a fit? My use cases will definitely have JOINS.


yes, that shouldn't be a problem.

with that many tables, you might want to use Views: https://cube.dev/docs/reference/data-model/view


Thanks. sorry, asking more question - Do we need human in the loop with Cube to define the views for all kinds of queries.

In my use case, it's going to be exposed to various kind of stakeholders and there will be versatility of user queries. I can't pre-create views/aggregations for all scenarios.


JSON generation against a semantic layer and validation loops is definitely the easiest way to get high 9s success for going directly to a correct query from text. For the human in the loop cases, going directly to SQL can be fun - I’ve toyed with a SQL-like semantic layer that removes the need for direct table access and joins, which removes two of the risk points for LLMs going off the rails while still leveraging a lot of the baked in knowledge about SQL syntax (window functions, transformations, etc) that can be hard to exhaustively bake into the semantic layer. (It’s annoying when the semantic layer doesn’t quite have what you need.)


duckdb has a `json_serialize_sql` function that we've been messing with. It could be an interesting middle ground. It lets you write SQL as JSON.


Completely agree! A semantic layer is essential for scaling analytics to enterprise complexity.

Another alternative here is Veezoo [0], which combines the semantic layer (a Knowledge Graph) and self-service analytics into one integrated solution.

We built it specifically for the analytics use-case for both the "data persona" to manage the semantic layer, as well as for the "business persona" to analyze the data.

If you’re looking for a semantic-layer + (embedded) BI solution right out of the box. This could be a fit.

0 - https://www.veezoo.com


How well is that working for you?

We use a pattern where we ETL things into tables that model the upstream source closely, then use SQL Views to tighten up the model and integrate across data sources where needed. Keeping this all inside one DB allows us to use tools that understand the schema for autocomplete, etc.

I expect the developer experience would be significantly worse if we started writing views in YAML instead of SQL… but you’ve found the opposite?


Do you care about self-serve analytics (i.e. allowing people that don't know SQL to explore your data)?

A semantic layer is the best way to enable self-serve analytics, but if you don't care about it, it's probably not worth the hassle.

We also use the semantic layer for other nice things like setting goals, column descriptions and other metadata.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: