
My general method for things like this is to:

1. Get a .dot file of the database. Many tools will export this.
2. Open the .dot in a tool I built for the purpose.
3. Select the tables I'm interested in, and export a subset of the .dot file representing just those tables and relationships.
4. Hand that subset .dot file to the LLM and say, "given this schema, write a query -- here's what I want: <rest of the request here>"

That gets the job done 60% of the time. Sometimes, when an incorrect shortcut relationship results in the wrong join, I'll have to redirect with something like, "You need to go through <table list> to relate <table X> to <table Y>." That gets my success rate above 95%. I'm not doing ridiculous queries, but I am doing recursive aggregations successfully.
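In case it's useful, here's a minimal sketch of the subset-extraction step (step 3) in Python, assuming pydot and that node names map directly to table names -- some exporters emit record-style "table:column" ports, which this strips:

    import pydot

    def subset_schema(dot_path, tables):
        # Load the exported schema graph (pydot returns a list of graphs).
        graph = pydot.graph_from_dot_file(dot_path)[0]
        keep = {t.lower() for t in tables}

        def name(raw):
            # Drop any record-style port ("orders":"id" -> "orders") and quotes.
            return raw.split(":")[0].strip('"').lower()

        sub = pydot.Dot(graph_type="digraph")
        for node in graph.get_nodes():
            if name(node.get_name()) in keep:
                sub.add_node(node)
        for edge in graph.get_edges():
            if name(edge.get_source()) in keep and name(edge.get_destination()) in keep:
                sub.add_edge(edge)
        return sub.to_string()

    schema = subset_schema("schema.dot", ["orders", "order_items", "customers"])
    prompt = f"Given this schema, write a query -- here's what I want: ...\n\n{schema}"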




You are doing a lot of the work a semantic layer would do for you. I wonder if you would have better luck having the LLM talk to a semantic layer instead of directly to the database.


Can you talk more about what an implementation of such a semantic layer would look like?


There are a number of semantic layer tools out there these days. Each has its own approach, but essentially it's a meta layer on top of your database that can be used to do things like form queries or provide a consolidated API to your data (which may live in multiple databases).

Some comments on this thread mention popular semantic layer tools like cube.dev. I also made an open source one that I use regularly, though it's currently in I-hope-to-put-more-time-into-this-someday mode. Been busy with an acquisition this year.

https://github.com/totalhack/zillion
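To make the difference concrete: instead of handing the LLM raw DDL and asking for SQL, it emits a request against named metrics and dimensions, and the layer works out the joins. With zillion the shape is roughly this (from memory of the README -- treat the config and field names as placeholders):

    from zillion import Warehouse

    # The warehouse config declares datasources, metrics, and dimensions,
    # so the layer knows how tables relate.
    wh = Warehouse(config="my_warehouse_config.json")

    # The LLM only has to pick metrics/dimensions/criteria -- no join
    # paths to get wrong.
    result = wh.execute(
        metrics=["revenue", "leads"],
        dimensions=["partner_name"],
        criteria=[("campaign_name", "like", "Campaign%")],
    )
    print(result.df)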


Not OP, but a naive guess is that it would mean having your schema defined in an ORM (for example, Prisma). The advantage here is that the LLM gets context on both the schema and how the schema is used throughout the application.
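Prisma's schema is its own DSL, but the same idea sketched in Python with SQLAlchemy would be generating DDL straight from the ORM models and using that as the LLM's context (models here are made up):

    from sqlalchemy import Column, ForeignKey, Integer, String
    from sqlalchemy.orm import declarative_base
    from sqlalchemy.schema import CreateTable

    Base = declarative_base()

    class Customer(Base):
        __tablename__ = "customers"
        id = Column(Integer, primary_key=True)
        email = Column(String, nullable=False)

    class Order(Base):
        __tablename__ = "orders"
        id = Column(Integer, primary_key=True)
        customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
        status = Column(String)

    # CREATE TABLE statements, in dependency order, for the LLM's context.
    ddl = "\n".join(str(CreateTable(t)) for t in Base.metadata.sorted_tables)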


I use Weaviate and let the model create GraphQL queries to take advantage of both the semantic and data layers. Not sure how efficient it is, but it's worked for me.
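For reference, with the v3 Python client that pattern looks something like this (class and property names are made up):

    import weaviate

    client = weaviate.Client("http://localhost:8080")

    # The model picks the concepts and fields; Weaviate's GraphQL layer
    # handles both the vector search and the structured data.
    result = (
        client.query
        .get("Article", ["title", "publishedAt"])
        .with_near_text({"concepts": ["database schema design"]})
        .with_limit(5)
        .do()
    )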


Combining this with structured/constrained generation via grammars or Pydantic models can supercharge this, btw.
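A minimal sketch of the Pydantic half, assuming you ask the model for JSON matching this schema (via a grammar, JSON mode, or function calling):

    from pydantic import BaseModel, Field

    class SqlAnswer(BaseModel):
        sql: str = Field(description="A single read-only SELECT statement")
        tables_used: list[str]

    # Hand the JSON Schema to whatever constrains generation...
    json_schema = SqlAnswer.model_json_schema()

    # ...then validate the raw reply before running anything.
    raw = '{"sql": "SELECT 1", "tables_used": []}'  # stand-in for the LLM's output
    answer = SqlAnswer.model_validate_json(raw)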


Can you illustrate this a little? Or should I be asking an LLM for advice? :-)


Something like this is what I do also. Is there another way? How are others doing it?


Export the DDL of one or more relevant tables, then run a query to get a random sample of 5 records from each table. It's really quick to gather all this, and it's enough context to handle most query-writing tasks with some guidance.
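A SQLite-flavored sketch of that gathering step (file and table names are placeholders):

    import sqlite3

    conn = sqlite3.connect("app.db")
    pieces = []
    for table in ("orders", "customers"):  # whatever is relevant to the question
        # The DDL as stored by SQLite.
        ddl = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type='table' AND name=?",
            (table,),
        ).fetchone()[0]
        # A small random sample; fine at LIMIT 5 even though RANDOM() scans.
        sample = conn.execute(
            f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT 5"
        ).fetchall()
        pieces.append(f"{ddl};\n-- sample rows: {sample}")
    context = "\n\n".join(pieces)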


Schema + samples. Same thing a skilled person would use.



