I don't think migrations (at least as done by Django et al.) solve it - you want...

simonw · on April 23, 2023

If you really, really need to be able to see a SQL schema representing the current state, a cheap trick is to run an automation on every deploy that snapshots the schema and writes it to a GitHub repository.

I do a version of that for my own (Django-powered) blog here: https://github.com/simonw/simonwillisonblog-backup/blob/main...

hamandcheese · on April 23, 2023

This comes up a lot, and I'm always surprised that more people don't know about Flyway and "repeatable migrations": https://flywaydb.org/documentation/tutorials/repeatable

> Repeatable migrations are very useful for managing database objects whose definition can then simply be maintained in a single file in version control. Instead of being run just once, they are (re-)applied every time their checksum changes.

Wicher · on April 23, 2023

One of my pet projects addresses these points you raise about Django + migrations, have a look:

https://pypi.org/project/DBSamizdat

In a nutshell it allows you to keep DB functions, triggers and views around as Python classes, so you can version them together with the rest of your application code. The DB state gets updated for you (in the right dependency order) whenever you change them.

It can also run without Django.

adammarples · on April 24, 2023

This is exactly what I would use dbt for

michaelsalim · on April 24, 2023

Had a thought recently. What's stopping someone from separating table migrations from others like functions/triggers?

The standard tables can use the standard process while triggers, etc are run declaratively. When you change something, a tool could simply tear everything down and rebuild it. No need to worry about data since those are handled separately.

Would performance be a concern? Or is there something I'm missing?

hermanradtke · on April 23, 2023

> you want a declarative source of truth for what the schema looks like today, not a chain of changes that only tell you that after computing the combined effect

Applying a series of migrations to get a final db schema is not much different than a version control system like git.

OJFord · on April 23, 2023

Ok, so how do I checkout a version and view the file/schema tree?

If you like, you can view my comment above as saying 'if they included that tooling, not just the similar tree of changes stored, it would be better'.

How to get from one state to the next is interesting to the computer, not to me (not after I've initially written it/done it in prod anyway), I'm interested in the result, where do we stand after all of the migrations (currently on disk having checked out whatever).

simonw · on April 23, 2023

You checkout that version, then run the migrations from scratch against a fresh database.

In Django that's "./mange.py migrate".

OJFord · on April 23, 2023

And then inspect the db, exactly. In other words you can't, there is no declarative 'current state' checked in or provided by the migration tooling.

saltcured · on April 23, 2023

If you are going to use some kind of meta-schema tool to manage migrations, you should store your declarative schema in whatever syntax this tool understands and allow it to generate the schema-mutating migration commands as runtime artifacts (more like a build output than a versioned source).

If not using such a tool, you might adopt a little meta-schema logic in your own DDL. With PostgreSQL, you can try writing idempotent and self-migrating schemas. Using syntax variants like CREATE OR REPLACE for functions and views, DROP ... IF EXISTS, CREATE ... IF NOT EXISTS, and ALTER ... ADD IF NOT EXISTS allows some of your DDL to simply work on an empty DB or a prior version. Wrapping all this within a pl/pgsql block allows conditional statements to run DDL variants. Conditionals are also useful if you need to include some custom logic for data migration alongside your schema migration, i.e. create a new structure, copy+transform data, drop old structure.

For smaller DBs, you may be able to afford some brute-force techniques to simplify the DDL. Things like DROP IF EXISTS for multiple earlier versions of views, indexes, or constraints to clear the slate and then recreating them with the latest definitions. This may add IO costs to execute it, but makes the DDL easier to read as the same DDL statements are used for the clean slate and the migration. Similarly, a little pl/pgsql logic could loop over tables and apply triggers, policies, etc. that you want to use systematically in a project.

If possible, you can also prune your code so that any specific version in your source control only has logic to handle a few cases, i.e. clean slate builds and a migration of N-1 or N-2 to latest. This minimizes the amount of branched logic you might have to understand when maintaining the DDL. The approach here depends a lot on whether you are versioning a product where you control the deployment lifecycle or versioning a product release which may encounter many different prior versions "in the wild".

In any case, you have a more complicated test requirement if you want to validate that your system works from multiple starting points. I.e. clean-slate builds as well as migrations from specific earlier versions. I think this is true whether you are using some higher level migration management tooling or just rolling your own conditional and idempotent DDL.

simonw · on April 23, 2023

I don't understand. What are you looking for here?

If you want a plain text SQL file to look at you can have that with a bit of extra automation - for example, every time you tag a release you could have a script that runs "./manage.py migrate" against that checkout and then dumps the schema out to a file somewhere (an asset attached to the release on GitHub as one example).

OJFord · on April 23, 2023

That's pretty much what I was describing in my initial reply to your top-level comment that would be an improvement, just not built in.

Ideally though I'd like a tool with first class declarative schema as the source of truth - migrations that Django can correctly autogenerate don't need to concern me at all; if I need to vary them or resolve some ambiguity in a possible path (renamed a column vs. dropped and added one, for a simple example) then there can be some way of specifying that migration. Essentially as Django targets its models as the intended result of migrations (and complains of a missing migration if they don't get there, etc.) I'd prefer instead to target SQL schema. Still check ORM consistency - but against the schema, not the migrations of it.

Starting from scratch it seems obvious that you'd have the schema, and the mapping into python models. Django does away with the former by inference, but in so doing limits its scope to the subset of possibly desired SQL that it understands.

If you have SQL and Python, then you can have DSL for the understood subset without losing migrations, tracking in a single place, etc. of the rest. You could also give Django ownership of the entire database/schema, so that any manual experiments/adjustments would be blown away on migration, they'd have to be documented in code to be persisted.

heipei · on April 23, 2023

It’s completely different as git does not store deltas but snapshots of whole files and file trees.