One thing I'd personally recommend, from thinking about schema migrations over the last week, is the following:
Unlike Postgres or other RDBMSs, FDB, being a NoSQL store, has some advantages with long-running operations like migrations. In particular, FDB can store values in an arbitrary format.
I created a notion of a "preempt" for migrations. The way a preempt works is that when you define a migration, you also define a preempt which represents how the old value changes to the new value after the migration.
For example, say you have records with just a username field and you want each record to also carry the username's length (a rough before/after sketch follows below). You'd obviously run some code to modify everything. Lots of ways to do this: map-reduce, a batch job, etc. The problem is that if you happen to have 100 million of these rows, it will take you a long time to modify all of them. There are a lot of ways to solve this, locking being a popular one.
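Roughly, the shape change looks like this (the field values are made up purely for illustration):

const before = { username: "alice" };                      // existing record shape
const after = { username: "alice", lengthOfUsername: 5 };  // shape after the migration (or after the preempt runs on read)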
I created the notion of a preempt so you can define the change in the migration and immediately have access to the changed value if you read a particular record before the migration job has gotten to it.
So in the above example, you could have a migration that looks like the following:
class Migration {
  @up
  migrate(oldRecord) {
    // Derive the new field from the existing username.
    oldRecord.lengthOfUsername = oldRecord.username.length;
    return oldRecord;
  }

  @down
  rollback(currentRecord) {
    // Undo the change by dropping the derived field.
    delete currentRecord.lengthOfUsername;
    return currentRecord;
  }
}
What's nice about this is that if you use "preempts", you don't need any conditions around the long-running job in your application code. You can treat the job as already completed the moment you kick it off, regardless of the number of records. You can call it a just-in-time migration: it runs on each record as you access it. The reason I felt this was necessary is to preserve the transactional (completed or not completed) semantics FDB gives you, because the code is much easier to work with when you can assume things are either done or not done. Eventual consistency is a huge pain and creates too many bugs imho. The other reason I like preempts with FDB is that it's something you literally can't do with an RDBMS (you couldn't treat a column as another type until the ALTER TABLE transaction has actually completed, for instance).
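To make the just-in-time part concrete, here's a minimal sketch of what the read path could look like. The getRecord call, the pendingFor helper, and the schemaVersion field are names I'm inventing for illustration, not anything from FDB or from the framework described above:

async function fetchRecord(db, key, migrations) {
  let record = await db.getRecord(key); // hypothetical raw read from the store
  // Apply any preempts the background job hasn't reached yet,
  // so callers always see the post-migration shape.
  for (const migration of migrations.pendingFor(record.schemaVersion)) {
    record = migration.migrate(record);
    record.schemaVersion = migration.version;
  }
  return record; // could optionally be written back so the batch job skips it
}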
I would also not get so invested in your architecture that you cannot change data types during schema migrations. I'd generalize it so it's always possible. For example, if you have an integer field with an index on it and you query with $gte, it does what you expect. If you change the field to a string, $gte still works, just using lexicographic ordering instead of numeric ordering. You can imagine the equivalent for all of the other operators.
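One way that ordering could work (my own sketch, not something the post spells out, though it's similar in spirit to how FDB's tuple layer orders keys of different types): tag each value with its type and compare by tag first, then by value within the type:

// Order mixed-type values: group by a type tag, then compare within the type.
function typeTag(value) {
  if (typeof value === "number") return 0;
  if (typeof value === "string") return 1;
  return 2; // everything else sorts after numbers and strings
}

function compareIndexed(a, b) {
  const ta = typeTag(a);
  const tb = typeTag(b);
  if (ta !== tb) return ta - tb;      // different types: order by tag
  if (ta === 0) return a - b;         // numbers: numeric order
  return a < b ? -1 : a > b ? 1 : 0;  // strings: lexicographic order
}

// [10, "2", 3, "10"].sort(compareIndexed) -> [3, 10, "10", "2"]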
The only caveat is that you'd either need to ensure all preempt code is idempotent, since the same record might be processed twice (once just in time, and again by the long-running job), or you'd need to record which records have already been handled by the just-in-time migration and skip them as necessary. The latter leads to its own issues: you need more storage space, you then have to clean it up, a full disk leads to more issues, and so on.
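For what it's worth, a rough sketch of the tracking variant (using a per-record schemaVersion field and scanRecords/saveRecord APIs that I'm making up for illustration) might look like:

async function batchMigrate(db, migration) {
  for await (const record of db.scanRecords()) {
    // Skip records the just-in-time path (or a previous run) already migrated.
    if (record.schemaVersion >= migration.version) continue;
    const updated = migration.migrate(record);
    updated.schemaVersion = migration.version;
    await db.saveRecord(updated);
  }
}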
Avoiding rebuilding the table is definitely possible for certain schema changes like adding a new field.
But supporting data type changes without rebuilding is not ideal. It will lead to data quality issues and complexity in the application. The Integer -> String example is simple, but what about String -> Integer? How are consumers of the data supposed to handle a situation where the field holds a string in some records and an integer in others? They would have to add type checking, which complicates every one of those consumers.
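To make that concrete, this is roughly the branching every consumer ends up carrying while both representations exist (the userId field is a made-up example, not from the thread):

// Every reader of the field has to cope with both representations.
function userIdAsNumber(record) {
  const value = record.userId;
  if (typeof value === "number") return value;
  const parsed = parseInt(value, 10);
  if (Number.isNaN(parsed)) throw new Error(`can't interpret userId: ${value}`);
  return parsed;
}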
What they are saying is that if you fetch the value, it will already have been converted to a String (exactly as you'd expect if you had locked and migrated everything up front), because the JIT migration runs as part of the fetch operation.
Postgres has a version of this preempt idea with default values on columns: it fills in the value at query time without needing to backfill the data. Postgres is not a horizontally scalable database like FDB, so it's not a direct comparison, but in practice this means the migration lock is much shorter and it becomes possible to actually have a default on large tables.
Allowing default values for columns is definitely doable and we can also implement it in a similar way by filling it in during the query. But changing the type of a field to an incompatible type is tricky and needs more constraints and external machinery to fix the history.