At PolyScale [1] we tackle many of the same challenges. Some of this article feels a little dated to me but the data distribution, connectivity and scaling challenges are valid.
We use caching to store data and run SQL compute at the edge. It is wire-protocol compatible with various databases (Postgres, MySQL, MS SQL, MariaDB), and it dramatically reduces query execution times and lowers latency. It also has a JS driver for SQL over HTTP, as well as connection pooling for both TCP and HTTP.
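To make the wire-protocol point concrete, here is a minimal sketch of what adoption can look like from a Node/TypeScript app using the standard `pg` client; the hostname and credentials below are placeholders, not real PolyScale values:

```typescript
import { Client } from "pg";

// Because the cache speaks the Postgres wire protocol, the only change to an
// existing app is the connection string: point it at the cache endpoint
// (placeholder hostname below) instead of the origin database.
const client = new Client({
  connectionString:
    "postgres://app_user:app_password@cache-endpoint.example.com:5432/app_db",
});

async function main() {
  await client.connect();

  // Reads are served from the edge cache when possible; misses are
  // forwarded to the origin database transparently.
  const { rows } = await client.query(
    "SELECT id, name FROM products WHERE category = $1",
    ["books"]
  );
  console.log(rows);

  await client.end();
}

main().catch(console.error);
```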
Thanks for the questions. At a very high level, the AI uses statistical models that learn in real time and estimate how frequently the data in the database is changing. TTLs are then set accordingly, per SQL query. The model looks at many inputs, such as the payload sizes being returned from the database and the arrival rates.
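As a rough illustration of the idea (a toy sketch, not PolyScale's actual model), a per-query TTL estimator could watch how often a query's result payload actually changes and cache for some fraction of the observed change interval:

```typescript
import { createHash } from "crypto";

// Toy per-query TTL estimator: tracks when each query's result last changed
// and smooths an estimate of the time between changes.
interface QueryStats {
  lastHash: string;         // hash of the last result payload seen
  lastChangeAt: number;     // ms timestamp of the last observed change
  changeIntervalMs: number; // smoothed estimate of time between changes
}

const stats = new Map<string, QueryStats>();

function hashPayload(payload: unknown): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

// Called on a cache miss, when a fresh result arrives from the database.
// Returns a TTL in seconds for this specific SQL query.
function updateTtl(sql: string, payload: unknown): number {
  const now = Date.now();
  const hash = hashPayload(payload);
  const prev = stats.get(sql);

  if (!prev) {
    stats.set(sql, { lastHash: hash, lastChangeAt: now, changeIntervalMs: 30_000 });
    return 30; // conservative default for a query we have not seen before
  }

  if (hash !== prev.lastHash) {
    // Result changed: blend the newly observed interval into the estimate.
    const observed = now - prev.lastChangeAt;
    prev.changeIntervalMs = 0.7 * prev.changeIntervalMs + 0.3 * observed;
    prev.lastHash = hash;
    prev.lastChangeAt = now;
  }

  // Cache for half the estimated change interval, capped at 10 minutes.
  return Math.min(prev.changeIntervalMs / 2, 600_000) / 1000;
}
```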
If PolyScale can see mutation queries (inserts, updates, deletes), it will automatically invalidate just the affected data from the cache, globally.
If you make changes directly to the database out of band, bypassing PolyScale, you have a few options depending on the use case. Firstly, the AI's statistical models will invalidate stale entries on their own. Secondly, you can purge manually, for example after a scheduled import. Thirdly, you can plug in CDC streams to power the invalidations.
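To show the shape of mutation-driven invalidation (a naive table-level sketch with a crude regex parser, far coarser than what PolyScale actually does), the same invalidation path can be fed by mutations seen on the wire, by a CDC event, or by a manual purge:

```typescript
// Illustrative only: table-level invalidation for an in-memory cache.
const cache = new Map<string, unknown>();              // SQL text -> cached result
const queriesByTable = new Map<string, Set<string>>(); // table -> SQL texts that read it

// Crude table extraction; real SQL parsing is considerably more involved.
function tablesIn(sql: string): string[] {
  const matches = sql.match(/\b(?:from|join|into|update)\s+([a-z_][a-z0-9_]*)/gi) ?? [];
  return matches.map((m) => m.split(/\s+/).pop()!.toLowerCase());
}

// Record a cached result and index it by the tables it reads from.
export function cacheResult(sql: string, result: unknown): void {
  cache.set(sql, result);
  for (const table of tablesIn(sql)) {
    if (!queriesByTable.has(table)) queriesByTable.set(table, new Set());
    queriesByTable.get(table)!.add(sql);
  }
}

// Called when a mutation passes through the proxy, or when a CDC event
// reports an out-of-band change to a table.
export function invalidateTable(table: string): void {
  const key = table.toLowerCase();
  for (const sql of queriesByTable.get(key) ?? []) {
    cache.delete(sql);
  }
  queriesByTable.delete(key);
}

// Manual purge, e.g. after a scheduled bulk import.
export function purgeAll(): void {
  cache.clear();
  queriesByTable.clear();
}
```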
[1] https://www.polyscale.ai/