At PolyScale [1] we tackle many of the same challenges. Some of this article feels a little dated to me but the data distribution, connectivity and scaling challenges are valid.
We use caching to store data and run SQL compute at the edge. It is wire-protocol compatible with various databases (Postgres, MySQL, MS SQL, MariaDB), and it dramatically reduces query execution times and lowers latency. It also has a JS driver for SQL over HTTP, as well as connection pooling for both TCP and HTTP.
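To make the wire-protocol point concrete, here is a minimal sketch of what adoption can look like from a Node/TypeScript app using the standard `pg` client; the hostname and credentials below are placeholders, not real PolyScale values:

```typescript
import { Client } from "pg";

// Because the cache speaks the Postgres wire protocol, the only change to an
// existing app is the connection string: point it at the cache endpoint
// (placeholder hostname below) instead of the origin database.
const client = new Client({
  connectionString:
    "postgres://app_user:app_password@cache-endpoint.example.com:5432/app_db",
});

async function main() {
  await client.connect();

  // Reads are served from the edge cache when possible; misses are
  // forwarded to the origin database transparently.
  const { rows } = await client.query(
    "SELECT id, name FROM products WHERE category = $1",
    ["books"]
  );
  console.log(rows);

  await client.end();
}

main().catch(console.error);
```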
Thanks for the questions. At a very high level, the AI uses statistical models that learn in real time and estimate how frequently the data in the database is changing. TTLs are then set accordingly, per SQL query. The model looks at many inputs, such as the payload sizes being returned from the database and the arrival rates.
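As a rough illustration of the idea (a toy sketch, not PolyScale's actual model), a per-query TTL estimator could watch how often a query's result payload actually changes and cache for some fraction of the observed change interval:

```typescript
import { createHash } from "crypto";

// Toy per-query TTL estimator: tracks when each query's result last changed
// and smooths an estimate of the time between changes.
interface QueryStats {
  lastHash: string;         // hash of the last result payload seen
  lastChangeAt: number;     // ms timestamp of the last observed change
  changeIntervalMs: number; // smoothed estimate of time between changes
}

const stats = new Map<string, QueryStats>();

function hashPayload(payload: unknown): string {
  return createHash("sha256").update(JSON.stringify(payload)).digest("hex");
}

// Called on a cache miss, when a fresh result arrives from the database.
// Returns a TTL in seconds for this specific SQL query.
function updateTtl(sql: string, payload: unknown): number {
  const now = Date.now();
  const hash = hashPayload(payload);
  const prev = stats.get(sql);

  if (!prev) {
    stats.set(sql, { lastHash: hash, lastChangeAt: now, changeIntervalMs: 30_000 });
    return 30; // conservative default for a query we have not seen before
  }

  if (hash !== prev.lastHash) {
    // Result changed: blend the newly observed interval into the estimate.
    const observed = now - prev.lastChangeAt;
    prev.changeIntervalMs = 0.7 * prev.changeIntervalMs + 0.3 * observed;
    prev.lastHash = hash;
    prev.lastChangeAt = now;
  }

  // Cache for half the estimated change interval, capped at 10 minutes.
  return Math.min(prev.changeIntervalMs / 2, 600_000) / 1000;
}
```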
If PolyScale can see mutation queries (inserts, updates, deletes), it will automatically invalidate just the affected data from the cache, globally.
If you make changes directly to the database out of band, bypassing PolyScale, you have a few options depending on the use case. Firstly, the AI's statistical models will invalidate stale entries on their own. Secondly, you can purge manually, for example after a scheduled import. Thirdly, you can plug in CDC streams to power the invalidations.
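To show the shape of mutation-driven invalidation (a naive table-level sketch with a crude regex parser, far coarser than what PolyScale actually does), the same invalidation path can be fed by mutations seen on the wire, by a CDC event, or by a manual purge:

```typescript
// Illustrative only: table-level invalidation for an in-memory cache.
const cache = new Map<string, unknown>();              // SQL text -> cached result
const queriesByTable = new Map<string, Set<string>>(); // table -> SQL texts that read it

// Crude table extraction; real SQL parsing is considerably more involved.
function tablesIn(sql: string): string[] {
  const matches = sql.match(/\b(?:from|join|into|update)\s+([a-z_][a-z0-9_]*)/gi) ?? [];
  return matches.map((m) => m.split(/\s+/).pop()!.toLowerCase());
}

// Record a cached result and index it by the tables it reads from.
export function cacheResult(sql: string, result: unknown): void {
  cache.set(sql, result);
  for (const table of tablesIn(sql)) {
    if (!queriesByTable.has(table)) queriesByTable.set(table, new Set());
    queriesByTable.get(table)!.add(sql);
  }
}

// Called when a mutation passes through the proxy, or when a CDC event
// reports an out-of-band change to a table.
export function invalidateTable(table: string): void {
  const key = table.toLowerCase();
  for (const sql of queriesByTable.get(key) ?? []) {
    cache.delete(sql);
  }
  queriesByTable.delete(key);
}

// Manual purge, e.g. after a scheduled bulk import.
export function purgeAll(): void {
  cache.clear();
  queriesByTable.clear();
}
```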
[1] https://www.polyscale.ai/