Maybe there are some other things you can say for certain. But as to some of your points:
> Watch out for positive-only feedback loops, you absolutely need negative feedback as well - or only. Eg. exponential back-off.
Agreed that it may need negative feedback, but I'm not sure about "always".
If your service has a latency SLA, exponential back-off might kill your SLA (depending on its wording and where the back-off happens). The fix is to soft reject requests (send an RST rather than dropping packets) when you can't meet the demand. That may let you meet your SLA if it's written to prioritize low latency over availability.
This is its own negative feedback loop, but switch from sending RSTs to silently dropping packets and you lose the feedback.
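To make that concrete, here's a rough sketch, not anyone's real implementation. `backoff_total_delay` and `LoadShedder` are names I made up, and the 100 ms base delay, 4 retries and 500 ms budget are hypothetical numbers. It only models the idea at the application level: back-off waits add up quickly against a latency budget, while an explicit fast rejection tells the caller about overload immediately.

```python
from dataclasses import dataclass


def backoff_total_delay(base_s: float, retries: int) -> float:
    """Total time spent sleeping across `retries` exponential back-off waits (no jitter)."""
    return sum(base_s * 2 ** attempt for attempt in range(retries))


@dataclass
class LoadShedder:
    """Admits at most `capacity` in-flight requests and rejects the rest immediately."""
    capacity: int
    in_flight: int = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.capacity:
            # Explicit, fast rejection: the caller sees the overload right away
            # (the application-level analogue of an RST instead of a silent drop).
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Call when a request finishes so capacity frees up again.
        self.in_flight = max(0, self.in_flight - 1)


if __name__ == "__main__":
    # Hypothetical numbers: 100 ms base delay, 4 retries, 500 ms latency budget.
    waited = backoff_total_delay(0.1, 4)  # 0.1 + 0.2 + 0.4 + 0.8
    print(f"back-off wait: {waited:.1f}s against a 0.5s budget")

    shedder = LoadShedder(capacity=2)
    print([shedder.try_admit() for _ in range(3)])  # third request is rejected
```

With those made-up numbers the back-off alone is three times the budget, whereas the rejected request gets its answer immediately and the client can decide what to do with it.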
> Sometimes, you just need a decentralized solution, rather than a distributed one
Agreed
> Loose coupling is your friend.
Until it isn't? :)
> add an extra layer of indirection, but you probably need to pay more attention to cache invalidation
Fixes for additional layers tend to increase system complexity compared to fixes for fewer layers.
> Throughput probably matters more than latency.
Until it doesn't :)
> Reducing the size/number of writes will probably help more than trying to speed them up.
Depending on 20 different things... You really have to account for all the system's limits (and business use cases) and find the solution that matches the implementation needs.
> there is probably a huge business for multi-tenancy-as-a-service
Sure, it's called EKS :-) Just build more clusters... Don't worry, we'll bill you...
> Don't overthink it
Yes and no. Yes, in that there will always be unknowns. But no, in that better communication will often produce better solutions without extra work. Think smarter, not harder!