I was under the impression that Kubernetes was a complicated beast not meant for small teams / startups. What value does it add in this monolith environment? In a startup context, is the key to treat it as a basic auto-scaling orchestrator for the monolith and nothing more? If you or anyone else here can comment on how to use Kubernetes strategically without falling into an unnecessary over-engineering rabbit hole, I'm willing to learn from you.
Regarding the rate limiting: you're load balancing into nginx services that you've configured to limit requests. Are they synchronizing rate-limiting state? I can't find nginx documentation supporting that. What value is there in this style of rate limiting, given that user X can send a sequence of requests into a load balancer that routes them to nginx boxes A, B, and C? The big picture, that three requests were processed for user X, gets lost at the nginx layer. Your endpoint-level rate limiting, however, may be achieving a synchronized rate if the Redis servers in the cluster share state. I guess I'm asking about the strategy of using multiple lines of rate-limiting defense. Is the nginx-level limit primarily a defense against denial of service?
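To make concrete what I mean by "synchronizing": here's roughly what I'd expect a Redis-backed limiter to do, so that every app instance sees the same count for user X no matter which nginx box the request passed through. This is just a sketch under my own assumptions (a fixed-window counter, invented key names and limits), not the article's actual setup:

```python
import time

import redis  # redis-py client; any Redis reachable by all app instances will do

# Invented numbers, purely for illustration.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 100

r = redis.Redis(host="localhost", port=6379)


def allow_request(user_id: str) -> bool:
    """Fixed-window counter shared across all app instances via Redis."""
    window = int(time.time() // WINDOW_SECONDS)
    key = f"ratelimit:{user_id}:{window}"

    # INCR is atomic, so concurrent instances can't undercount user X.
    count = r.incr(key)
    if count == 1:
        # First request in this window: let the counter expire on its own.
        r.expire(key, WINDOW_SECONDS)
    return count <= MAX_REQUESTS_PER_WINDOW
```

The nginx layer, as far as I can tell from the open-source docs, keeps its limit_req counters in each instance's own shared memory zone, so boxes A, B, and C each see only their slice of the traffic. That's why I'm guessing it's mostly a blunt per-box / DoS backstop rather than the real per-user limit.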
Shouldn't the horizontal autoscaler be based on throughput rather than hardware consumption? If req/sec per instance rises above a threshold, spawn a new replica. Can anyone comment?
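For context on why I ask: my understanding is that the stock HPA control loop is desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), and it can be driven by a requests-per-second metric instead of CPU if you expose one through a custom/external metrics source (e.g. a Prometheus adapter), which I'm assuming isn't in place here. A toy version of that arithmetic, with made-up numbers:

```python
import math


def desired_replicas(current_replicas: int,
                     current_rps_per_pod: float,
                     target_rps_per_pod: float) -> int:
    """Rough HPA scaling rule: pick a replica count that brings each pod
    back toward the target requests-per-second."""
    ratio = current_rps_per_pod / target_rps_per_pod
    return max(1, math.ceil(current_replicas * ratio))


# 3 pods each handling 150 req/s against a 100 req/s target
# -> ceil(3 * 1.5) = 5 replicas.
print(desired_replicas(3, 150.0, 100.0))  # 5
```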