I'm pretty sure this means Kubernetes Ingress on GCP now supports WebSockets as well! People have been asking for this feature for a long time; glad it's available now.
SNI support would also be really awesome. For certain kinds of traffic, the risks associated with SSL proxying mean the HTTP(S) Load Balancer is off limits. Which is a shame, because it's an incredibly powerful feature compared to AWS's offering.
Yes, +1 on SNI support. I know there is a feature request for it, and I've already +1'd it there.
We currently use GCP. We have custom domains. As it is, we're looking to schedule time to roll our own multi-AZ load balancers using the TCP load balancer, because the HTTPS load balancer doesn't support SNI. If SNI were available for the HTTPS LB, we'd jump on that in a heartbeat.
Load Balancing software is a thing. It exists, waiting for you to install and configure it.
For a site called Hacker News, it's fucking depressing how many people here jump at every possible chance to give up control of the most basic things in their tech stack, for no other reason than "X already runs something similar as SaaS".
I don't think I would be so quick to dismiss this sentiment. I run a profitable SaaS business, and every opportunity to give up responsibility results in highly profitable refocusing of effort on the things we do that actually matter. I'd much rather have a team of Google engineers running our LBs than myself.
This entire thread is about the hoops people are having to jump through to get pretty basic functionality from a load balancer, specifically because they took your approach.
I didn't say it can't work for anyone, I said it's depressing how it's the default/only approach for so many people who call themselves "hackers".
I think it's tempting to assume that the sentiment expressed here reflects the majority - everyone whose use case is served well is happy and likely doesn't comment.
As for the "hackers" comment - HN might have "Hacker" in the name, but it's about a lot of things, and a big focus is on making money. Farming out boring stuff to "the cloud" has been fantastic for making money for pretty much everyone.
Where are you reading that? All of the GCP load balancers are the best available, and there are no hoops to jump through. SNI-based TLS is an additional feature that some would like; it's available now by just adding a CDN in front, which should solve that need for most.
I'm sure anyone here can spin up a haproxy instance; that's not a big deal nor impressive in any way, and there are better things to spend time on for most people and companies.
Do you write web apps in assembly code or something because you're a "hacker"?
You're just being flippant at this point but here goes:
You will never achieve the same reliability, scalability, performance, and security, at the same cost, by doing it yourself. And it's fine to layer two services together to get all the functionality needed. That's completely normal - like how you can use multiple APIs to build an app - unless you have a specific scenario, like security regulations, that requires avoiding a managed solution.
Maybe building all that means you're a "hacker" but most users are acting on behalf of businesses and there are far better things to spend time on than creating a worse solution to a solved problem.
This is not basic - load balancing needs to be more reliable than the service behind it, which is hard to do. You can run your own instances with nginx or haproxy but you'll never get to the same reliability, scale and performance without a significant amount of work.
Google provides an incredible global load balancer tied into their networking fabric which you can't compete with. Why not use it? Businesses have limited resources and spending them on creating the same functionality at less scale and reliability just to have "control" (whatever that means here) is not worth it.
> This is not basic - load balancing needs to be more reliable than the service behind it, which is hard to do. You can run your own instances with nginx or haproxy but you'll never get to the same reliability, scale and performance without a significant amount of work.
It really depends on the reliability of your underlying servers; it's pretty easy to get nginx or haproxy to be reliable: get your config set and don't touch it. Then you're just down to hardware issues, which are generally fairly rare. If you need to mitigate hardware issues, you'll need to do something fancy with IP takeover/ECMP etc, which I agree is a significant amount of work.
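To make that concrete, here's a minimal sketch of the kind of set-and-forget haproxy config I mean (names and addresses are made up, not from any real setup):

    # Minimal set-and-forget haproxy sketch; names/addresses are made up.
    defaults
        mode http
        timeout connect 5s
        timeout client  30s
        timeout server  30s

    frontend www
        bind *:80
        default_backend app

    backend app
        balance roundrobin
        # 'check' health-checks each server and pulls dead ones from rotation
        server app1 10.0.0.11:8080 check
        server app2 10.0.0.12:8080 check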
The one thing that people really need to think about though is when you let other people do TLS termination on your behalf, you're putting the private information of your customers into the hands of someone else -- they could disclose it due to bugs in their system, rogue employees, system intruders, etc. Layer 7 load balancing features can be nice, but they come at a price. This is something to consider even if your load balancer is provided by the same people providing your servers.
Decent VPS and rented dedicated-server providers, and surely all colo providers, can provide IP failover.
Edit: my point is, if a "cloud" load balancer can't provide the features you want (e.g. TLS SNI) and the "cloud" VMs can't provide the features you need to run your own LB (e.g. IP failover), maybe this is an opportunity to learn that the "cloud" isn't some golden goose shitting out golden eggs.
Most servers don't need to handle anywhere close to Google traffic, and some downtime is acceptable -- the network is more unreliable anyway. Sacrificing 0.001% uptime and "scalability" (whatever that means) is worth it.
Worth it for what exactly? I'm not sure what tradeoff you're talking about. You would rather spend the time, effort, and money to run your own worse load balancing solution that affects your business instead of clicking a few buttons in the console?
Scalability means that you don't have to worry whether it's 10 or 1M reqs/sec. And downtime is not acceptable for us, which is why using a more reliable service makes a difference.
If it does things Google's Load Balancer doesn't do (e.g. TLS SNI) and still does balancing/failover between your backend servers, by definition it isn't worse.
> And downtime is not acceptable for us
You do know the Google SLA allows load-balancer downtime in excess of 20 minutes a month, right?
Functional parity isn't all that matters. Reliability, security, performance, ops overhead and cost make a big difference. It's a tradeoff which makes sense for probably 99% of the users.
And yes, Google's SLA is fine with us, considering they will ultimately be far more reliable than we could be with our own team.
> Worth it for what exactly? I'm not sure what tradeoff you're talking about.
Now it's:
> It's a tradeoff which makes sense for probably 99% of the users
Anyway. So let's look at your other points.
> Reliability, security, performance, ops overhead and cost make a big difference.
Right, so your suggested solution is to put an HTTPS-terminating CDN in front of Google Load Balancers (which presumably means you then need a different LB config + cert for each domain you're handling in the SNI CDN?).
> Reliability
Adding a layer to your stack like this doesn't improve your reliability - it adds a moving part, and moving parts can break.
> performance

Right, because an 'all things to all men, using shared resources' service is always going to perform better than something you control, and can thus configure to your specific needs.
> ops overhead
Yes, you need people to work on infrastructure you control.
Guess what. You still need people to work on infrastructure you don't fucking control. You find a "cloud" provider who will claim that Ops staff are not required, and I'll show you a fucking liar.
> cost
Right, because who cares if it doesn't do what you actually need and isn't really secure at what it does do, it's cheap.
That other comment was talking about sacrificing uptime for something unknown, while I was talking about the total cost/effort of building vs. buying.
You can put CloudFront with SNI in front of GCP and have it all point to the same LB.
That's not how reliability works. Every non-trivial app has hundreds of "moving parts". So what if Cloudflare had an issue? There are thousands of security issues every day, it's a fact of life. Better to have a well-funded and talented team who takes care of it.
And no, you don't need ops people to manage a cloud load balancer. What is there to do? And yes, cost is one of many factors that matter to a business, and it's still cheaper and better to put the two services together (for the few who need SNI). Why do you keep saying it doesn't work?
It seems you have some irrational hatred for cloud and managed services but I'd much rather invest in and trust them than have you run my infrastructure and waste time building some fragile load balancing system instead.
> Functional parity isn't all that matters. Reliability, security, performance, ops overhead and cost make a big difference.
You're arguing that a "cloud" solution must be more secure, and that this is very important, but when I point out that the exact multi-tenant, zero-customer-control setup you are championing led to a massive leak of data that people everywhere assumed was private, your response is:
> So what if Cloudflare had an issue? There are thousands of security issues every day, it's a fact of life.
> It seems you have some irrational hatred for cloud and managed services
No. I have a strong dislike of people cargo-culting the fucking shit out of whatever cool kid buzzword they last heard, without a clue what the alternatives are, if they even need it, or what the consequences of their choices are.
> I'd much rather invest in and trust them than have you run my infrastructure and waste time building some fragile load balancing system
I won't lose any sleep over not getting your business. As you have all the answers already, what you're looking for is called a "yes-man".
That's not how security works either. A single event does not mean something is insecure. As said before, mistakes and security holes happen every day. The totality of the situation is that Cloudflare is still more secure than you because they are better resourced with more people, time, money, connections, hardware and processes. Same with load balancing, which is not a buzzword, and something Google does better than you ever will.
I look for people who understand business trade-offs, know how to effectively spend time and money, have a proper understanding of risk, and know when a specialized vendor is a better fit for non-mainline business functions.
What exactly are you going to do when you find a hole in the load balancing software you use? Or do you write that software yourself too? What about the OS? What about the hardware? What about the datacenter? What about the transit fiber? Do you just build your own internet then? No matter what you do, you're trusting other vendors at some point.
The only possibility is to look at the entire situation and assess the risks and costs - which from this conversation seems to be your biggest problem.
You too will realize that when your load balancer goes down in the evening or on the weekend while you are away. Your site will be 100% dead for a very long time.
Google gives you the solution to your problem for free. Take it.
At what point did I or anyone else suggest "Hey, you don't need to use Google's LB, because you can install your own and never monitor it, and never engage anybody to take emergency action when required"?
> Google gives you the solution to your problem for free. Take it
I'm pretty suspicious of anything any company "gives away for free", particularly an overblown advertising company.
A CDN might be fine for public, static content, or certain kinds of APIs, but a CDN does nothing for most "transactional" APIs.
If you are trying to host a multi-region deployed API, and your customers are evaluating your infrastructure against NIST and other security guidelines and recommendations, a TLS proxy is considered a security problem (go back a few months to CloudFlare's massive data exposure). You want your TLS terminated directly at your own service hosts.
In that situation you want an SNI-aware LB that simply does host-based routing. You cannot achieve this today with GCP's HTTP(S) LB. Which is a pain, because it's otherwise awesome sauce.
> Just use CloudFront or another CDN in front. This can't be worth the engineering time to build something yourself.
It is not worth the time to build something ourselves, but we can't wait much longer. I don't see any SNI support for Google CDN yet. We'd like to keep the data inside the Google network if we can. We don't really need edge caching, so much of what a CDN provides, beyond availability, is overkill.
That's fair. Yes, there are bandwidth costs and one extra hop, though the performance impact is minimal - but it is an option available now and might be a good stopgap until the feature is released.
Since when is AWS offering SNI on their load balancers?
I work at a SaaS company; we were facing the same issue in the cloud and had to implement our own TCP LBs to support SNI.
I really hope we will see it in the near future. But as I see it: will I allow my clients to push bytecode onto my production machines on demand, with the lowest sandboxing possible, to provide low execution times? I guess not.
I should have been more clear: GCP's Anycast, no-warming-needed LB is way cooler than AWS's traditional, single-region LB with totally unreliable DNS "routing" on top (because you always want to rely on client DNS TTLs to handle a service cutover...).
The problem is, if you can't use a TLS proxy for security reasons, then you can't use the HTTP(s) LB, which will make you less happy than if you could use it.
When, up until recently, not even Google was willing to cover their own load balancer under their BAA.
The load balancer is a shared resource. Your data is being decrypted and re-encrypted in a shared memory space with all other customers' data. Consider the issue CloudFlare customers faced recently [1].
I don't work in compliance, but absence of evidence doesn't imply evidence of absence.
Google Cloud probably has the industry's highest security standards in place [0]. Getting governing bodies to sign off on their accreditations is a very separate issue.
They don't have SNI, but the newer ALB does have path-based routing. So you can use a wildcard or multi-domain cert, if you can segment routes to endpoints by URI path.
Do you mean wildcard SSL certificates or SNI? You can use wildcard SSL certificates with the GCE load balancer, but it does not have native SNI support. You'd have to roll your own there.
This has been a long time coming. Good to see it landing. There must be some technical challenges in building a load balancer that is optimized for both HTTP round trips and HTTP upgrades to long-lived TCP connections.
One issue is that, as stated in the Troubleshooting section of the linked docs, the source IP is not forwarded to the application. Maybe in the future the X-Forwarded-For header, or some other source identification, will be provided as well.
If that's the case...how to explain that the docs say source IP is not provided?
Troubleshooting
Load balanced traffic does not have a source address of the original client
Traffic from the load balancer to your instances has an IP address in the ranges of 130.211.0.0/22 and 35.191.0.0/16. When viewing logs on your load balanced instances, you will not see the source address of the original client. Instead, you will see source addresses from this range.
Is it the case that, if the X-Forwarded-For header is present, then it is not providing a source IP, or it is providing a source IP, but that this IP is not listed in the traffic logs?
The source IP of the HTTP request is the IP of the system that makes the TCP connection, so when you use a load balancer it will always appear to come from a single IP. This is what webservers and other software use when logging, which is why the source IP is not accurate in regular logging. That's what the documentation is saying - it's in the troubleshooting section because if you're only seeing a single IP, it's because you're looking at logs that won't have the right information.
Instead, most proxies and load balancers add an X-Forwarded-For header, which appends all the IPs involved in the request. If you read the entire documentation page, the Fundamentals section shows which headers are added.
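As an illustration (the function and values below are mine, not from the docs), a backend behind GCP's HTTP(S) LB could recover the client IP like this, assuming the LB appends "<client-ip>, <lb-ip>" to X-Forwarded-For:

    # Sketch: recover the original client IP behind GCP's HTTP(S) LB.
    # Assumes the LB appends "<client-ip>, <lb-ip>" to X-Forwarded-For;
    # any entries before those two can be spoofed by the client.
    def client_ip_from_xff(xff_header):
        hops = [h.strip() for h in xff_header.split(",")]
        if len(hops) < 2:
            raise ValueError("expected at least client and LB entries")
        return hops[-2]

    print(client_ip_from_xff("203.0.113.7, 130.211.3.44"))  # 203.0.113.7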
Just wanted to thank you for that information. And also give you some feedback on commenting. The HN comment guidelines mention not to insinuate that someone hasn't read the article. What you might not know is that I did read the section you quote, but I did not know enough to resolve it myself without asking a question. When I read that you didn't think I had, I felt you were saying the answer would be obvious if I had read it. That hurt because it was like saying my question was stupid, which I don't like, and I don't want to feel discouraged from asking questions in the future. Anyway, thanks for your information, and I hope this feedback is useful.
To be slightly pedantic, websockets specifically don't have HTTP headers like XFF, which is IMO part of the problem with websockets -- you end up re-inventing basic functionality. IMO (and combined with other factors) this is a great reason to choose HTTP/2 over websockets.
EDIT: ALBs also support HTTP/2, but they can also do websockets as well as WSS (where you'd terminate SSL on the ALB for websockets).
HTTP/2 and websockets are not interchangeable; you can't choose HTTP/2 if you want a lightweight, bidirectional communications protocol to the client.
The initial HTTP upgrade request from the client should have all the headers you need.
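For example, the opening handshake is a plain HTTP GET, so any proxy in the path can stamp headers like X-Forwarded-For onto it before the protocol switches (the values below are illustrative):

    GET /chat HTTP/1.1
    Host: example.com
    Upgrade: websocket
    Connection: Upgrade
    Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
    Sec-WebSocket-Version: 13
    X-Forwarded-For: 203.0.113.7, 130.211.3.44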
What applications can use a load balancer with WebSockets? All the stuff I've built has required the WS connection to point to the same server anyway (multiplayer etc) and I feel that's the most common case but I might be completely wrong.
Not necessarily. The servers can communicate among themselves, so it doesn't matter which server the client connects to. On GCP there's Pub/Sub [1]. Of course this is much slower than shuffling bytes in memory in the same process. It also moves some complexity from service discovery into the backend architecture.
If your backends aren't stateless, then yeah, you'll have to make sure connections hit the same backend (with session affinity via client IP, cookies, etc).
But session affinity is a scalability anti-pattern, which should generally only be used to support legacy applications. Ideally, you'd design your application to use stateless backends, which tend to share state at another layer (the database, memcache, Redis, etc).
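As a sketch of that pattern (Redis pub/sub here; the channel name and the send_to_local_clients hook are hypothetical):

    # Sketch: stateless websocket backends sharing fan-out via Redis
    # pub/sub. Any instance can publish; every instance receives, then
    # forwards to whichever websocket clients happen to be connected
    # to it. Channel name and send_to_local_clients() are hypothetical.
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def broadcast(message):
        r.publish("room:lobby", message)

    def run_listener(send_to_local_clients):
        p = r.pubsub()
        p.subscribe("room:lobby")
        for event in p.listen():
            if event["type"] == "message":
                send_to_local_clients(event["data"])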
I disagree that session affinity, or any affinity for that matter, is a scalability anti-pattern. Why add more load on Redis or the database if you can do it in the app server? You're just adding latency and increasing database load. Unused memory in the app server is free candy.
There are plenty of good reasons to avoid app server state (and thus session affinity), among them: imbalanced load distribution can affect performance and reliability, app server changes (service restart, reboot, scale-down) will drop state and cause interruptions (often user-visible), etc.
If you don't want unused memory on your app servers, don't provision it. :-)
There are pros and cons to each method, as we've both pointed out. My main point is that it's not an anti-pattern; it's an architectural decision, and the pros and cons of each should be weighed when designing an app.
I just finished working on a real-time collaborative document editing service and chose to use document affinity so that the app server could handle collaboration directly. It is extremely performant and I'd choose this approach again any day versus farming out to Redis/RethinkDB/etc. It persists to a database as writes come in, and performs initial load from the database. Local memory accesses are orders of magnitude faster than going over the network, and it reaps the benefits.
Quite a few frameworks use Redis et al. to manage websocket sessions. I can recommend Django Channels [1] with Redis for chat and real-time object binding.
What's the benefit here instead of using the TCP load balancer? I'm guessing it's primarily URL mapping.
Does it support websocket compression (meaning the load-balanced servers don't need to do compression, as it'd be handled by the proxy)? We would probably switch in a heartbeat if it did.
Can you point to where in the documentation it says this new feature can filter out DDoS attacks or handle rate limiting? I can't find that statement.
Google's load balancers are global - as in, they will route requests to the region nearest the user without you having to run separate load balancers in each region with DNS routing.
The backend services hooked up to the load balancer can have healthchecks and capacity limits based on req/sec or CPU usage, so a region that's close by but overloaded will be skipped for a further region that has available capacity.
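If I remember the tooling right (service, group, and zone names below are illustrative), that capacity cap is set per backend, something like:

    # Sketch: cap a backend at ~100 req/s per instance so overflow
    # spills to the next closest region with capacity. Names are
    # illustrative, not from the docs.
    gcloud compute backend-services add-backend my-backend-service \
        --global \
        --instance-group=my-ig \
        --instance-group-zone=us-east1-b \
        --balancing-mode=RATE \
        --max-rate-per-instance=100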
The HTTP(S) load balancer only routes HTTP(S) requests to your compute instances and hides the instance IPs, so at least Layer 3 DDoS can't touch your instances.
But when talking about Kubernetes, there is currently no support for an internal load balancer, right? So it still has to be exposed to the world. Any comment on this?
You can do URL routing to different backend services like you mentioned, but you can also have a single LB route traffic to the closest datacenter running your service that has capacity. This means you don't need a DNS-based system with multiple LBs for a global or HA setup.
I find it quite ironic that Google - who pushed so hard to make HTTP/2 happen, and is pushing so hard to make HTTPS the de facto default for the web - doesn't fully support either in its own load balancing solution.
Heroku has supported websockets through its routing infrastructure for ages, along with things like SNI (which has been mentioned elsewhere here as things people wish were available in GCP). Not a ding on GCP necessarily, just notable how featureful mature cloud platforms are, such that "even Google" is playing catch-up.
I wouldn't say they're playing catch-up so much as they're focused on features that provide the greatest benefit to the greatest number of users. SNI is easily solved by setting up an nginx proxy behind their HTTPS load balancer. Sure, I would really like it if SNI were baked in, but where they may seem to fall short on small things like this, they are eons ahead of other providers with products like BigQuery. BigQuery has made all the difference in the world for our engineering department. We have a vast, robust data warehouse; it's easy to read/write, easy to manage, and best of all we don't bear the brunt of maintaining its infrastructure ourselves.
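Roughly, the nginx side of that looks like the sketch below (domains, cert paths, and upstream addresses are illustrative) - the key point being that TLS has to reach nginx un-terminated, e.g. passed through from a TCP-level frontend, so nginx can pick certificates by SNI:

    # Sketch: nginx choosing certificates by SNI. Domains, cert paths,
    # and upstream addresses are illustrative; TLS must be passed
    # through to nginx un-terminated for SNI selection to happen here.
    upstream backend_a { server 10.0.0.11:8080; }
    upstream backend_b { server 10.0.0.12:8080; }

    server {
        listen 443 ssl;
        server_name a.example.com;
        ssl_certificate     /etc/nginx/certs/a.example.com.crt;
        ssl_certificate_key /etc/nginx/certs/a.example.com.key;
        location / { proxy_pass http://backend_a; }
    }

    server {
        listen 443 ssl;
        server_name b.example.com;
        ssl_certificate     /etc/nginx/certs/b.example.com.crt;
        ssl_certificate_key /etc/nginx/certs/b.example.com.key;
        location / { proxy_pass http://backend_b; }
    }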
It doesn't say how this changes the load distribution algorithm. Since websocket connections are usually long-running, it wouldn't be as useful to use the default RPS (requests per second) algorithm. Ideally, it would be something that takes into account the number of connections each backend instance has.
The distribution algorithm options are either a) rate or b) utilization. With rate, you can configure the requests per second, but this doesn't mean much in a long-running connection context because some connections can be drastically longer (hours, days?). This may lead to starving or overloading an instance.
(I work on GCP)