Ask HN: Infra people in small companies, what does your infra look like?
17 points by mrngilles on Nov 4, 2023 | hide | past | favorite | 31 comments
Hi all!

I'm just wondering what infra looks like in a small company, with 1 person dedicated to infra and around 10 devs working on a monolith.

I'm guessing that a lot of people will have a lot of different opinions. Just how many possibilities could we see here? And how would those impact the business and the devs' jobs?




I've built infra for a lot of different orgs over a long period of time. My recommendation is that unless (and until) you're building for scale, you Keep It Simple, Stupid (KISS).

Use an IaC system (Terraform, Pulumi, etc) to manage everything from Day 1:

1. Use a major cloud provider (AWS or GCP, probably)

2. Get a managed HTTP load balancer (ELB, ALB, whatever)

3. Package your app in a container image, and run your app on 3+ containers behind the load balancer (on bare VMs, K8s, whatever you prefer), ideally at least 2x containers across 2x AZs.

4. Set up a managed database cluster with Postgres or MySQL and run it with multi-AZ and failover

5. Run 2x VM instances (for redundancy) for asynchronous jobs (using a message bus service or using your database as a work queue), ideally 1x in each of the AZs your database is in

6. Store any large files in cloud storage and put them behind a CDN

That's all 99% of companies will ever need to do. These are all old technologies that Just Work.
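Day to day, "manage everything via IaC from Day 1" can look as small as this sketch, assuming Terraform (the directory layout and environment names are just one convention, not something prescribed above):

```shell
#!/bin/sh
# Hypothetical day-to-day IaC workflow: every infra change goes through
# plan/apply in one repo. The infra/<env> layout is an assumption.
set -eu

plan_env() {
    # Print which environment we're about to plan; the real Terraform
    # commands are shown commented out below.
    echo "planning infra/$1"
    # (cd "infra/$1" && terraform init && terraform plan -out=tfplan)
    # after review:
    # (cd "infra/$1" && terraform apply tfplan)
}

plan_env production
```

The point is less the tool than the habit: nobody clicks around the cloud console, so the repo is always the source of truth.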


I've worked with quite a few small teams in the past, part time. Never had a fleet as small as 2 VMs, more like 20-120. That's a good summary. The only thing missing that always comes up is the CI/CD workflow. I've found Bitbucket's CD surprisingly _good_ once you follow the enforced pattern, especially for small teams who don't want to spend too much on release technicalities.

Then there are more nuanced things that most teams will miss early on without someone pointing out the problem and the solution, e.g. decouple configuration from the app, design stateless apps (e.g. 12-factor app), make secrets management easy (e.g. DynamoDB-based solutions like credstash are dirt cheap, AWS Secrets Manager is okay-ish), use managed DBs (RDS is the most common choice), and more.
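Decoupling configuration from the app mostly means the app reads everything from its environment, never from a baked-in file. A minimal sketch, with hypothetical variable names:

```shell
#!/bin/sh
# 12-factor style config: nothing hardcoded in the build; everything
# comes from the environment, with defaults for local dev.
set -eu

config() {
    # Echo the value of env var $1, falling back to default $2.
    eval "printf '%s' \"\${$1:-$2}\""
}

PORT=$(config PORT 8080)
DATABASE_URL=$(config DATABASE_URL "postgres://localhost:5432/app_dev")
echo "app listening on :$PORT"
# exec ./app   # the real binary reads the same variables
```

The same image then runs unchanged in dev, staging, and production; only the environment differs.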


Really? Wow... I've never had 120 VMs to manage in any org I was part of, and I've worked at Fortune 500 companies!

I have provided infrastructure-as-a-service for many, many bare-metal machines/VMs/containers, but then I wasn't managing them (e.g. it was many small teams' infra, not a small team of 10 people using 20-120 VMs as per OP).


That didn’t come out well. By VMs I mean EKS or Swarm nodes, not VMs in the “pet” rather than “cattle” sense :-)

Those nodes were used to run a monolith. Same deployments and all; it wasn’t as complicated as it might sound (Terraform to manage infra, a few custom apps to talk to external APIs, etc.)

These were definitely not Fortune 500 companies, but AFAIK all of them had some cash flow or were acquired by behemoths.


What does a deployment look like in that setup?


It depends how you deploy the app containers.

If you use a container orchestration service (ECS, GKE), then you use that system to upgrade the container image to new app versions.

If you use autoscaling groups / managed instance groups, then you use that to replace VMs with new ones running containers with the new app image.

Using rolling updates, the load balancer drains connections to a container and then it is replaced.

Ideally this is done using your IaC system. So a deployment just involves changing the container image (app version) and applying the update.
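So the deploy step itself can be as small as this sketch (the registry, the Terraform variable name, and the tag scheme are all assumptions, not details from the setup above):

```shell
#!/bin/sh
# Hypothetical deploy: bump the image tag, let the IaC tool and the
# orchestrator handle the rolling update.
set -eu

image_for_tag() {
    # Full image reference for a release tag.
    echo "registry.example.com/app:$1"
}

deploy() {
    img=$(image_for_tag "$1")
    echo "deploying $img"
    # terraform apply -auto-approve -var "app_image=$img"
    # New containers come up, pass health checks, and the load
    # balancer drains connections to the old ones.
}

deploy "v1.2.3"
```

CI runs this after tests pass, so a release is one green pipeline, not a runbook.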


I would like to have some insights about this as well


Excellent comment and suggestion!

Very refreshing to read in this day and age where it’s not uncommon to have more micro services than paying customers.


Any reliable managed database cluster suggestions? I'm looking at MariaDB or MySQL, the former preferred. Thanks!


Cloud SQL from GCP or RDS from AWS should both work just fine. They're just about the only companies I trust managed databases from.


thanks


A $40/mo VPS. Caddy web server. Supervisor to restart automatically. Git to deploy. CI using GitHub actions which build, run tests, and push to server on success. Automatic VPS snapshots and a cronjob to backup the DB to cloud storage. DB runs on the same box.

This has worked for years with practically zero maintenance.
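The cronjob backup piece of that setup can be sketched like this (the bucket name, paths, and rclone as the upload tool are assumptions):

```shell
#!/bin/sh
# Nightly DB backup sketch: dump, compress, ship off the box.
set -eu

backup_name() {
    # Timestamped, lexically sortable filename.
    echo "db-$(date -u +%Y%m%d-%H%M%S).sql.gz"
}

file=$(backup_name)
echo "backing up to $file"
# pg_dump app_production | gzip > "/var/backups/$file"
# rclone copy "/var/backups/$file" remote:app-backups/
```

Together with the provider's VPS snapshots, that covers both "the box died" and "we need last Tuesday's data".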


What's the size of the team working on this?

I'm guessing you're OK with the tradeoff of your machine going down sometimes?


4 people but sometimes 10 with contractors.

Sure, there's downtime when the VPS provider is down. That's no different than "managed" services.


So you just have the dns record pointing to your machine's IP?

Do you have some kind of monitoring? What about logging?


That's correct. We used to use CloudFlare, but it added too much time to each request (speed is important to us). So now it's just pointed directly at the VPS.

Crashes are logged, but I don't think anybody has ever looked at the logs. I'm not sure a crash has ever happened (the app is written in Haskell). Unexpected errors are saved to the database and can be viewed in the web app. If we needed logs, people could tail them from the VPS (ssh -t $WEB_HOST 'tail -f webserver.error.log'). I know, I know, that's crazy. I'm told real software companies are supposed to ship them to someone else, and then pay them to view the logs through a crappy web interface.

No monitoring. If the site is down customers will let us know, which has happened a few times. Monitoring wouldn't help much with shipping bugs anyway.


When you're that small, one can easily set up Prometheus/Grafana, ELK, or Sentry on another VM.


> A $40/mo VPS.

> This has worked for years with practically zero maintenance.

Do you apply security updates, e.g. for Caddy, the OS, db, etc.?


Yeah, every so often I spend 5 minutes and run apt upgrade. I don't think the DB has ever been upgraded because there's no reason to. It's not exposed to the Internet anyway.


Your DB doesn't need to be connected to the internet for untrusted users to have a path for interacting with it. I assume your application sends queries to it, for example. And users give your application the parameters for those queries.

And that's when things are working as expected. All it takes is for one of your non-DB services to be compromised, and an attacker can now connect to the DB on localhost. That's why it's best practice to put a secure password on your DB even if you only expect connections from local services. And yes, you should upgrade it too... or at least apply security patches.
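A minimal hardening sketch along those lines, with hypothetical role and database names:

```shell
#!/bin/sh
# Least-privilege setup for a localhost-only Postgres: a dedicated role
# with a real password, and only the grants the app needs.
set -eu

grant_sql() {
    # SQL you'd pipe into psql as the postgres superuser.
    cat <<'SQL'
CREATE ROLE app_rw LOGIN PASSWORD 'use-a-generated-secret';
GRANT CONNECT ON DATABASE app_production TO app_rw;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_rw;
SQL
}

grant_sql
# grant_sql | sudo -u postgres psql
# ...and keep listen_addresses = 'localhost' in postgresql.conf
```

That way a compromised web process can read and write app data, but can't drop tables or create new superusers.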


Lol, the number of databases running real heavy production workloads for the biggest companies in the world that haven't been updated in 10 years is probably very high :) At least among the ones I've seen...


Yes, database connections are password protected.


Yes, this, but with NGINX/systemd! Haven't used Caddy; will check it out.


Are you hiring :D


Often what you end up with will have been driven by the founding CTO or first technical hire. And it will be whatever they had most familiarity/experience with.

For a monolith at the size you're talking about, a sensible approach would be pretty minimal infra. The app would be hosted on a managed service (e.g. Cloud Run, Elastic Beanstalk, App Engine, etc.) along with some kind of managed database service.

You might also see a kubernetes cluster being used - generally happens if that’s what the original devs knew.

Adventurous places may have the monolith deployed on serverless which can be pretty cost effective.

Heroku like services are also still popular.

In my previous place we had the monolith on cloud run.

Anything more complex than the above would to me be a bad smell - unless there were some very good reasons.


> In my previous place we had the monolith on cloud run.

How was a new deployment managed? Could anyone push a new version? Did it go through CI/CD?

And speaking of CI/CD, did you have something vaguely resembling it? Was it all hosted?

> Anything more complex than the above would to me be a bad smell - unless there were some very good reasons.

Yes, you would probably not need something more complex in that case, but then again, some companies with 5 devs start with microservices


All through CI/CD running on GitHub Actions. There's really no excuse nowadays for not having good CI/CD processes.

We could have used Google's Cloud Build, but the team knew GitHub Actions well.


DigitalOcean with NGINX in a VM, the web app in another, and Postgres in another... will take you very far. Just take backups and deploy it all with Ansible.


With 10 devs on a monolith, I probably wouldn’t dedicate anyone to infra. Use a PaaS like Render or Aptible, focus on shipping product.


The cheapest.

In our case that’s a rack in a datacenter, a SAN, MAAS, VMs, terraform, and chef.


As simple as possible, provisioned via IaC, and backed up with clear documentation.

Your replacement will thank you, and so will management for a smooth transition.



