When I co-founded Mezmo (a Series D observability platform), we obsessed over logs, metrics, and traces. I learned firsthand how critical app-level observability is for DevOps: cutting through logging noise and finding the needle in the haystack is everything.
But after diving into AI infra, I noticed a huge gap: GPU monitoring in multi-cloud environments is woefully insufficient.
Despite companies throwing billions at GPUs, there's no easy way to answer basic questions:
- What's happening with my GPUs?
- Who's using them?
- How much is this project costing me?
What's happening:
Metrics (like DCGM_FI_DEV_GPU_UTIL) told us what was happening, but not why. Underutilized GPUs? Maybe the pod is crashlooping, stuck pulling an image, or misconfigured; maybe the application is simply not using the GPU.
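For concreteness, here's a rough sketch of where most teams start, assuming you already scrape NVIDIA's dcgm-exporter into Prometheus. The endpoint, label names, and the "10% for 30 minutes" idle threshold are placeholders, not anything Neurox-specific:

```python
# Minimal sketch: find GPUs that look idle via the Prometheus HTTP API.
# Assumes a dcgm-exporter -> Prometheus pipeline with default metric/label names.
import requests

PROM_URL = "http://prometheus:9090"                        # placeholder endpoint
QUERY = "avg_over_time(DCGM_FI_DEV_GPU_UTIL[30m]) < 10"    # "idle" = <10% util for 30m

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]
    # This tells you *which* GPU is idle, but says nothing about why:
    # a crashloop, a stuck image pull, a misconfig, or CPU-only code all look the same here.
    print(f"GPU {labels.get('gpu')} on {labels.get('Hostname', labels.get('instance'))}: "
          f"avg util {series['value'][1]}% over 30m")
```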
Who's using the compute:
Kubernetes metadata such as namespace or pod name gave us the missing link. We traced issues like failed pod states, incorrect scheduling, and even PyTorch jobs silently falling back to CPU.
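For the CPU-fallback case specifically, one cheap guardrail (a hedged illustration, not part of Neurox) is to make the job refuse to start when PyTorch can't see a GPU:

```python
# Fail loudly instead of silently training on CPU while holding a GPU reservation.
import sys
import torch

def require_gpu() -> torch.device:
    if not torch.cuda.is_available():
        # Typical causes: pod scheduled without nvidia.com/gpu requests,
        # driver/toolkit mismatch, or a CPU-only torch wheel baked into the image.
        sys.exit("No CUDA device visible to PyTorch; refusing to fall back to CPU.")
    device = torch.device("cuda")
    print(f"Using {torch.cuda.get_device_name(device)}")
    return device

if __name__ == "__main__":
    device = require_gpu()
```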
How much is this gonna cost:
Calculating cost isn't easy either. If you're renting, you need GPU-time per pod and cloud billing data. If you're on-prem, you'll want power usage + rate cards. Neither comes from a metrics dashboard.
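As a back-of-the-envelope illustration of the two models (all rates and wattages below are made up; plug in your own rate card or power tariff):

```python
# Hypothetical cost math for rented vs. on-prem GPUs.

def rented_cost(gpu_hours_per_pod: float, hourly_rate_usd: float) -> float:
    """Cloud: GPU-time attributed to a pod x the provider's per-GPU-hour rate."""
    return gpu_hours_per_pod * hourly_rate_usd

def on_prem_cost(avg_watts: float, hours: float, usd_per_kwh: float) -> float:
    """On-prem: average board power draw (e.g. from DCGM power metrics) x time x tariff."""
    return (avg_watts / 1000.0) * hours * usd_per_kwh

# Example: a pod that held one GPU for 72 hours
print(rented_cost(gpu_hours_per_pod=72, hourly_rate_usd=3.50))    # ~$252 rented
print(on_prem_cost(avg_watts=650, hours=72, usd_per_kwh=0.12))    # ~$5.60 in power alone
```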
---
Most teams are duct-taping scripts to Prometheus, Grafana, and kubectl.
So we built Neurox - a purpose-built GPU observability platform for Kubernetes-native, multi-cloud AI infrastructure. Think:
1. Real-time GPU utilization and alerts for idle GPUs
2. Cost breakdowns per app/team/project and FinOps integration
3. Unified view across AWS, GCP, Azure, and on-prem
4. Kubernetes-aware: connect node metrics to running pods, jobs, and owners
5. GPU health checks
Everyone we talked to runs their compute in multi-cloud and uses Kubernetes as the unifier across all environments. Metrics alone aren't good enough. You gotta combine metrics with Kubernetes state and financial data to see the whole picture.
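To make that join concrete, here's a hypothetical sketch using the standard kubernetes Python client: given a node whose GPUs looked idle in the metrics, list the pods requesting GPUs on it and who owns them. The node name and GPU resource key are assumptions for illustration.

```python
# Map "idle GPU on node X" to the pods requesting GPUs there and their owners.
from kubernetes import client, config

config.load_kube_config()            # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

NODE = "gpu-node-1"                  # assumption: the node flagged by the metrics query
pods = v1.list_pod_for_all_namespaces(field_selector=f"spec.nodeName={NODE}")

for pod in pods.items:
    for c in pod.spec.containers:
        requests = (c.resources.requests or {}) if c.resources else {}
        gpus = requests.get("nvidia.com/gpu")
        if gpus:
            owners = [f"{o.kind}/{o.name}" for o in (pod.metadata.owner_references or [])]
            print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                  f"container={c.name} gpus={gpus} phase={pod.status.phase} owners={owners}")
```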
Check us out and let us know what we're missing. Curious to hear from folks who've rolled their own: what did you do?
I took a serious look at SLURM for my problem space and among my conclusions were:
- Hiring people who know Kubernetes is going to be far cheaper
- Kubernetes is gonna be way more compatible with popular o11y tooling
- SLURM's accounting is great if your billing model includes multiple government departments and universities each with their own grants and strict budgets, but is far more complex than needed by the typical tech company
- Writing a custom scheduler that outperforms kube-scheduler is far easier than dealing with SLURM in general
We're neither for nor against SLURM. I do believe it has its use cases in HPC, scientific, and academic settings. We think our web UI is a bit easier to use, and we do offer a competing scheduler.
Our focus is definitely more on container-first, cloud-native Kubernetes environments like EKS, GKE, and AKS. We're also much more focused on health monitoring of the actual GPU hardware rather than just scheduling jobs.
Lee @ Neurox