
> The games industry is weird: It simultaneously lags behind the rest of the tech industry by half-a-decade in some areas and yet it can be years ahead in others.

I'd love to see examples of "ahead" because I've literally never seen it. Game dev studios act like automated testing is a new thing...

> PostgreSQL performed much better and had the additional benefit of being able to cleanly split write-ahead logs (which are largely sequential) and data to separate RAID devices. Something that MySQL doesn’t really support and would have to be hacked in using symlinks on every table create.

It's funny that you can tell this happened a good few years ago purely because the solution to the above wasn't just "slap an NVMe on it".

> I tested this and ordered the storage I would need to have a rolling 90 day window of backups (with older backups being taken off-site).

> The hardware request was rejected.

> When I inquired as to why, I was told that Ubisoft has a standard backup solution which is replicated globally and sent to cold storage in a bank vault somewhere in Paris. I was told this is because we had lost some source code once upon a time and we could no longer build certain games because of that.

My first thought was "why not just ship pg base backups + WAL segments there", with a full backup from time to time. No read-back needed.

The whole problem seems to be "the admin overengineered a solution without knowing the constraints, then tried to headbutt the constraints instead of changing approach".

> Our EMC DataDomain system was optimised primarily for ingesting huge volumes of traffic, but if we want incremental backups then perhaps we needed something a little more dynamic.

Nope, your incremental approach sucked. The majority of software I've seen used for incremental backups could generate incrementals on its own, without any readback from the server.

The minority *was software that wholly managed backups*, so it was prepared for that, and often it just read metadata from the database rather than the actual data.

> I don’t know what else to take away from this.

That people use tapes for backups, so what? To get your requirements before implementing a solution? To find a better solution once the requirements are known? To not get stuck on the initial idea?

Literally the basic, recommended way of doing PostgreSQL backups with no extra software would work.
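
For the curious, a rough sketch of that flow, with made-up paths and a made-up schedule chosen for illustration (this is the generic textbook approach, not what Ubisoft ran): postgresql.conf archives every finished WAL segment to the remote share as it's produced, and a cron job takes a periodic full base backup. A restore is then "latest base backup + WAL replay", with no read-back of the backup store needed in between:

    #!/usr/bin/env python3
    # Sketch: periodic full base backup to go with continuous WAL
    # archiving. Assumes postgresql.conf already ships finished WAL
    # segments off-box, e.g. (hypothetical mount point):
    #   archive_mode = on
    #   archive_command = 'test ! -f /mnt/offsite/wal/%f && cp %p /mnt/offsite/wal/%f'
    import datetime
    import subprocess

    BACKUP_ROOT = "/mnt/offsite/base"  # hypothetical NFS mount

    def take_base_backup():
        # One compressed full base backup; --wal-method=stream bundles
        # the WAL written during the copy, so the backup is self-consistent.
        target = f"{BACKUP_ROOT}/{datetime.date.today().isoformat()}"
        subprocess.run(
            ["pg_basebackup",
             "--pgdata", target,
             "--format=tar", "--gzip",
             "--wal-method=stream",
             "--checkpoint=fast"],
            check=True,
        )

    if __name__ == "__main__":
        take_base_backup()  # run from cron, e.g. weekly

Base backups older than the retention window can simply be dropped, along with any WAL segments older than the oldest base backup you keep.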




The “years ahead” was described in the article. We had what amounts to a very well-oiled Kubernetes installation with mTLS, but on Windows, in C++, and 10 years ago (before Kubernetes was a thing).

Everything else you say is true. I could have chosen another solution for backups, but the tradeoff between backup speed (replaying WAL can be time-consuming) and database load (full backups delay replication to replicas) was made before I knew the read/write characteristics of the DataDomain. (I didn't even know it was a DataDomain; nobody told me anything beyond handing me an NFS mount point until I started having problems, at which point I spent weeks debugging with a storage engineer from Montreal. This article distils a 6-8 month span of back and forth.)

Regardless: restore times are important. As the wise sages of our industry once said, you don't have a backup until it has been tested in a restore, which is what this system was effectively doing. How would you test without reading?
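
Sketched out, a restore test for the PostgreSQL of that era looks roughly like the following, and every step of it is reads (paths, port and table name are hypothetical; recovery.conf is the pre-PostgreSQL-12 recovery mechanism):

    # Sketch of a restore drill against a throwaway instance.
    import pathlib
    import subprocess

    SCRATCH = pathlib.Path("/srv/restore-test")  # hypothetical scratch dir

    def restore_drill(base_tarball):
        SCRATCH.mkdir(parents=True, exist_ok=True)
        SCRATCH.chmod(0o700)  # postgres refuses to start otherwise
        # 1. Unpack the latest base backup (a read of the backup store).
        subprocess.run(["tar", "-xzf", base_tarball, "-C", str(SCRATCH)],
                       check=True)
        # 2. Point recovery at the WAL archive (more reads).
        (SCRATCH / "recovery.conf").write_text(
            "restore_command = 'cp /mnt/offsite/wal/%f %p'\n")
        # 3. Start a throwaway instance; it replays WAL until consistent.
        subprocess.run(["pg_ctl", "-D", str(SCRATCH),
                        "-o", "-p 5433", "-w", "start"], check=True)
        # 4. Smoke-test that the data is actually there, then tear down.
        subprocess.run(["psql", "-p", "5433",
                        "-c", "SELECT count(*) FROM players"],  # made-up table
                       check=True)
        subprocess.run(["pg_ctl", "-D", str(SCRATCH), "stop"], check=True)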


> The “years ahead” was described in the article. We had what amounts to a very well-oiled Kubernetes installation with mTLS, but on Windows, in C++, and 10 years ago (before Kubernetes was a thing).

The revolutionary thing about Kubernetes was not what it did, but that it was an enterprise-supported open source version of what everyone was doing internally. And not tailored to a specific domain but fully generic.

You, I, and the other person replying to your comment were all running bespoke systems like this in 2014. That's just three people semi-randomly meeting in a thread. If you imagine how much engineering work Kubernetes has saved since then, it's mind-boggling.

Of course, not all of them were as fancy as you describe yours being; mine was just a collection of bash scripts. But solving that set of problems was definitely a thing for many projects in 2014.


> You, I, and the other person replying to your comment were all running bespoke systems like this in 2014. That's just three people semi-randomly meeting in a thread. If you imagine how much engineering work Kubernetes has saved since then, it's mind-boggling.

Not if you're the one running the k8s cluster. The benefits of k8s are wholly on the side of the devs deploying apps, not ops.

Hell, in early versions the k8s "automation" for building a cluster was so bad it produced clusters that would self-destruct after a year, because there was no proper certificate management built in. Even now, huge amounts of code are dedicated purely to the "care and feeding" of a k8s cluster. Sure, you can deploy one with a single command, but if something goes wrong or needs debugging you're jumping into a swamp.

We already had a CM-automated CA, so implementing k8s wasn't really a problem for us (just a bunch of learning), but it absolutely overcomplicates everything involved, because the vast majority of apps just aren't big enough to reap the benefits.

Especially when you need to herd a bunch of daemons just to have proper monitoring or logging inside it. Or how many apps now need a "sidecar" just to get some stats out, so you end up with 20 different apps whose only job is "fetch stats from an app that doesn't speak <current monitoring fad> and push them into <current monitoring fad>".


> The “years ahead” was described in the article. We had what amounts to a very well-oiled Kubernetes installation with mTLS, but on Windows, in C++, and 10 years ago (before Kubernetes was a thing).

I mean, we had those things back then too; that's why I was curious. But then I guess we're a bit of a minority: we've had near-everything under configuration management for over a decade now.

Doing it under Windows does seem like an achievement; it isn't exactly an OS that's nice for tinkering.

> Everything else you say is true. I could have chosen another solution for backups, but the tradeoff between backup speed (replaying WAL can be time-consuming) and database load (full backups delay replication to replicas) was made before I knew the read/write characteristics of the DataDomain. (I didn't even know it was a DataDomain; nobody told me anything beyond handing me an NFS mount point until I started having problems, at which point I spent weeks debugging with a storage engineer from Montreal. This article distils a 6-8 month span of back and forth.)

Seems you're not the only one:

https://www.dcac.com/syndication/if-you-thought-database-res...

https://community.spiceworks.com/topic/1868887-slow-recovery...

I honestly expected the usual "cherry-picking files is slow, but if you tell the backup software to do a full restore it goes somewhat quickly", but it appears I misjudged how shitty "enterprise" backup software can really be.

I wanted to say "a lot of the fault lies with Ubisoft for not documenting the workings and quirks of the system upfront", but if you were seeing restores that slow there's no excuse; no amount of warnings could cover for something that bad...

There are days when I wish the company I work for were 100x bigger, so I could justify going "fuck that shit, we'll just build our own backup system"... we once priced a migration from OSS software to Veeam, and it cost more than the running cost of the servers... plus a few racks of new servers.

Anyway, our solution when we hit that problem once (a backup too big to restore quickly except in "the DC is burning" scenarios) was a two-tiered backup: a "cheapest used box with many HDDs" acting as the first tier, with the second tier being the usual data store. Which seems to be what happened in your case in the end.

And yeah, we also had cases where big fuckups caused management to finally spend the money.


> I'd love to see examples of "ahead" because I've literally never seen it.

Game developers were deeply aware of data-oriented design and of optimizing code and data structures for efficient CPU cache usage well before I saw most other areas of the industry become aware of it. You can find counter-examples, of course, but overall this is an area of performance that many game developers understand in their bones, and that many programmers outside of games, even ones who care a lot about performance, are oblivious to.
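
A toy illustration of what that means, array-of-structs vs struct-of-arrays (numpy here stands in for the contiguous arrays a C++ engine would hand-roll; the Particle fields are made up for the example):

    import numpy as np

    class Particle:
        # Array-of-structs: one heap object per entity, fields
        # scattered across memory.
        def __init__(self, x, y, vx, vy):
            self.x, self.y = x, y
            self.vx, self.vy = vx, vy

    def step_aos(particles, dt):
        # Touching two fields of every object means pointer-chasing
        # through the heap: cache-hostile.
        for p in particles:
            p.x += p.vx * dt
            p.y += p.vy * dt

    def step_soa(x, y, vx, vy, dt):
        # Struct-of-arrays: each field is one contiguous block, so the
        # fields you actually use stream through the cache linearly.
        x += vx * dt
        y += vy * dt

    n = 100_000
    aos = [Particle(i, i, 1.0, 1.0) for i in range(n)]
    x, y = np.arange(n, dtype=float), np.arange(n, dtype=float)
    vx, vy = np.ones(n), np.ones(n)
    step_aos(aos, 0.016)
    step_soa(x, y, vx, vy, 0.016)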


Depends on which side you look from. From the embedded realtime development side, it might be game developers who are behind, and the rest of the industry REALLY behind.

A game is essentially a soft realtime system that needs to deliver a frame every 16.6 ms (1000 ms / 60 fps) or the gamer is unhappy. Very few pieces of software get optimized the way games do, because it's just not that often needed.



