Not a great fit. Something like Ampere Altra is better, as it gives you 80 cores and much more memory, which better fits a server. A server benefits more from lots of weaker cores than from a few strong ones. The M1 is an awesome desktop/laptop chip and possibly great for HPC, but not for servers.
What might be more interesting is to see powerful gaming rigs built around these chips. They could have built a kickass game console with them.
Why they didn't lean into that aspect of the Apple TV still mystifies me. A Wii-mote style pointing device seems such a natural fit for it, and has proven gaming utility. Maybe patents were a problem?
Why? There are plenty of server oriented ARM platforms available for use (See AWS Graviton). What benefit do you feel Apple’s platform gives over existing ones?
The Apple cores are full custom, Apple-only designs.
The AWS Graviton chips use Neoverse cores, which are pretty good, but clearly these Apple-only M1 cores are above and beyond.
---------
That being said: these M1 cores (and the Neoverse cores) are missing SMT / Hyperthreading, and a few other features I'd expect in a server product. Servers are fine with the bandwidth/latency tradeoff: more (better) bandwidth, but at worse (higher) latencies.
My understanding is that you don't really need hyperthreading on a RISC CPU, because decoding instructions is easier and doesn't have to be parallelised the way it is with hyperthreading.
The DEC Alpha had SMT on their processor roadmap, but it was never implemented as their own engineers told the Compaq overlords that they could never compete with Intel.
"The 21464's origins began in the mid-1990s when computer scientist Joel Emer was inspired by Dean Tullsen's research into simultaneous multithreading (SMT) at the University of Washington."
Okay, the whole RISC thing is stupid. But ignoring that aspect of the discussion... POWER9, one of those RISC CPUs, has 8-way SMT. Neoverse E1 also has SMT-2 (aka: 2-way hyperthreading).
SMT / Hyperthreading has nothing to do with RISC / CISC or whatever. It's just a feature some people like or don't like.
RISC CPUs (Neoverse E1 / POWER9) can perfectly do SMT if the designers wanted.
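For what it's worth, SMT is just a topology property you can query the same way regardless of ISA. A minimal Linux-only sketch in C, assuming the usual sysfs layout (the exact path can vary by kernel version):

    /* Check whether cpu0 exposes SMT siblings on Linux. The answer looks
     * the same whether the machine is x86, POWER, or ARM -- SMT is a
     * topology property, not a RISC/CISC property. */
    #include <stdio.h>

    int main(void) {
        /* Lists the hardware threads sharing cpu0's physical core,
         * e.g. "0,1" with SMT-2 enabled, or just "0" without SMT. */
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        char buf[64];
        if (fgets(buf, sizeof buf, f))
            printf("cpu0 SMT siblings: %s", buf);
        fclose(f);
        return 0;
    }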
Don’t think that is entirely true. Lots of features exist on both RISC and CISC CPUs but have a different natural fit. Using micro-ops, for example, is more important on a CISC than on a RISC CPU, even if both benefit. Likewise, pipelining is a more natural fit on RISC than CISC, while a micro-op cache matters more on CISC than RISC.
I don't even know what RISC or CISC means anymore. They're bad, non-descriptive terms. 30 years ago, RISC or CISC meant something, but not anymore.
Today's CPUs are pipelined, out-of-order, speculative, superscalar, (sometimes) SMT, SIMD, multi-core with MESI-based snooping for coherent caches. These words actually have meaning (and in particular, describe a particular attribute of performance for modern cores).
RISC or CISC? Useful for internet flamewars, I guess, but I've literally never been able to use either term in a technical discussion.
-------
I said what I said earlier: this M1 Pro / M1 Max, and the ARM Neoverse cores, are missing SMT, which seems to come standard on every other server-class CPU (POWER9, Intel Skylake-X, AMD EPYC).
Neoverse N1 makes up for it with absurdly high core counts, so maybe it's not a big deal. The Apple M1, however, has very small core counts; I doubt the Apple M1 would be good in a server setting... at least not in this configuration. They'd have to change things dramatically to compete at the higher end.
POWER9, RISC-V, and ARM all have microcoded instructions. In particular division, which is very complicated to implement.
As every CPU has decided that hardware-accelerated division is a good idea (and in particular, a microcoded single-instruction divide makes more sense than spending a bunch of L1 instruction cache on a sequence of instructions that everyone knows is "just division" and/or "modulo"), microcode just makes sense.
The "/" and "%" operators are just expected on any general purpose CPU these days.
30 years ago, RISC processors didn't implement divide or modulo. Today, all processors, even the "RISC" ones, implement it.
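As a rough sketch of what that looks like from the compiler's point of view: the same C source is expected to become a hardware divide on current ISAs (typically sdiv/udiv plus an msub for the remainder on AArch64), while an older divider-less 32-bit ARM core instead gets a call to a runtime helper like __aeabi_idivmod.

    /* "/" and "%" are just expected to work on any general-purpose CPU.
     * Whether that ends up as one hardware divide instruction or a
     * software routine is the compiler's and the ISA's problem. */
    #include <stdio.h>

    static void divmod(int a, int b, int *q, int *r) {
        *q = a / b;   /* hardware divide where available */
        *r = a % b;   /* remainder, usually derived from the same divide */
    }

    int main(void) {
        int q, r;
        divmod(1000, 7, &q, &r);
        printf("1000 / 7 = %d remainder %d\n", q, r);  /* 142 remainder 6 */
        return 0;
    }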
It's slightly more general than that, hiding inefficient use of functional units. A lot of times that's totally memory latency causing the inability to keep FUs fed like you say, but I've seen other reasons, like a wide but diverse set of FUs that have trouble applying to every workload.
The classic reason quoted for SMT is to allow the functional units to be fully utilised when there are instruction-to-instruction dependencies - that is, the input of one instruction is the output of the previous instruction. SMT lets you create one large pool of functional units and share it between multiple threads, hopefully increasing the chances that they will be fully used.
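A minimal sketch of that dependency-bound case (the loop and constants below are just an illustrative stand-in, not anyone's benchmark):

    /* Every step needs the result of the previous one, so a single thread
     * is limited by multiply latency and leaves most of the core's integer
     * units idle. A second hardware thread running on the same SMT core
     * can fill those idle issue slots. */
    #include <stdint.h>
    #include <stdio.h>

    uint64_t dependent_chain(uint64_t x, long iters) {
        for (long i = 0; i < iters; i++) {
            /* Serial dependency: this multiply-add can't start until the
             * previous iteration's result is available. */
            x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        }
        return x;
    }

    int main(void) {
        /* To see the effect, run two copies pinned to sibling hardware
         * threads vs. two separate cores and compare combined throughput. */
        printf("%llu\n", (unsigned long long)dependent_chain(1, 100000000));
        return 0;
    }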
Well, tons. There isn't another ARM core that can match a single M1 Firestorm, core to core. Heck, only the highest-performance x86 cores can match a Firestorm core. And that's just raw performance, not even considering power efficiency. But of course, Apple's not sharing.
They were, but they stopped talking about it years ago. The project is probably canceled; I've heard Jim Keller talk about how that work was happening simultaneously with Zen 1.