I find the article quite informative. Yes, the M2 and the other chips are completely different products with different goals. Claiming that one completely trumps the other would be wrong.
But here is what is visible:
The M2 core is probably in the same ballpark as a Zen 4 core, likely a tiny bit below. The gap may become very small if the Zen 4 core runs at a lower frequency to equalize power. This doesn't account for the AVX-512 of Zen 4.
24 M2 cores manage to beat 16 Zen 4 cores, also at lower power, but these are different products. Zen 4 scales to far more cores, 96 in an EPYC chip. AMD and Intel have invested far more in interconnects and multi-die packaging to do these things.
The M2 GPU is in the same league as a $300 mid-range nVidia card. It is not competitive at all: Apple produces the largest chip it can manufacture to go up against a smaller, high-margin chip that nVidia orders.
Again, none of this means either product is not good on its own.
Apple's GPU performance is what makes me sceptical about their gaming related advertising. Sure, you can do 1080p gaming with the highest SKU, but you're paying through the nose if you bought an M2 to play games.
It seems strange to me for Apple to advertise something they haven't exactly mastered yet on stage.
Maybe they have some kind of optimization up their sleeves that will roll out later? I can imagine Apple coming out with their own answer to DLSS and FSR 2 based on their machine learning hardware, for example. On the other hand, I would've expected them to demonstrate that in the first place when they showed off their game porting toolkit.
With CrossOver and Apple's latest release of the Game Porting Toolkit I'm able to maintain over 120 FPS on ultra settings at native resolution in Diablo 4 with my M2 Max MBP. It was fair to be skeptical before that release this week, but there's plenty of evidence out there now that Apple silicon can handle gaming just fine. Other users are reporting 50-60 FPS with ultra settings on their 6K Studio Displays.
I thought the whole idea of M2 was “exceptional product given the power consumption”.
I don’t mind that it has nothing to show for all the talk once you throw out the need to basically sip power (like a notebook computer).
Is this something inherent with ARM though? Why can’t there be ARM based desktop and server computers that need a kilowatt of power at peak? Like how much more performance can you get for each additional watt of power? (I don’t know. I’m genuinely asking.)
I was once told that memory and bus bandwidth often create a disparity between benchmark and application performance on ARM CPUs. That was years ago and supposedly doesn't apply to custom designs like the M2, but maybe both Intel and AMD still have an advantage in that area?
> I thought the whole idea of M2 was “exceptional product given the power consumption”.
When running native code.
Look at the performance of Microsoft's ARM Surface Pro when running emulated code.
> My frustration with this computer wasn’t a workload thing. It didn’t start out fast and gradually slow down as I opened more things and started more processes. It was peppered with glitches and freezes from start to finish.
> I’d have only Slack open, and switching between channels would still take almost three seconds (yes, I timed it on my phone). Spotify, also with nothing in the background, would take 11 seconds to open, then be frozen for another four seconds before I could finally press play. When I typed in Chrome, I often saw significant lag, which led to all kinds of typos (because my words weren’t coming out until well after I’d written them). I’d try to watch YouTube videos, and the video would freeze while the audio continued. I’d use the Surface Pen to annotate a PDF, and my strokes would either be frustratingly late or not show up at all. I’d try to open Lightroom, and it would freeze multiple times and then crash.
> It quickly became clear that I should try to stick to apps that were running natively on Arm.
Not familiar with DLSS at all; does it require developers to do something in order to take advantage of it too? I had imagined it was automatic, but then again I know nothing about it beyond the marketing pitch to consumers.
I am not knowledgeable enough to know how much work it is, but I have played games that didn’t initially support it but eventually released an update that added support.
There are also multiple “levels” for DLSS in games that support it, e.g. Quality, Performance, etc.
> Apple's GPU performance is what makes me sceptical about their gaming related advertising.
The issue is that people compare games running under emulated x86 and emulated graphics APIs, when making claims about what the SOC is capable of.
There's nothing wrong with knowing how well the SOC performs when emulating games, but if you claim to be talking about what the SOC can do, then include the performance of native games as well.
Apple's x86 emulation is otherwise very impressive, and not many games are bottlenecked on the CPU, especially at high resolutions.
The bigger overhead for AAA games is likely due to translating DirectX or Vulkan to Metal, but that's just Apple's stubborn choice to have it that way.
In the end, none of that matters. I won't be playing Cyberpunk at 14fps, without RTX, and comforting myself that the SoC could do maybe 28fps without emulation. Lower-tier Nvidia cards perform better, even when paired with slower CPUs.
The major platforms do use the same graphics API, Vulkan. It should be preferred due to more low-level access and wider platform support (Linux, Android, Nintendo, macOS, Windows).
On another note, problems that keep major AAA games from running on Linux (anti-cheat solutions, for example) will block many games from running on macOS, too.
The CPU is rarely a bottleneck for AAA games, so unless the x86 emulation is particularly terrible (Rosetta isn't) it shouldn't be the issue.
WINE on Linux is able to match the performance of games on Windows, so the DirectX translation layer shouldn't be a problem either.
So it's not unreasonable to assume that the M2 just doesn't have a GPU capable of running these games. And it's really not that surprising that an integrated GPU doesn't match the performance of a dedicated GPU.
PC game players tend to believe you can't play a game unless you bought the latest custom hardware for all of it and put all the settings on maximum.
Game developers are much more willing to run their work on lower end machines if they'll get paid for it, or at least they're more capable of tuning for it.
> So it's not unreasonable to assume that the M2 just doesn't have a GPU capable of running these games
Without including comparison data on native games? It's entirely unreasonable.
For instance, the native version of the DirectX 12 game "The Medium" was shown running side by side with the emulated version at WWDC, and the native version had double the frame rate.
> the M2 just doesn't have a GPU capable of running these games.
As long as AAA games are published on the Xbox Series S and ship with adjustable graphics settings, they will have no problem running natively on an M2 chip.
>The M2 core is probably in the same ballpark as a Zen 4 core, likely a tiny bit below.
The 7950x is running at 5.7 GHz when only a single thread is saturated. The M2 Ultra caps its cores at 3.5 GHz. A 62% higher clock speed, at a monster power profile, to barely beat it isn't evidence of a core advantage.
>24 M2 cores manage to beat 16 Zen 4 cores, also at lower power
The M2 Ultra has 16 real performance cores, plus 8 additional efficiency cores that are very low performance. And of course the M2 Ultra can pretty handily trounce the 7950x, because the latter has to dramatically scale back the clock speed: the power profile of all 16 cores at 5.7 GHz would melt the chip. And of course the 7950x has hyper-threading, hardware that amounts to mini versions of 16 more cores, so in a way it has more cores than the Apple chip.
>This doesn't account for the AVX-512 of Zen 4.
AVX-512 is used by a tiny, minuscule fraction of the software computers have ever run. It is the most complete non-factor going.
I mean... in an ideal world Apple would get the GPU off the die. It constrains the CPU cores and the power profile, and takes up a huge amount of die space. They could then individually mega-size the GPU and the CPU. They could investigate mega interconnects like nvidia's latest instead of trying to jam everything together.
Was Apple correct to call it the most powerful chip? Certainly not. And there is a huge price penalty. But they're hugely, ridiculously powerful machines that will never leave the user wanting.
It is true that nobody competes in the low-power, high-efficiency workstation market; or maybe such a market does not exist yet and Apple is creating it.
But also, as users, some were expecting the M series to be so good that it would take many markets by storm. It seems that is not happening.
$300 midrange Nvidia card? Did you get stuck in 2010?
That's way below entry-level at this point. You're likely comparing it with a 1660 card or something, which is based on a chip from 2012.
I wish Apple silicon was actually competitive on performance. Nvidia needs competition, or they'll likely double prices again with the next generation.
> The M2 GPU is in the same league as a $300 mid-range nVidia card
It still has the advantage of a much larger memory pool.
I did a quick comparison exercise: I priced two workstations with similar configurations, one from Dell, the other from Apple. While there are x86 (and ARM) machines that'll blow the biggest M2 out of the water, within the range Apple covers, the prices aren't much different.
If you buy anything labeled as "workstation", you're paying twice the price already.
The article describes the M2 being blown out of the water by a 4080 and a 13900KS. That's about $2000 plus RAM, motherboard, and power supply. Plus you can use the built-in GPU in your CPU for accelerating things like transcodes.
You can get a pre-built gaming PC with a 4090 for about $4000, that'll crush the M2 in compute if you use any kind of GPU acceleration.
Of course the M2 has some other advantages (the unified memory and macOS) and some other disadvantages (you're stuck with the amount of RAM you pick at checkout, macOS, you have to sacrifice system RAM for GPU RAM) so it all depends on your use case.
I think the M2 still reigns supreme for mobile devices, though AMD is getting closer and closer with their mobile chips, but if you've got a machine hooked into the wall you'll have to pay some pretty excessive electricity rates for the M2 to become competitive.
> If you buy anything labeled as "workstation", you're paying twice the price already.
The price of workstation-class machines also includes the cost of higher build quality and stability, things like same-day support and service, at least the option of a long-term (5-6 year) warranty, and FRUs. You don't get that with consumer-grade computers, and those things matter when a machine is something you depend on professionally.
What the poster means is that a "workstation" is designed with quickly swappable components, often not even needing to use any tools. Businesses may benefit from this.
While it doesn't necessarily mean the swappable components are standardized or easy to procure, they usually are. That's a separate item that "workstation" machines typically offer: longer availability of replacement parts.
I agree with your take. My plugged into the wall machine is a 128GB 13900k 4090 system. My mobile machine is an Apple Silicon Macbook Pro. There are some tasks that are still better on the unified memory of the Macbook, but only a handful. There are many tasks that are more pleasant on the Macbook because of the absurd power efficiency (DAW, Final Cut Pro).
Both machines have a quality that I appreciate: they are never, ever slow.
You’re forgetting the benefit of everything just working and never having to think about effing with drivers ever. To me, it’s priceless. Anything truly performance-bound (CPU or GPU) is going to be done on HPC systems, not on a fake Windows “workstation”.
> If you buy anything labeled as "workstation", you're paying twice the price already.
We are not comparing MacPros to low-end desktops.
> You can get a pre-built gaming PC with a 4090 for about $4000, that'll crush the M2 in compute if you use any kind of GPU acceleration.
Yes, but the gaming PC will not be as well built as the workstation-grade machine. And pretty much any GPU you can install on a gaming PC you can install on a MacPro - it's just that it won't be there out of the (Apple-branded) box.
> you're stuck with the amount of RAM you pick at checkout
Sadly, this has been Apple for some time now - you buy the machine as it will be used for its whole intended lifetime. With the MacPro you can at least add internal storage and one or more GPU cards.
AFAIK the 2023 Mac Pro doesn't support PCIe GPUs for the same reason AS Macs don't support eGPUs. It has PCIe slots you can use for other things like capture cards or whatever but not GPUs.
RAM was something you could upgrade with the 2019 Mac Pro and something you could get a lot of. 1.5TB worth. The new Mac Pro caps out at 192GB which is barely better than consumer AMD/Intel systems at the moment.
I agree some MacPro users will be forced to move to workstation or server-grade PCs, but I am sure Apple knows that and they considered having integrated memory inconsequential for the majority of their users.
Also, remember, terabytes of RAM cost A LOT of money. The Dell I priced for comparison can go way higher than 192GB, but it’ll also cost you a lot more than $7K.
> It still has the advantage of a much larger memory pool.
I wonder if, given roughly equal power to the GPUs in current-gen consoles (PS5/XBSX), it'd yield some advantage in porting console games, since those consoles also have a large shared pool of memory (16GB) and neither AMD nor Nvidia wants to give up using VRAM as an upsell.
With the M2 Ultra prices, it'd be cheaper to buy a 4090 than to go the Apple route. With the M2 pro you'll probably still be better off with a 4080 unless you really need more than 16GB of VRAM.
I don't know the M2's efficiency for things like machine learning, but the M1's machine learning performance seemed to have been beaten 4-5x by the 3060Ti so I'm pretty sure "more VRAM" is all it's got going for it in ML tasks.
Well yeah, the market here would be people who already have a reasonably powerful Mac and would rather have that fill their gaming needs instead of having to build or buy a separate dedicated device for that.
But what I was really getting at is the trouble that game studios have been encountering lately when porting PS5 and Xbox titles to Windows, which is that these games are so reliant on those consoles' 16GB shared memory pool that they perform terribly on PCs. The impact is double: not only are most GPUs in use right now anemic when it comes to VRAM (even my last-gen high-end 3080 Ti comes up short at only 12GB), traditional PCs also have to copy data between RAM and VRAM. Significant re-architecting is required for the Windows port to work around this.
M-series Macs are much more similar to current gen consoles with their shared memory pool, which in theory could make porting from console to Mac (at least when targeting Macs with 16GB+ of RAM) more straightforward than porting to Windows. While some work would need to be done to support Metal, the two most popular engines already do much of that legwork and the work that remains can be shared across multiple titles.
I can’t imagine using my work computer for gaming, as maintaining the software install has so many different requirements, but, then, I’m no PC gamer and would rather have a console plugged into the big TV in the living room than on my desktop monitors. It’s also much less of a hassle maintaining a console than a gaming PC.
As a side-note, my living room TV is a rather small 43 inch one (limited in size by the surrounding overflowing book shelves) but, if I were a gamer, I’d probably have gone with a 60+ inch or wall projector.
If I lived alone, I’d get an Apple Vision Pro instead of the humongous TV, as it’d be cheaper.
Cheaper in terms of money, but in terms of time? I have a hard time justifying anything that requires configuration and dicking around. I’m a grown-up and don’t have “free time”. I need things that just work. For me, that’s not Intel and Windows or Intel and Linux. It’s macOS, which is the only true workstation platform left.
My previous rig is approaching 6 years, and the only dead component is a cheap external USB drive. The rig was mining 24/7 when it wasn't used for development or gaming. You must be doing something very wrong.
Yes, it's not price segmentation, it's planned obsolescence.
The 3080 series would likely be fine GPU-wise beyond the 50x0 series, but current games are already starting to stutter unless you downgrade textures, because of its limited VRAM.
The performance of the chip is matched to the memory size.
I think it’s a U shaped curve.
Beyond 80GB, today, a larger chip would maybe hit all of these problems: yield less, scale worse, take too much power, etc.
Like this matching of compute resources to RAM is partly the difference between CPUs and GPUs.
Anyway, it’s just to say that it isn’t a business decision. The extra RAM in the M2 doesn’t help the GPU much for the same tasks the H100 excels at, because it isn’t performant enough to use that RAM anywhere near the same way an H100 would, and if it were, there would have to be less RAM. The H100 doesn’t even have a graphics engine. It’s complicated.
> The performance of the chip is matched to the memory size.
That may be approximately true if you only look at a single generation of consumer graphics cards at a time. If you compare across generations or include non-gaming workloads the correlation falls apart.
What speed should we expect from the model on consumer hardware? I tried an 8-bit quantized version on a 4090 and got it to generate 100 tokens in 13 seconds, which seems a bit slow to me.
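For anyone wanting to reproduce the measurement, here is a minimal timing sketch, assuming a Hugging Face causal LM loaded in 8-bit via bitsandbytes; the model id and prompt are placeholders, not the actual model:

    import time
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "your/model-id-here"  # placeholder
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", load_in_8bit=True  # 8-bit path needs bitsandbytes
    )

    inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=100, do_sample=False)
    elapsed = time.perf_counter() - start

    new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
    print(f"{new_tokens} tokens in {elapsed:.1f}s = {new_tokens / elapsed:.1f} tok/s")

100 tokens in 13 seconds works out to roughly 7-8 tokens/s; whether that is slow depends a lot on the model size and on how well the 8-bit kernels use the 4090.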
One of the main reasons somebody may want to use such a library is to constrain the output of an LLM. The language is designed to make this easy and to abstract this part of the querying away. There are trivial cases where some value comes from a multiple choice, but one can also easily constrain a word to depend on a previously generated word.
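The core trick, as far as I understand it, is just masking the logits at each step so only tokens allowed by the template survive. A rough, library-free sketch of that idea (the names here are made up for illustration, plain PyTorch):

    import torch

    def constrained_step(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
        """Pick the most likely token among an allowed subset of the vocabulary."""
        mask = torch.full_like(logits, float("-inf"))
        mask[allowed_token_ids] = 0.0          # keep only the allowed ids
        return int(torch.argmax(logits + mask))

    # Multiple choice is the trivial case: the allowed set is fixed up front.
    # Dependent constraints just recompute allowed_token_ids from what has
    # already been generated before each step.

The DSL then lets you express those allowed sets declaratively instead of writing the masking by hand.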
The interesting part of that prediction is that, depending on how you read it, you may say it failed embarrassingly, or you may say it predicted the current reality fairly well.
The idea of the next software being a differentiable thing that is itself the program has certainly failed. But now there are such amazing opportunities to connect text-to-text models to one another and to search engines that this is likely to become a new kind of programming.
Contrary to what many say here, this is a real threat, and Google is right to consider moving fast in that area.
There are a few technical improvements needed that could come in the next months, where ChatGPT may be tuned to rewrite queries for an old-fashioned search engine to get better results. This may solve the problem of giving attribution for its answers while keeping the AI capabilities.
But the real product threat is that this may show up in Bing search pretty quickly.
That was RCT1, and they released a patch quite quickly... I mean, at max half a year after the game released, obviously.
Now that you mention that again, I wonder what the underlying issue was. Nothing in the game depends on the real-time clock, and more importantly, how does reloading the saved progress at an arbitrary point in the future break things? I do remember, though, that poking around the game files I couldn't figure out back then where the progress was stored at all. When I moved to a new machine, I reinstalled the game and then copied over everything from the old machine, basically replacing the entire game, and still all progress was lost. I could still load the individual save games, though, which I had always saved right after passing the scenario, so nothing happened when loading them. Luckily I was only 4 or 5 parks in, so from then on I made sure to save individual scenarios right before passing them, never after, so the win condition would trigger again after loading them.
I believe more than one game had this problem. If the data were stored in a plain file, any kid would copy and edit it. So it was put in shady places in the registry and other Windows configs.
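If you're curious where such data hides, poking at the registry from Python only takes a few lines; the key and value names below are purely hypothetical, every game stashed it somewhere different:

    import winreg

    # Hypothetical key path for illustration; real games hid progress in
    # deliberately obscure locations.
    key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Software\SomeStudio\SomeGame")
    try:
        value, value_type = winreg.QueryValueEx(key, "Progress")
        print(value, value_type)
    finally:
        winreg.CloseKey(key)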
That agrees with my intuition for the most part, although it might carve out some space for hope, in the sense that a paper about proof of A that doesn't mention !A means someone spent time in there and didn't find a bigger problem. The more times someone carves out a small amount of negative space in a problem domain, the higher the odds are that there's an asymptote that is nonzero.
What I'm more worried about is that we will prove that 'interesting text merges' are not all commutative. Though one trick we keep doing in spaces where things are proven impossible is that we cheat by creating tools that disallow or discourage the intractable scenarios. That could, for instance, lead to a generation of programming languages with a strange file structure that makes most workflows commutative.
That may sound strange or terrible to some people, but it's been my opinion for a number of years now that most of our coding conventions boil down to issues of avoiding merge conflicts, so a language that enforced a stronger form of conflict avoidance would be an 'opinionated' language but within the realm of the sorts of things we already put up with. Maybe no stranger than pure functional or Actor based languages, for instance.
We use Notion, and when I read this review, the irony is that its weakest point is the search, just like with most of these tools. In the case of Notion, it is not the results but the fact that search is so slow.
Otherwise, it is super powerful. In a few cases it is too powerful, and one doesn't use all of its powers.
Notion is still in this space where it does a lot of things decently and nothing well. It is a decent text editor. It builds decent data tables. It is a decent wiki. If it did at least one of those things well, it would have a lot more long-term staying power, and I think the strongest contender is text editing.
Notion will fall as a note-taking app, but probably survive as an enterprise wiki. The search, which is a big issue for now, will get improved; they just move slowly.
For what it's worth, we've cut p50, p90, p95 search latencies in half compared to December. There are more improvements in reliability and accuracy coming soon too.
Well, I disagree with pretty much everything in the claims.
First, most real unoptimised code faces many issues before memory bandwidth. During my PhD, the optimisation guys doing spiral.net sat next door, and they produced beautiful plots of what limits performance for a bunch of tasks and how each optimisation they did removed an upper-bound line, until at last they got to some bandwidth limitation. Real code will likely have false IPC dependencies, memory latency problems due to pointer chasing, or branch mispredictions well before memory bandwidth.
Then, the database workload is something I would consider insanely optimized. Most engines are in fierce performance competition, and normally they do hit memory bandwidth in the end. This probably explains why the author is not comparing against EPYC instances that have the memory bandwidth to compete with Graviton.
Finally, the claims that they chose not to implement SMT or chose to use DDR5: both of those decisions really come from their upstream providers.
Wouldn't SMT be a feature that you are free to use when designing your own cores? I'm assuming Amazon has an architectural license (Annapurna acquisition probably had them, this team is likely the Graviton design team at AWS). So who is the upstream provider? ARM?
And if they designed the CPU wouldn't they decide which memory controller is appropriate? Seems like AWS should get as much credit for their CPUs as Apple gets for theirs.
Bottom line for Graviton is that a lot of AWS customers rely on open-source software that already works well on ARM. And the AWS customers themselves often write their code in a language that will work just as well on ARM. So AWS can offer its customers tremendous value with minimal transition pain. But sure, if you have a CPU-bound workload, it'll do better on EPYC or Xeon than on Graviton.
This reminds me of the problems we did 15-20 years ago at IOI or ACM ICPC. We did these in pure C back then, sometimes C++.
I would have kept the state in a different way. Instead of making a vector/array of actors, I would make a pair of bitvectors the size of the grid: a bit is set if there is a blue (resp. red) actor at that position. No sorting is needed, and it seems that for most practical puzzles this gives a smaller state. All move operations are still easy to implement, roughly as in the sketch below.
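Something like this, in Python for brevity (in C it would just be a pair of uint64_t bitboards for an 8x8 grid); the grid size and the single move shown are illustrative assumptions, and collision rules would need an extra step, but they are also just bit operations:

    WIDTH, HEIGHT = 8, 8  # assumed grid size; one bit per cell

    def bit(x: int, y: int) -> int:
        return 1 << (y * WIDTH + x)

    def place(bits: int, x: int, y: int) -> int:
        return bits | bit(x, y)

    def occupied(blue: int, red: int, x: int, y: int) -> bool:
        return bool((blue | red) & bit(x, y))

    # Mask of the rightmost column, so a right move doesn't wrap into the next row.
    RIGHT_EDGE = sum(bit(WIDTH - 1, y) for y in range(HEIGHT))

    def move_right(bits: int) -> int:
        """Shift every actor one cell to the right; actors on the edge are dropped."""
        return (bits & ~RIGHT_EDGE) << 1

    # The whole state is the pair (blue_bits, red_bits): hashable, order-free,
    # and independent of the number of actors on the grid.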
That would definitely work, and I’d be interested in the performance impact. This was written so that the state size would scale with the number of actors rather than the size of the grid. There is a degenerate case where a massive mostly empty grid becomes difficult not only to store in memory, but also to transition on move. The transition function would take time proportional to the size of the grid rather than the number of actors.