
I think the higher memory ceiling is also a huge win, with support for up to 64 GB.


400GB/s available to the CPU cores in unified memory is going to really help workloads that are memory-bound on modern architectures. Both Intel and AMD are attacking this with ever-increasing L3 cache sizes, but attaching memory directly to an SoC has vastly higher memory bandwidth potential, and probably better latency too, especially on working sets that don't fit in ~32MB of L3 cache.


The M1 still uses DDR memory at the end of the day, it's just physically closer to the core. This is in contrast to L3 which is actual SRAM on the core.

The DDR being closer to the core may or may not allow the memory to run at higher speeds due to better signal integrity, but you can purchase DDR4-5333 today whereas the M1 uses 4266.

The real advantage is that the M1 Max uses 8 channels, which is impressive considering that's as many as an AMD EPYC, while running them at roughly twice the speed.
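
The channel math is easy to sanity-check with a back-of-envelope sketch. The DDR4-3200 figure for EPYC is the standard spec; the 6400 MT/s LPDDR5 rate for the M1 Max is an assumption, chosen because it happens to line up with Apple's 400GB/s claim:

    # Peak theoretical bandwidth = channels x (bus width / 8) x transfer rate
    def peak_gb_per_s(channels, bits_per_channel, mega_transfers):
        return channels * (bits_per_channel / 8) * mega_transfers / 1000

    # 8-channel EPYC with 64-bit DDR4-3200 channels:
    print(peak_gb_per_s(8, 64, 3200))  # ~204.8 GB/s
    # A 512-bit LPDDR5 interface at an assumed 6400 MT/s:
    print(peak_gb_per_s(8, 64, 6400))  # ~409.6 GB/s, i.e. "400GB/s"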


Just to underscore this, memory physically closer to the cores has improved tRAS times measured in nanoseconds. This has the secondary effect of boosting the performance of the last-level cache since it can fill lines on a cache miss much faster.

The step up from DDR4 to DDR5 will help fill cache misses that are predictable, but everybody uses a prefetcher already, the net effect of DDR5 is mostly just better efficiency.

The change Apple is making, moving the memory closer to the cores, improves unpredicted cache misses. That's significant.


> Just to underscore this, memory physically closer to the cores has improved tRAS times measured in nanoseconds.

I doubt that tRAS timing is affected by how close or far a DRAM chip is from the core. It's just a RAS command, after all: transfer data from DRAM into the sense amplifiers.

If tRAS has improved, I'd be curious how it was done. It's one of those values that has been basically constant (on a nanosecond basis) for 20 years.

Most DDR3 / DDR4 improvements have been about breaking the chip into more and more bank groups, so that Group #1 can be issued a RAS command, then Group #2 can be issued a separate RAS command. This doesn't lower latency; it just lets the memory subsystem parallelize requests (increasing bandwidth without improving the latency of any individual command).
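
A toy model of that tradeoff, with assumed (not vendor-quoted) numbers, shows why interleaving raises bandwidth but leaves latency untouched:

    # Assumed: ~45 ns row cycle time (tRC), 64-byte bursts
    tRC_ns, burst_bytes = 45.0, 64

    # One bank: a new activate can only issue every tRC
    print(burst_bytes / tRC_ns)      # ~1.4 GB/s (bytes per ns == GB/s)
    # Two bank groups: activates overlap, doubling sustained throughput...
    print(2 * burst_bytes / tRC_ns)  # ~2.8 GB/s
    # ...but any single request still waits the same ~45 ns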


The physically shorter wiring is doing basically nothing. That's not where any of the latency bottlenecks are for RAM. If it were physically on-die, like HBM, that'd maybe be different. But we're still talking regular LPDDR5 using off-the-shelf DRAM modules. The shorter wiring could potentially improve signal quality, but ground shields do that too. And Apple isn't exceeding any specs here (i.e., it's not overclocked), so above-average signal integrity isn't translating into any performance gains anyway.


improved tRAS times

Has this been documented anywhere? What timings are Apple using?


Apple also uses massive cache sizes, compared to the industry.

They put a 32 megabyte system level cache in their latest phone chip.

>at 32MB, the new A15 dwarfs the competition’s implementations, such as the 3MB SLC on the Snapdragon 888 or the estimated 6-8MB SLC on the Exynos 2100

https://www.anandtech.com/show/16983/the-apple-a15-soc-perfo...

It will be interesting to see how big they go on these chips.


> Apple also uses massive cache sizes, compared to the industry.

AMD's upcoming Ryzen parts are supposed to have 192MB of L3 "V-Cache" SRAM stacked above the chiplets. Current chiplets are 8-core. I'm not sure if that figure is for a single chiplet, but it's supposedly good for 2TB/s[1].

Slightly bigger chip than an iPhone chip, yes. :) But wow, that's a lot of cache. Having it stacked above rather than built into the core is another game-changing move, since a) your core has more space, and b) you can 3D-stack many layers of cache on top.

This has already been used on their GPUs, where the 6800 & 6900 have 128MB of L3 "Infinity Cache" providing 1.66TBps. It's also largely how these cards get by with "only" 512GBps worth of GDDR6 feeding them (256-bit/quad-channel... at 16GT/s). AMD's R9 Fury from mid-2015 had 512GBps of first-generation HBM, for comparison, via a slow-clocked but 4096-bit-wide interface.

Anyhow, I'm also in awe of the speed wins Apple got here from bringing RAM in close. Cache is a huge huge help. Plus 400GBps main memory is truly awesome, and it's neat that either the CPU or GPU can make use of it.

[1] https://www.anandtech.com/show/16725/amd-demonstrates-stacke...


> The M1 still uses DDR memory at the end of the day, it's just physically closer to the core. This is in contrast to L3 which is actual SRAM on the core.

But they're probably using 8 channels of LPDDR5, if this 400GB/s number is to be believed, which is far more memory channels / bandwidth than any normal chip released so far, EPYC and Skylake-server included.


It's more comparable to the sort of memory bus you'd typically see on a GPU... which is exactly what you'd hope for on a system with high-end integrated graphics. :)


You'd expect HBM or GDDR6 to be used. But this is seemingly LPDDR5 that's being used.

So it's still quite unusual. It's like Apple decided to take commodity phone RAM and just run many parallel channels of it... rather than using high-speed RAM to begin with.

HBM is specifically designed to be soldered near a CPU/GPU as well. For them to be soldering commodity LPDDR5 instead is kinda weird to me.

---------

We know it isn't HBM because HBM is 1024 bits at lower clock speeds. Apple is saying they have 512 bits across 8 channels (64 bits per channel), which is LPDDR5 / DDR territory.

200GBps is within the realm of 1x HBM stack (1024-bit at low clock speeds), and 400GBps is 2x HBM stacks (a 2048-bit bus at low clock speeds).
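
The wide-and-slow versus narrow-and-fast equivalence is easy to see numerically. A quick sketch, where the ~1.6 GT/s HBM2 rate and the 6.4 GT/s LPDDR5 rate are both assumed rather than confirmed:

    def gb_per_s(bus_bits, giga_transfers):
        return bus_bits / 8 * giga_transfers

    print(gb_per_s(1024, 1.6))  # ~205 GB/s: one HBM2 stack, wide but slow
    print(gb_per_s(2048, 1.6))  # ~410 GB/s: two stacks
    print(gb_per_s(512, 6.4))   # ~410 GB/s: 512-bit LPDDR5, narrow but fast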


HBM isn't just "soldered near", it's connected through a silicon interposer rather than a PCB.

Also we know it's not HBM because the word "LPDDR5" was literally on the slides :)

> just make many parallel channels of it

isn't that just how LPDDR is in general? It has much narrower channels than DDR so you need much more of them?


> isn't that just how LPDDR is in general? It has much narrower channels than DDR so you need much more of them?

Well, yeah. But 400GBps is equivalent to 16 DDR4 channels. It's an absurdly huge amount of bandwidth.


> The DDR being closer to the core may or may not allow the memory to run at higher speeds due to better signal integrity, but you can purchase DDR4-5333 today whereas the M1 uses 4266.

My understanding is that bringing the RAM closer increases the bandwidth (better latency and wider buses), not necessarily the speed of the RAM dies. Also, if I'm not mistaken, the RAM in the new M1s is LPDDR5 (I read that, but it didn't stay long on screen, so I could be mistaken). Not sure how that compares with DDR4 DIMMs.


The overall bandwidth isn't affected much by the distance alone. Latency, yes, in the sense that the signal literally has to travel further, but that difference is minuscule (like 1/10th of a nanosecond) compared to overall DDR access latencies.

Better signal integrity could allow for wider buses, but I don't think this is actually a single 512-bit bus. I think it's multiple channels of smaller buses (32 or 64 bits). There's a big difference from an electrical design perspective (byte-lane skew requirements are harder to meet when you have 64 of them). That said, I think multiple channels are better anyway.

The original M1 used LPDDR4X, but I think the new ones use some form of LPDDR5.


Your comment got me thinking, and I checked the math. It turns out that light takes ~0.2 ns to travel 2 inches. But the speed of signal propagation in copper is ~0.6 c, so that takes it up to 0.3 ns. So, still pretty small compared to the overall latencies (~13-18 ns for DDR5) but it's not negligible.

I do wonder if there are nonlinearities that come into play when it comes to these bottlenecks. Yes, by moving the RAM closer you're only reducing the latency by 0.2 ns. But it's also taking a third of the time it used to, and maybe they can use that extra time to do 2 or 3 transactions instead. Latency and bandwidth are inversely related, after all!
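
The flight-time numbers are simple to reproduce. A sketch assuming ~0.6c propagation in a PCB trace and illustrative (not measured) trace lengths:

    c = 299_792_458            # speed of light, m/s
    v = 0.6 * c                # assumed signal speed in a PCB trace

    for inches in (2.0, 0.5):  # socketed DIMM vs on-package RAM (assumed)
        ns = inches * 0.0254 / v * 1e9
        print(f"{inches} inch trace: {ns:.2f} ns one-way")

That gives roughly 0.28 ns versus 0.07 ns, consistent with the ~0.2 ns saving discussed above.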


Well, you can have high bandwidth and poor latency at the same time -- think ultra-wideband radio bursts from Earth to Mars -- but yeah, on a CPU with all the crazy co-optimized cache hierarchies and latency hiding it's difficult to see how changing one part of the system changes the whole. For instance, if you switched 16GB of DRAM for 4GB of SRAM, you could probably cut down the cache-miss latency a lot -- but do you care? If your cache hit rate is high enough, probably not. Then again, maybe chopping the worst case lets you move allocation away from L3 and L2 and into L1, which gets you a win again.

I suspect the only people who really know are the CPU manufacturers' teams that run PIN/DynamoRIO traces against models -- and I also suspect that they are NDA'd through this life and the next, and the only way we will ever learn about the tradeoffs is when we see them pop up in actual designs years down the road.
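
The "do you care?" question has a standard formalization: average memory access time, AMAT = hit time + miss rate x miss penalty. A sketch with purely illustrative latencies (4 ns last-level hit, 100 ns DRAM miss, 25 ns for a hypothetical SRAM main memory):

    def amat(hit_ns, miss_rate, penalty_ns):
        return hit_ns + miss_rate * penalty_ns

    for miss_rate in (0.01, 0.05, 0.20):
        print(miss_rate,
              round(amat(4, miss_rate, 100), 2),  # DRAM-backed
              round(amat(4, miss_rate, 25), 2))   # SRAM-backed (assumed)

At a 1% miss rate the two differ by well under a nanosecond on average; at a 20% miss rate the SRAM-backed system is nearly 3x faster, which is exactly the hit-rate-dependent tradeoff described above.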


DRAM latencies are pretty heinous. It makes me wonder if the memory industry will go through a similar transition to the storage industry's HDD->SSD sometime in the not too distant future.

I wonder about the practicalities of going to SRAM for main memory. I doubt silicon real estate would be the limiting factor (1T1C to 6T, isn't it?) and Apple charges a king's ransom for RAM anyway. Power might be a problem though. Does anyone have figures for SRAM power consumption on modern processes?


>> I wonder about the practicalities of going to SRAM for main memory. I doubt silicon real estate would be the limiting factor (1T1C to 6T, isn't it?) and Apple charges a king's ransom for RAM anyway. Power might be a problem though. Does anyone have figures for SRAM power consumption on modern processes?

I've been wondering about this for years. Assuming the difference is similar to the old days, I'd take 2-4GB of SRAM over 32GB of DRAM any day. Last time this came up people claimed SRAM power consumption would be prohibitive, but I have a hard time seeing that given these 50B transistor chips running at several GHz. Most of the transistors in an SRAM are not switching, so they should be optimized for leakage and they'd still be way faster than DRAM.
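
The 2-4GB figure is consistent with typical cell-area ratios. A rough sketch, where the 8-16x density advantage of 1T1C DRAM over 6T SRAM is an assumption, not a process-verified number:

    dram_gb = 32
    for ratio in (8, 16):  # assumed DRAM:SRAM cell-density ratio
        print(f"{dram_gb} GB of DRAM area ~= "
              f"{dram_gb / ratio:.0f} GB of SRAM at a {ratio}x ratio")

That lands at roughly 2-4 GB of SRAM in the same silicon budget as 32 GB of DRAM.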


> The overall bandwidth isn't affected much by the distance alone.

Testing showed that the M1's performance cores had a surprising amount of memory bandwidth.

>One aspect we’ve never really had the opportunity to test is exactly how good Apple’s cores are in terms of memory bandwidth. Inside of the M1, the results are ground-breaking: A single Firestorm achieves memory reads up to around 58GB/s, with memory writes coming in at 33-36GB/s. Most importantly, memory copies land in at 60 to 62GB/s depending if you’re using scalar or vector instructions. The fact that a single Firestorm core can almost saturate the memory controllers is astounding and something we’ve never seen in a design before.

https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...
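
For anyone who wants to poke at this at home, here's a crude single-core copy-bandwidth probe (nothing like AnandTech's methodology, so treat the absolute numbers with suspicion):

    import time
    import numpy as np

    src = np.random.rand(1 << 26)   # 512 MiB of float64
    dst = np.empty_like(src)

    best = float("inf")
    for _ in range(5):
        t0 = time.perf_counter()
        np.copyto(dst, src)         # streams src in and dst out
        best = min(best, time.perf_counter() - t0)

    gb_per_s = 2 * src.nbytes / best / 1e9  # read + write traffic
    print(f"~{gb_per_s:.1f} GB/s single-core memcpy")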


That just says the bandwidth between a performance core and the memory controller is great. It's not related to the distance between the memory controller and the DRAM.


L3 is almost never SRAM, it's usually eDRAM and clocked significantly lower than L1 or L2.

(SRAM is prohibitively expensive to do at scale due to die area required).

Edit: Nope, I'm wrong. It's pretty much only Power that has this.


As far as I'm aware, IBM is one of the few chip designers who have eDRAM capabilities.

IBM has eDRAM on a number of chips in varying capacities, but... it's difficult for me to think of Intel, AMD, Apple, ARM, or other chips that have eDRAM of any kind.

Intel had one: the eDRAM "Crystalwell" chip, but that was seemingly a one-off and never attempted again. Even then, it was a 2nd die that was "glued" onto the main chip, not truly embedded DRAM like IBM's (built into the same process).


You're right. My bad. It's much less common than I'd thought. (Intel had it on a number of chips that included Iris Pro Graphics across Haswell, Broadwell, Skylake, etc.)


But only the Iris Pro 5200 (codename: Crystalwell) had eDRAM. All other Iris Pro were just normal DDR4.

EDIT: Oh, apparently there were smaller 64MB eDRAM on later chips, as you mentioned. Well, today I learned something.


Ha, I still use an Intel 5775C in my home server!


I think the chip you are talking about is Broadwell.


Broadwell was the CPU-core.

Crystalwell was the codename for the eDRAM that was grafted onto Broadwell. (EDIT: Apparently Haswell, but... yeah. Crystalwell + Haswell for eDRAM goodness)


L3 is SRAM on all AMD Ryzen chips that I'm aware of.

I think it's the same with Intel too except for that one 5th gen chip.


Good point. Especially since a lot of software these days is not all that cache friendly. Realistically this means we have 2 years or so till further abstractions eat up the performance gains.


> 400GB/s available to the CPU cores in a unified memory

It's not just throughput that counts, but latency. Any numbers to compare there?


We'll have to wait for the AnandTech review but memory latency should be similar to Intel and AMD.


I'm thinking that with that much bandwidth, maybe they will roll out SVE2 with vlen=512/1024 for future M series chips.

AVX-512 suffers from bandwidth limits on desktop. But now the bandwidth is huge, and SVE2 is naturally scalable. Sounds like a free lunch?


I thought the memory was one of the more interesting bits here.

My 2-year-old Intel MBP has 64 GB, and 8 GB of additional memory on the GPU. True, on the M1 Max you don't have to copy back and forth between CPU and GPU thanks to integrated memory, but the new MBP still has less total memory than my 2-year-old Intel MBP.

And it seems they only just managed to get to 64 GiB: the whole processor die is surrounded by memory chips. That's partly why I'm curious to see how they'll scale this. One idea would be to put several M1 Max SoCs on a board, but that's going to be interesting to program. And getting to 1 TB of memory seems infeasible too.


Just some genuine, honest curiosity here: how many workloads actually require 64GB of RAM? For instance, I'm an amateur in the music production scene, and I know that sampling-heavy workflows benefit from being able to load more audio clips fully into RAM rather than streaming them from disk. But 64GB seems a tad overkill even for that.

I guess for me I would prefer an emphasis on speed/bandwidth rather than size, but I'm also aware there are workloads that I'm completely ignorant of.
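
For scale, raw audio itself is tiny next to 64GB. A quick sketch assuming uncompressed 24-bit (3-byte) stereo at 48 kHz:

    bytes_per_sec = 48_000 * 3 * 2   # one stereo stream, ~288 KB/s
    hours = 64e9 / bytes_per_sec / 3600
    print(f"64 GB holds ~{hours:.0f} hours of raw stereo audio")  # ~62 hours

What eats RAM in sampling workflows isn't audio length but multisampled instruments: many velocity layers and round-robins per note, all kept resident up front.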


Can’t answer for music, but as a developer a sure way to waste a lot of RAM is to run a bunch of virtual machines, containers or device simulators.

I have 32GB, so unless I'm careless everything usually fits in memory without swapping. If you go over, things get slow and you notice.


Same; I tend to fit everything in 32GB, but more and more often I'm going over that and having things slow down. I've also nuked an SSD in a 16GB MBP due to incredibly high swap activity. It would make no sense for me to buy another 32GB machine if I want it to last five years.


Don’t run Chrome and Slack at the same time :)


So run Slack inside Chrome? :)


How do you track the swap activity? What would you call “high” swap activity?


Open Activity Monitor, select Memory and there's "Swap used" down the bottom


My laptop has 128GB for running several VMs that build C++ code, and Slack.


Another anecdote from someone who is also in the music production scene - 32GB tended to be the "sweet spot" in my case for the longest time, but I'm finding myself hitting the limits more and more as I keep adding orchestral tracks, with projects spanning well over 100 tracks total.

I'm finding I need to commit and print a lot of these. Logic's little meter in the upper right showing RAM, disk I/O, CPU, etc. also shows that it's getting close to memory limits on certain instruments with many layers.

So as someone who would be willing to dump $4k into a laptop whose main workload is audio production alone, I would feel much safer going with 64GB, knowing there's no real upgrade path from the 32GB model short of buying a totally new machine.

Edit: And yes, this does show the typical "fear of committing" issue that plagues all of us making music. It's more of a "nice to have" than a necessity, but I would still consider it a wise investment, at least in my eyes. Everyone's workflow varies and others have different opinions on the matter.


I know the main reason why the Mac Pro has options for LRDIMMs for terabytes of RAM is specifically for audio production, where people are basically using their system memory as cache for their entire instrument library.

I have to wonder how Apple plans to replace the Mac Pro - the whole benefit of M1 is that gluing the memory to the chip (in a user-hostile way) provides significant performance benefits; but I don't see Apple actually engineering a 1TB+ RAM SKU or an Apple Silicon machine with socketed DRAM channels anytime soon.


I wonder about that too.

My bet is that they will get rid of the Mac Pro entirely. Too low ROI for them at this point.

My hope is to see an ARM workstation where all components are standard and serviceable.

I cannot believe we are in the era of glued batteries and soldered SSDs that are guaranteed to fail and take the whole machine with them.


I think we'd probably see Apple use the fast-and-slow-RAM approach that old computers used back in the '90s.

16-32GB of RAM on the SoC, with DRAM sockets for capacity beyond the built-in amount.

Though by the time we see an ARM Mac Pro they might move to stacked DRAM on the SoC. But I'd really think a two-tier memory system would be Apple's method of choice.

I'd also expect a dual-SoC setup.

So I don't expect to see that anytime soon.

I'd love to get my hands on a Mac Mini with the M1 Max.


I went for 64GB. I have one game where 32GB is on the ragged edge - so for the difference it just wasn't worth haggling over. Plus it doubled the memory bandwidth - nice bonus.

And unused RAM isn't wasted - the system will use it for caching. Frankly I see memory as one of the cheapest performance variables you can tweak in any system.


> how many workloads actually require 64gb of ram?

Don't worry, Chrome will eat that up in no time!

More seriously, I look forward to more RAM for some of the datasets I work with. At least so I don't have to close everything else while running those workloads.


I run 512GB in my home server, 256GB in my desktop, and 128GB in the small-form-factor desktop that I take with me to my summer cottage.

Some of my projects work with big in-memory databases. Add regular tasks and video processing on top, and there you go.


As a data scientist, I sometimes find myself going over 64 GB. Of course it all depends on how large data I'm working on. 128 GB RAM helps even with data of "just" 10-15 GB, since I can write quick exploratory transformation pipelines without having to think about keeping the number of copies down.

I could of course chop up the workload earlier, or use samples more often. Still, while not strictly necessary, I regularly find I get stuff done quicker and with less effort thanks to it.


Not many, but there are a few that need even more. My team is running SQL servers on their laptops (development and support) and when that is not enough, we go to Threadrippers with 128-256GB of RAM. Other people run Virtual Machines on their computers (I work most of the time in a VM) and you can run several VMs at the same time, eating up RAM really fast.


On a desktop Hackintosh, I started with 32GB, which would die with out-of-memory errors when I was processing 16-bit RAW images at full resolution. Because it was a Hackintosh, I was able to upgrade to 64GB so the processing could complete. That was the only thing running.


What image dimensions? What app? I find this extremely suspect, but it's plausible if you've way undersold what you're doing. A 24-megapixel, 16-bit RAW image would generally be no problem on a 4GB machine if it's truly the only app running and the app isn't shit. ;)


I shoot timelapses using Canon 5D RAW images; I don't know the exact dimensions off the top of my head, but greater than 5000px wide. I then grade them in various programs, ultimately using After Effects to render out full-frame ProRes 4444. After Effects was running out of memory: it would crash and fail to render my file, displaying an error message that told me specifically it was out of memory. I increased the memory available to the system. The error went away.

But I love the fact that you have this cute little theory to doubt my actual experience and imply that I would make this up.


> But I love the fact that you have this cute little theory to doubt my actual experience to infer that I would make this up.

The facts were suspect, and your follow-up is further proof I had good reason to be suspicious. First off, the RAW images from a 5D aren't 16-bit. ;) More importantly, the out-of-memory error had nothing to do with the "16-bit RAW files"; it was rendering video from lots of high-res images that was the issue, which is a very different thing, and of course lots of RAM is needed there. Anyway, notice I said "but it's plausible if you've way undersold what you're doing", which is definitely the case here, so I'm not sure why it bothered you.


Yes, Canon RAW images are 14-bit. Once opened in After Effects, you are working in a 16-bit space. Are you just being argumentative for fun?


>> die with out of memory errors when I was processing 16bit RAW images

> Canon RAW images are 14bit

You don’t see the issue?

> Are you just trying to be argumentative for the fun?

In the beginning, I very politely asked a clarifying question, making sure not to call you a liar, as I was sure there was more to the story. You're the one who's been defensive and combative since, and honestly misrepresenting facts the entire time. Were you wrong at any point? Only slightly, but you left out so many details that were actually important to the story for anyone to get any value out of your anecdata. Thanks to my persistence, anyone who wants to learn from your experience now can.


Not the person you're replying to.

>> I was processing 16bit RAW images at full resolution.

>> ...using After Effects to render out full frame ProRes 4444.

Those are two different applications to most of us. No one is accusing you of making things up, just that the first post wasn't fully descriptive of your use case.


Working with video will use up an extraordinary amount of memory.

Some of the genetics stuff I work on requires absolute gobs of RAM. I have a single process that requires around 400GB of RAM that I need to run quite regularly.


I can exhaust my 64GB just opening browser tabs for documentation.


In case your statement is only a slight sarcasm:

Isn’t that just the OS saying “unused memory is wasted memory”? Most of it is likely cache that can easily be evicted with higher memory pressure.


It's a slight exaggeration; I also have an editor open and some dev process (test runner, usually). It's not just caching: I routinely hit >30 GB of swap with the fans revved to the max, and fairly often this becomes unstable enough to require a reboot even after manually closing as much as I can.

I mean, some of this comes down to poor executive function on my part, failing to manage resources I’m no longer using. But that’s also a valid use case for me and I’m much more effective at whatever I’m doing if I can defer it with a larger memory capacity.


Which OS do you use? It’s definitely not a problem on your part, the OS should be managing it completely transparently.


It is the OS saying "unused memory is wasted memory", and then every other application thinking they're the OS and doing the same.


Since applications have virtual memory, it sort of doesn't matter? The OS will map virtual pages to physical ones based on demand, how many processes are running, and so on. So if only one app runs and it wants lots of memory, it makes sense to give it lots of memory - that is the most "economical" decision from both an energy and a performance POV.


> So if only one app runs

You answered yourself.


So, the M1 has been out for a while now, after plenty of HN doom and gloom about not being able to put enough memory into them. Real-world usage has demonstrated far less memory pressure than people expected (I don't know why; maybe someone paid attention and can say). The result is that 32GB is a LOT of memory for an M1-based laptop, and 64GB is only needed for very specific workloads, I would expect.


Measuring memory usage is a complicated topic, and just adding numbers up overestimates it pretty badly. The different priority levels of memory are something like: 1. wired (must be in RAM), 2. dirty (can be swapped), 3. purgeable (can be deleted and recomputed), 4. file-backed dirty (can be written to disk), 5. file-backed clean (can simply be dropped and read back in).

Also note that the M1's unified memory model is actually worse for memory use, not better. Details left as an exercise for the reader.
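
On macOS you can watch some of these buckets directly with vm_stat. A small parsing sketch in Python (the mapping of vm_stat's labels onto the categories above is approximate, and the exact label set varies by macOS version):

    import re
    import subprocess

    out = subprocess.run(["vm_stat"], capture_output=True, text=True).stdout
    page_size = int(re.search(r"page size of (\d+)", out).group(1))

    for line in out.splitlines()[1:]:
        key, _, value = line.partition(":")
        if key in ("Pages wired down", "Pages active",
                   "Pages purgeable", "Pages free"):
            gib = int(value.strip(" .")) * page_size / 2**30
            print(f"{key}: {gib:.2f} GiB")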


Unified memory is a performance/utilisation tradeoff, and I think it's more of an issue at lower memory specs: not having 4GB (or even 2GB) of dedicated memory on a graphics card in a machine with 8GB of main memory is a much bigger deal than not having 8GB on the graphics card in a machine with 64GB of main RAM.


Or take games, even semi-casual ones: Civ6 would not load at all on my Mac mini. I also had to fairly frequently close browser windows as I ran out of memory.


I couldn't load Civ6 until I verified game files in Steam, and now it works pretty perfectly. I'm on 8GB and always have Chrome, Apple Music and OmniFocus running alongside.


Huh, thank you I will try this.


I'm interested to see how the GPU on these performs. I pretty much disable the dGPU on my i9 MBP because it bogs my machine down, so for me it's essentially the same amount of memory.


> but the new MBP still has less total memory

From the perspective of your GPU, that 64GB of main memory attached to your CPU is almost as slow to fetch from as if it were memory on a separate NUMA node, or even pages swapped to an NVMe disk. It may as well not be considered "memory" at all. It's effectively a secondary storage tier.

Which means that you can't really do "GPU things" (e.g. working with hugely detailed models where it's the model itself, not the textures, that takes up the space) as if you had 64GB of memory. You can maybe break apart the problem, but maybe not; it all depends on the workload. (For example, you can't really run a TensorFlow model on a GPU with less memory than the model size. Making it work would be like trying to distribute a graph-database routing query across nodes: constant back-and-forth that multiplies the runtime many times over. Even though each step is parallelizable, on the whole it's the opposite of an embarrassingly parallel problem.)
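
A back-of-envelope version of that constraint, with assumed numbers (float32 weights plus a ~2x working-memory overhead, which varies wildly by workload; the 3B-parameter model is hypothetical):

    def needed_gb(params_billion, bytes_per_param=4, overhead=2.0):
        return params_billion * 1e9 * bytes_per_param * overhead / 2**30

    need = needed_gb(3)   # a hypothetical 3B-parameter model: ~22 GB
    for gpu_gb in (8, 64):
        print(f"{need:.0f} GB needed -> "
              f"{'fits' if need <= gpu_gb else 'does not fit'} in {gpu_gb} GB")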


That's not how M1's unified memory works.

>The SoC has access to 16GB of unified memory. This uses 4266 MT/s LPDDR4X SDRAM (synchronous DRAM) and is mounted with the SoC using a system-in-package (SiP) design. A SoC is built from a single semiconductor die whereas a SiP connects two or more semiconductor dies. SDRAM operations are synchronised to the SoC processing clock speed. Apple describes the SDRAM as a single pool of high-bandwidth, low-latency memory, allowing apps to share data between the CPU, GPU, and Neural Engine efficiently. In other words, this memory is shared between the three different compute engines and their cores. The three don't have their own individual memory resources, which would need data moved into them. This would happen when, for example, an app executing in the CPU needs graphics processing – meaning the GPU swings into action, using data in its memory.

https://www.theregister.com/2020/11/19/apple_m1_high_bandwid...

These Macs are gonna be machine learning beasts.


I know; I was talking about the computer the person I was replying to already owns.

The GP said that they already essentially have 64GB+8GB of memory in their Intel MBP; but they don't, because it's not unified, and so the GPU can't access the 64GB. So they can only load models up to 8GB.

Whereas with the M1 Pro/Max, the GPU can access the 64GB, and so can load models up to 64GB.


It seems I misunderstood.


so what's the implication of this?

that Apple's specific use case for the M1 series is basically "prosumer"?

(sorry if i'm just repeating something obvious)


Memory is very stackable if needed, since the power per unit area is very low.


How much of that 64 GB is in use at the same time, though? Paging not-recently-used data from DRAM out to an SSD isn't actually that slow, especially with the high-speed SSDs that Apple uses.


Right. And to me, this is the interesting part. There's always been that size/speed tradeoff... by putting huge amounts of memory bandwidth on "less" main RAM, it becomes almost half-RAM-half-cache; and by making the SSD fast, it becomes more like a massive half-disk-half-cache. It does wear the SSD out, however.


Why 1TB? 640GB ought to be enough for anything...


Huh, I guess that was as bad an idea as the 640K one.


How much per 8K, 10-bit color video frame?

Roughly 190GB per minute without sound.

Trying to do special effects on more than a few seconds of 8K video would overwhelm a 64GB system, I suspect.
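
Checking that arithmetic, assuming uncompressed 4:4:4 with 10 bits per sample (chroma subsampling or RAW packing would shrink it):

    w, h = 7680, 4320                     # 8K UHD
    frame_gb = w * h * 3 * 10 / 8 / 1e9   # 3 samples/px, 10 bits each
    for fps in (24, 30):
        print(f"{fps} fps: {frame_gb:.3f} GB/frame, "
              f"{frame_gb * fps * 60:.0f} GB/min")

That's roughly 0.124 GB per frame, or about 180-220 GB per minute depending on frame rate, in line with the ~190GB figure.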


You were (unintentionally) trolled. My first post up there was alluding to the legend that Bill Gates once said, speaking of the original IBM PC, "640K of memory should be enough for anybody." (N.B. He didn't[0])

[0] https://www.wired.com/1997/01/did-gates-really-say-640k-is-e...


Video and VFX generally don't need to keep whole sequences in RAM persistently these days because:

1. The high-end SSDs in all Macs can keep up with that data rate (3GB/sec)

2. Real-time video work is virtually always performed on compressed (even losslessly compressed) streams, so the data rate to stream is less than that.


And NVMe at 7.5GB/s is, like, almost not even noteworthy by comparison, haha. Impressive all around.


It's not that noteworthy, given that affordable Samsung 980 Pro SSDs have been doing those speeds for well over a year now.


The 980 Pro maxes out at 7GB/s.


But it's also been around for at least a year. And upcoming PCIe 5 SSDs will up that to 10-14GBps.

I'm saying Apple might have wanted to emphasise their more standout achievements, such as on the CPU front, where they're likely to be well ahead for a year - competition won't catch up until AMD starts shipping 5nm Zen 4 CPUs in Q3/Q4 2022.


Apple has well over a 5-year advantage when compared to their competition.


That is very difficult to believe, short of sabotage.


Apple has a node advantage.


I'm guessing that's new for the 13" or for the M1, but my 16‑inch MacBook Pro purchased last year had 64GB of memory. (Looks like it's considered a 2019 model, despite being purchased in September 2020).


I don't think this is an apples-to-apples comparison because of how the new unified memory works.


Well, technically it is an Apple-to-Apple comparison in his case.


It all falls apart when the apple contains something non-apple.


Right, the Intel models supported 64GB, but the 16GB limitation on the M1 was literally the only thing holding me back from upgrading.


And the much higher memory bandwidth



