This is undoubtedly why they launched the Mac Mini today. They can ramp up a lot more power in that machine without a battery and with a larger, active cooler.
I'm much more interested in actual benchmarks. AMD has mostly capped their APU performance because DDR4 just can't keep the GPU fed (which is why the last two generations of consoles went with very wide GDDR5/6). Their solution is Infinity Cache, where they add a bunch of cache on-die to reduce the need to go off-chip. At just 16B transistors, Apple obviously didn't do this (at 6 transistors per SRAM cell, there are around 3.2B transistors in just 64MB of cache).
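A rough back-of-envelope for those figures, counting only the SRAM data arrays (6 transistors per bit) and ignoring tags, decoders, and sense amps, so real chips would need more:

    # Transistors for the SRAM data arrays alone, assuming 6 transistors per bit
    # and ignoring tag arrays, decoders, and sense amps (binary megabytes).
    def sram_transistors(megabytes, transistors_per_bit=6):
        bits = megabytes * 1024 * 1024 * 8
        return bits * transistors_per_bit

    for mb in (64, 128, 1024):
        print(f"{mb:>5} MB -> {sram_transistors(mb) / 1e9:.1f}B transistors")

    #   64 MB -> 3.2B transistors
    #  128 MB -> 6.4B transistors
    # 1024 MB -> 51.5B transistors (roughly the ~48B figure quoted below, which uses a decimal gigabyte)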
Okay, probably a stupid question, but solid state memory can be pretty dense: why don't we have huge caches, like a 1GB cache? As I understand it, cache memory doesn't give off heat like the computational part of the chip does, so heat dissipation probably wouldn't increase much with a larger chip package.
NAND flash is pretty dense, but way too slow. SRAM is fast but not at all dense, needing 6 transistors per bit.
For reference: https://en.m.wikipedia.org/wiki/Transistor_count lists the largest CPU as of 2019, AMD's EPYC Rome, at 39.54 billion MOSFETs, so even if you replaced the entire chip with SRAM you wouldn't quite reach 1GB!
NAND is nonvolatile, and the tradeoff for that is limited write cycles. We have an in-between in the form of 3D XPoint (Optane); Intel is still trying to figure out the best way to use it. It currently sits like an L6 cache after system DRAM.
Well, not just Intel. Optane is a new point in the memory hierarchy. That has a lot of implications for how software is designed; it's not something Intel can do all by itself.
SRAM is 6 transistors per bit, so you're talking about 48 billion transistors there, and that's ignoring the overhead of all the circuits around the cells themselves.
DRAM is denser, but difficult to build on the same process as logic.
That said, with chiplets and package integration becoming more common, who knows... One die of DRAM as a large cache combined with a logic die may start to make more sense. It's certainly something people have tried before; it just didn't really catch on.
I don't know the details, but the manufacturing process is pretty different. Trying to have one process that's good at both DRAM and logic at the same time is hard, because they optimize for different things.
Are you referring to latency due to propagation delay, where the worst case increases as you scale?
Would you mind elaborating a bit? I'm not following how this would significantly close the gap between SRAM and DRAM at 1GB. An SRAM cell itself is generally faster than a DRAM cell, and I understand that the circuitry beyond an SRAM cell is far simpler than DRAM's. Am I missing something?
Think of a circular library with a central atrium and bookshelves arranged in circles radiating out from the atrium. In the middle of the atrium you have your circular desk. You can put books on your desk to save yourself the trouble of having to go get them off the shelves. You can also move books to shelves that are closer to the atrium so they're quicker to get than the ones farther away.
So what's the problem? Well, your desk is the fastest place you can get books from but you clearly can't make your desk the size of the entire library, as that would defeat the purpose. You also can't move all of the books to the innermost ring of shelves, since they won't fit. The closer you are to the central atrium, the smaller the bookshelves. Conversely, the farther away, the larger the bookshelves.
Circuits don't follow this ideal model of concentric rings, but I think it's a nice rough approximation for what's happening here. It's a problem of geometry, not a problem of physics, and so the limitation is even more fundamental than the laws of physics. You could improve things by going to 3 dimensions, but then you would have to think about how to navigate a spherical library, and so the analogy gets stretched a bit.
Area is a big one. Why isn't L1 measured in megabytes? Because you can't put that much data close enough to the core.
Look at a Zen-based EPYC core: 32KB of L1 with 4-cycle latency, 512KB of L2 with 12-cycle latency, and 8MB of L3 with 37-cycle latency.
L1 to L2 is 3x slower for 16x more memory; L2 to L3 is 3x slower for 16x more memory.
You can reach 9x more area in 3x more cycles, so you can see how the cache scaling is basically quadratic (there's a lot more execution machinery competing for area with L1/L2, so it's not exact).
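As a quick sanity check of that quadratic intuition, using just the figures quoted above (a rough model, not an exact one):

    # "Capacity grows roughly with the square of latency": compare the Zen
    # figures above (4/12/37 cycles for 32KB/512KB/8MB).
    levels = [("L1", 32 * 1024, 4), ("L2", 512 * 1024, 12), ("L3", 8 * 1024 * 1024, 37)]

    for (n1, c1, t1), (n2, c2, t2) in zip(levels, levels[1:]):
        print(f"{n1}->{n2}: capacity x{c2 / c1:.0f}, latency x{t2 / t1:.1f}, "
              f"latency squared x{(t2 / t1) ** 2:.1f}")

    # L1->L2: capacity x16, latency x3.0, latency squared x9.0
    # L2->L3: capacity x16, latency x3.1, latency squared x9.5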
I am sure there are many factors, but the most basic one is that the more memory you have, the longer it takes to address that memory. I think it scales with the log of the RAM size, which is linear in the number of address bits.
Log-depth circuits are a useful abstraction, but the constraints of laying out circuits in physical space impose a delay scaling limit of O(n^(1/2)) for planar circuits (with a bounded number of layers) and O(n^(1/3)) for 3D circuits. The problem should be familiar to anyone who's drawn a binary tree on paper.
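To see why the layout term ends up dominating the decode term at large sizes, a quick illustration (arbitrary units, only the growth rates matter):

    # Going from a 32KB array to a 1GB array, the decode depth (log n) barely
    # grows, while the planar wire-length term n^(1/2) grows by orders of magnitude.
    from math import log2

    small, big = 2 ** 18, 2 ** 33      # 32KB and 1GB, in bits
    print(f"decode depth grows x{log2(big) / log2(small):.1f}")        # ~1.8x
    print(f"planar wire length grows x{(big / small) ** 0.5:.0f}")     # ~181x
    print(f"3D wire length grows x{(big / small) ** (1 / 3):.0f}")     # ~32x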
With densities so high, and circuit boards so small (when they want to be), that factor isn't very important here.
We regularly use chips with an L3 latency around 10 nanoseconds, going distances of about 1.5 centimeters. You can only blame a small fraction of a nanosecond on the propagation delays there. And let's say we wanted to expand sideways, with only a 1 or 2 nanosecond budget for propagation delays. With a relatively pessimistic assumption of signals going half the speed of light, that's a diameter of 15cm or 30cm to fit our SRAM into. That's enormous.
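Putting that arithmetic in one place, with the same (pessimistic) assumption of signals travelling at half the speed of light:

    # How far a signal moving at 0.5c gets within a given propagation-delay budget.
    v = 0.5 * 3e8                      # assumed signal velocity, m/s

    for budget_ns in (0.1, 1.0, 2.0):
        distance_cm = v * budget_ns * 1e-9 * 100
        print(f"{budget_ns:.1f} ns budget -> {distance_cm:.1f} cm of travel")

    # 0.1 ns budget -> 1.5 cm   (roughly the L3 distance mentioned above)
    # 1.0 ns budget -> 15.0 cm
    # 2.0 ns budget -> 30.0 cm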
In Zen 1 and Zen 2, cores have direct or indirect access to the shared L3 cache in the same CCX. In the cross-CCX case, the neighboring CCX's cache can be accessed over the in-package interconnect without going through system DRAM.
AnandTech speculates on a 128-bit DRAM bus[1], but AFAIK Apple hasn't revealed many details about the memory architecture. It'll be interesting to see what the overall memory bandwidth story looks like as hardware trickles out.
Apple being Apple, we won't know much before someone grinds down a couple chips to reveal what the interconnections are, but if you are feeding GPUs along with CPUs, a wider memory bus between DRAM and the SoC cache makes a lot of sense.
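For a sense of scale, some purely hypothetical peak numbers for a 128-bit bus; the memory types and transfer rates below are assumptions for illustration, not anything Apple has confirmed:

    # Peak bandwidth = bus width (bytes) * transfer rate; hypothetical configs only.
    def peak_gb_per_s(bus_bits, transfers_per_s):
        return bus_bits / 8 * transfers_per_s / 1e9

    print(peak_gb_per_s(128, 4.266e9))   # ~68 GB/s, e.g. LPDDR4X-4266 (assumed)
    print(peak_gb_per_s(128, 6.4e9))     # ~102 GB/s, e.g. LPDDR5-6400 (assumed)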
I am looking forward to compile time benchmarks for Chromium. I think this chip and SoC architecture may make the Air a truly fantastic dev box for larger projects.
As someone who works with a lot of interdisciplinary teams, I often understand concepts or processes they have names for but don't know the names until after they label them for me.
Until you use a concept so frequently that you need to label it to compress information for discussion purposes, you often don't have a name for it. Chances are, if you solve or attempt to solve a wide variety of problems, you'll see patterns and processes that overlap.
It’s often valuable to use jargon from another discipline in discussions. It sort of kicks discussions out of ruts. Many different disciplines use different terminology for similar basic principles. How those other disciplines extend these principles may lead to entirely different approaches and major (orders of magnitude) improvements. I’ve done it myself a few times.
On another note, the issue of “jargon” as an impediment to communication has led the US Military culture to develop the idea of “terms of art”. The areas of responsibility of a senior officer are so broad that they enter into practically every professional discipline. The officer has to know when they hear an unfamiliar term that they are being thrown off by terminology rather than lack of understanding. Hence the phrase “terms of art”. It flags everyone that this is the way these other professionals describe this, so don’t get thrown or feel dumb.
No one expects the officer to use (although they could) a “term of art”, but rather to understand and address the underlying principle.
It’s also a way to grease the skids of discussion ahead of time. “No, General, we won’t think you’re dumb if you don’t use the jargon, but what do you think of the underlying idea...”
Might be a good phrase to use in other professional cultures. In particular in IT, because of the recursion of the phrase “term of art” itself being a term of art until it’s generally accepted. GNU and all that...
This gets even more fun when several communities discover the same thing independently, and each comes up with a different name for it.
My favorite is the idea of "let's expand functions over a set of Gaussians". That is variously known as a Gabor wavelet frame, a coherent state basis [sic], a Gaussian wave packet expansion, and no doubt some others I haven't found. Worse still, the people who use each term don't know about any of the work done by people who use the other terms.
Reminds me of self-taught tech. I’ll often know the name/acronym, but pronounce it differently in my head than the majority of people. Decades ago GUI was “gee you eye” in my head but one day I heard it pronounced “gooey” and I figured it out but had a brief second of “hwat?” (I could also see “guy” or “gwee”.) It’s, of course, more embarrassing when I say it out loud first...
First time I went to a Python conference in SV, more than a decade ago, I kept hearing "Pie-thon" everywhere, and had no idea what the hell people were talking about.
It took me a solid half hour to at last understand this pie-thingy was Python... in my head I had always pronounced it the French way. Somewhat like "pee-ton", I don't know how to transcribe that "on" nasal sound... (googling "python prononciation en francais" should yield a soundtrack for the curious non-French speakers).
Picture 18 year old me in 1995. I got a 486SX laptop as a graduation present out of the blue from my estranged father. I wanted to add an external CD-ROM to it so I could play games and load software for college, and it had a SCSI port. I went to the local computer store and asked the guy for an "ess see ess eye" CD-ROM drive; he busted out laughing and said "oh you mean a scuzzy drive?" Very embarrassing for me at the time, but that's when I learned that computer acronyms have a preferred pronunciation, so I should try to learn them myself to avoid future confusion.
It shouldn't be; it should be a badge of honor of sorts. It points to somebody reading to expand their knowledge beyond what's available in oral form around them, so kudos to them!
It's even more visible in non-English-speaking countries. In Poland, at first everyone pronounces Java as "Yava", and after a while they start to switch to the proper English pronunciation. Many times it divides amateurs from professionals, but I wouldn't really know, because I don't work with Java.
Great story, yes. But there's no such thing as a "halzenfugel" in German as far as I can tell as a native speaker. Even www.duden.de, the official German dictionary, doesn't know that word ;-0
As a native English speaker and middling foreign-language speaker of German, "halzenfugel" sounds to me like a mock-German word that an English speaker would make up.
Hah, good to know. However, unless you are talking to people from the same domain, it's usually a better approach to spell out things instead of relying on terminology. Concepts and ideas translate much better across domains than terminology.
For example, AMD sells 12 and 16 core CPUs. The 12 core parts have 2 cores lasered out due to defects. If a particular node is low-yield, then it's not super uncommon to double up on some parts of the chip and use either the non-defective or best-performing one. You'd expect to see a combination of lasering and binning to adjust yields higher.
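To see why the spare-core trick pays off, here's a toy yield model; the defect density and per-core area are made-up illustrative numbers, and the simple Poisson model ignores the defect clustering mentioned downthread:

    # Toy Poisson yield model: probability a die ships with enough good cores.
    from math import exp, comb

    # Assumed: 0.1 defects/cm^2 and 0.05 cm^2 per core (illustrative numbers only).
    p_core_good = exp(-0.1 * 0.05)     # P(a given core has no defect)

    def yield_with_spares(total_cores, needed_cores, p_good):
        # P(at least `needed_cores` of `total_cores` come out defect-free)
        return sum(comb(total_cores, k) * p_good ** k * (1 - p_good) ** (total_cores - k)
                   for k in range(needed_cores, total_cores + 1))

    print(yield_with_spares(8, 8, p_core_good))   # every core must be perfect: ~96%
    print(yield_with_spares(8, 6, p_core_good))   # up to 2 cores can be fused off: ~99.99%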
That said, TSMC N5 has a very good defect rate, according to their slides on the subject[0].
Yep, for the MBA. I think for devs who can live with 16GB, the 7-GPU-core MacBook Air, at $300 less than the MacBook Pro, is a very interesting alternative.
Plus, defects tend to be clustered, which is a pretty lucky effect. Multiple defects on a single core don't really matter if you are throwing the whole thing away.