At the risk of being off-topic, an honest stupid EE question:
What physical innovations actually drive the processor releases, or even flash storage sizes? I understand that each generation shrinks the transistors. But what tangible things were invented in 2015 to make the transistors smaller? Is it that in order to make them smaller, Intel has to run some kind of circuit optimization calculations that weren't possible the year before? Or are there advances in manufacturing (e.g. physical inventions of new molding processes, new chemical processes, something else?) Or is it simply market forces driving the prices and the specs?
For the most part, manufacturing advances drive microarchitecture advances. Smaller feature sizes mean more transistors can be stuffed in the same area. Those transistors can be used to make larger reorder buffers, more registers, more caches, better branch predictors, and more functional units. If you want to know about specific architectural changes in x86 over the years, I strongly recommend Cliff Click's talk: A Crash Course in Modern Hardware.[1]
A lot of the specifics of semiconductor manufacturing are closely-guarded secrets, but Todd Fernandez gave a glimpse in an informal talk titled Inseparable From Magic: Manufacturing Modern Computer Chips.[2]
Intel has (or at least tries to have) a 2 year cycle: in year one, you get a new chip design. In year two, you get the same chip design shrunk down to the latest process size. This gives the design team 2 years to figure out improvements and keeps the process engineers from killing themselves.
When technology doesn't keep up with marketing plans, or when it becomes too expensive to build the next process improvement, you get a stutter, and AMD gets a chance to catch up.
I don't think AMD is really dumb enough to actually want to catch up on the desktop. It seems like they are paying a certain amount of marketing lip service to desktop hardware, but their really interesting products are APUs/GPUs for ML and HPC, and their real ambitions lie there and in the parts of the server market that Intel isn't pursuing as aggressively.
Anyway, for anyone who doesn't already own a fab and have the market cap of BP, trying to compete with Intel on desktop parts is like starting a cattle ranch in your back yard to try to undercut Cargill Meat Solutions at selling leather to buggy whip manufacturers.
Disclaimer: because my university is near ASML, I get bombarded with their information... also, I'm not an EE student.
The actual technological progress is in lithography. The lithography machines are produced by ASML and used by almost all major chip producers[1]. ASML is currently building its new line of machines, based on extreme ultraviolet (EUV) lithography instead of the light source the current line uses.
What I don't know is whether those machines are already being used for Intel's 14nm process or whether the older machines are still in use (or maybe both). What I do know is that ASML and Intel are having trouble getting to 10nm, because the current machines still have to be improved; you can see this in the delay of Intel's tick-tock.
Which brings us to Intel's tick-tock: the architectural improvements to the chip itself, which they work on while waiting for the smaller transistors.
Lastly, from what I've heard the ASML machines are getting more and more complicated, especially because they are running up against the laws of physics.
To add to this: it's not just developing new lithography processes such as EUV. A large part of the recent improvements come from optimizing existing lithography processes: improving sensor accuracy, minimizing error, and improving throughput, among other things.
Advances in manufacturing. Each shrink will involve a large number of patented improvements to the etch process, layer alignment, chemistry (e.g. "high k metal gate"), sputtering, reduced-kerf diamond wire saws for the wafers themselves, and so on. You could probably find everything in Intel's patents.
That's how I understood it as well, though I was still disappointed that my expectations weren't exceeded. I am eagerly awaiting the day that new AAA games are playable at low quality settings on a MacBook Pro but I guess that's still a few years away.
AVX-512 looks kind of nuts: 32 registers of 64 bytes each, so 2KB of just registers, and apparently gate area to do eight 64-bit multiplies/divides at once. Also adds masking (only run this xor instruction on these three of the eight 64-bit words) and other stuff.
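To make the masking concrete, here's a minimal sketch using the standard AVX-512F intrinsics (the lane values and the xor key are made up; compile with something like gcc -mavx512f):

    #include <immintrin.h>
    #include <stdio.h>

    /* Sketch: xor a constant into only the low three of the eight 64-bit
       lanes of a 512-bit register; the other five lanes pass through. */
    int main(void) {
        long long buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        __m512i  v   = _mm512_loadu_si512(buf);
        __m512i  key = _mm512_set1_epi64(0xFF);
        __mmask8 k   = 0x07;  /* mask selects lanes 0, 1, 2 only */
        /* per lane: mask bit set ? v ^ key : v (copied from the src operand) */
        __m512i  r   = _mm512_mask_xor_epi64(v, k, v, key);
        _mm512_storeu_si512(buf, r);
        for (int i = 0; i < 8; i++) printf("%lld ", buf[i]);
        printf("\n");
        return 0;
    }

The same mask registers apply across most of the instruction set, which is what makes them interesting for vectorizing loops that contain conditionals.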
One other interesting feature for the long term, apparently in regular Skylake too, is a bounds-checking assist (MPX): instructions and registers and address-lookup hardware to make bounds checks cheaper. (The bounds check instructions are effectively NOPs on older hardware, I think.) I don't know what the economics of supporting it are, but I like anything that might lead to more code deployed with more safety belts.
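To be clear about what it's accelerating: the check itself is nothing exotic, it's the compare-and-branch that compilers or programmers already emit in software. Here's a plain-C sketch of that check (not MPX code, just the thing the bound registers and bounds-check instructions are meant to make cheaper):

    #include <stdio.h>
    #include <stdlib.h>

    /* Ordinary software bounds check -- the kind of compare-and-branch that
       MPX-style hardware support is intended to speed up and help deploy. */
    static long checked_read(const long *base, size_t len, size_t i) {
        if (i >= len) {                      /* the check itself */
            fprintf(stderr, "out-of-bounds access at index %zu\n", i);
            abort();
        }
        return base[i];
    }

    int main(void) {
        long a[4] = {10, 20, 30, 40};
        printf("%ld\n", checked_read(a, 4, 2));    /* ok */
        /* checked_read(a, 4, 9); would abort instead of reading garbage */
        return 0;
    }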
Finally, I wonder when Skylake server is coming out. The process delays threw off their usual tick-tock rhythm; I wonder if it means large Skylake server chips will come out with less than the usual delay after the top of the Broadwell server line (which isn't out yet), or, less likely, if Intel will skip large Broadwell Xeons entirely.
AVX-512 was the only interesting part of the microarchitecture, and honestly the only reason I would bother to spend the money to buy one - but it's not in these desktop parts. Keeping it off the consumer chips will, I imagine, greatly reduce the adoption rate of the AVX-512 features and capabilities. Disappointing.
Despite all the talk about Moore's law and all that, what this really means is that user experience has gone all the way down to silicon. Raw performance just isn't important to most people these days. They want a small, light, just-works experience with good power consumption and no annoying hiccups or lagginess.
Intel's trying to build the chips that can support this, not trying to enable regular consumers to fold proteins or factor large numbers.
Needs have plateaued too. Remember when transcoding a CD to MP3 was an hour-long job that might fail? Most mainstream needs have been addressed in terms of compute power. What's missing, as you said, is overall "perf": low wattage, a smooth UI, peripherals that are simple and fast enough. We'd love easy printers, but Intel is only Intel.
Heck, even my 6-year-old spare desktop can transcode high-def video on the fly. That used to be the stuff of dreams.
I think the implications are clear: it's time for software developers to start thinking about performance if their work is slow. You can't just expect consumers to go out and throw more hardware at it.
Things are messy nowadays (web, native, cross-platform, GPGPU, OpenCL) and people can't apply their brains to performance the way they did in the old days, when there was simply no other choice.
I bet a lot of cruft will vanish in a few years. OpenGL already shed a big part of it for clarity's sake. The web and native will converge (see how dynamic languages are JIT-compiled and take the low level into account). Hopefully overall efficiency will improve, for builders and customers alike.
People's priorities have started to shift toward size and battery life. People are now even willing to accept something a bit slower if it means half the size (the new MacBook, the MacBook Air, and other small laptops based on Intel's Broadwell Core M).
I realize now that these Skylake parts are not really about enthusiasts.
Think about what will make your parents' computer feel more powerful: a more fluid GUI, higher resolutions, higher framerates in everyday windowed use, and of course faster peripherals. Skylake has much better onboard graphics (which take up half the die) and a lot more I/O for USB 3.1, SSD storage, and DisplayPort output from the integrated graphics.
If Intel wanted to, they could double the cores by taking the integrated graphics out. They could double the peak flops again by including the 512-bit SIMD instructions. They've already demonstrated they can do it.
They know their real money making market and it isn't anyone here.
1. According to AnandTech [1], Intel is appealing to enthusiasts/gamers:
> To go with the launch is a new look of Intel's Core processor packaging, in part to appeal to the gaming crowd. As the gaming industry is considered one of the few remaining areas for potentially large growth in the PC industry, Intel is increasing its focus on gaming as a result.
2. The Skylake-K processors launched today are the unlocked versions geared towards overclocking, again targeting enthusiasts. Skylake-S won't be out until later this year.
This makes it all the more perplexing that they decided to "use so much die space" for integrated graphics given their target audience.
Maybe they are trying to market the first batch to enthusiasts but ultimately designed it for the average home user? Who knows. What they have right at this moment is awkward, but I have a difficult time believing that Intel doesn't have a good idea of what they are doing.
Also interesting is that they found a decrease in gaming performance at 3.0 GHz vs Haswell. Now, if Skylake could be expected to maintain higher frequencies than Devil's Canyon chips then the slight (3-5%) decrease could be overcome - but a Devil's Canyon i7 (the 4790K) has the same 4.0 GHz base frequency as the new i7-6700K, and what looks like the same turbo frequencies.
Looks like at base clock speeds it's a ~50% increase vs a 2600K. However, I have a ~20% OC, so I really need to see how far the 6700K overclocks. Still, it seems unlikely to be worth spending ~$750 on a new CPU, RAM, cooler, and motherboard.
I have a 2600K at 4.5 GHz (probably it can go faster, since I haven't put much effort into overclocking). For almost everything I do, it's fine, but some games in Dolphin still push the limits. It's really annoying when something runs perfectly well most of the time, but has occasional slight speed drops.
25% faster clock-for-clock would probably fix this in most of the games I care about, and that might be enough to get me to upgrade.
It's truly disturbing how long Sandy Bridge has spent near the top of the CPU price/performance ranks, given how old it is. It's as if Intel just stopped trying back in 2010 or so.
I guess everything interesting has to run on a GPU nowadays.
I have an i5-2500K at home and an i5-3570K at work; neither is overclocked and neither is what you could call slow. Moving to an SSD was a much bigger upgrade.
In all my years of building PCs, I can't remember a time when a nearly 5-year-old machine would still run everything I wanted to run. On the one hand that's cool; on the other, barring some shakeup, it looks like the days of must-have upgrades every year are over (for now).
Same. I was waiting for Skylake to upgrade, but it doesn't seem worth it; I'll probably wait for a 2x performance boost over the i5-2500K, which still seems a long way off. Although that's disappointing, perhaps I should be grateful I got lucky and chose the right generation the last time I upgraded.
Well, Haswell does have 2x the integer and 4x the FPU throughput. A lot of the potential in AVX/AVX2 is still untapped for now. It sure doesn't help that only a few people bother upgrading...
Then again, I'm often memory bandwidth bound even when using just SSE. If the memory can't deliver enough data to work with, there's bound to be a performance ceiling.
Be aware though that that 4x FPU number is very misleading, because it only applies if every instruction you are executing is a vectorized FMA (fused multiply-add). AVX is massively untapped, but Intel's peak flops numbers based on SIMD FMA are only true in the most technical sense.
Also, it is unlikely that you are actually memory bandwidth bound unless you are looping through linear memory on every core. It is fairly tricky to make useful software that is not memory latency bound.
I know that. But I also know FMA is what you're almost always doing with an FPU, in terms of consumed clock cycles.
So I don't really think it's that misleading. It's the go-to instruction.
> Also, it is unlikely that you are actually memory bandwidth bound unless you are looping through linear memory on every core.
Well, that's what Intel VTune says. If I'm processing stuff at a total bandwidth of 30 GB/s, I tend to believe it. Each core is dual-issue, so with SSE you can consume up to 32 bytes per cycle per core. When you have 4 cores at 3.5 GHz, that adds up. The CPU beast can consume an order of magnitude more data than DRAM can supply, even with just SSE. So you can do a lot of operations on the data and still be memory bound.
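Back-of-the-envelope, with those same numbers:

    #include <stdio.h>

    /* Rough check of the claim above: 4 cores at 3.5 GHz, each able to
       consume up to 32 bytes of SSE operands per cycle, vs ~30 GB/s of
       measured DRAM bandwidth (the VTune figure). */
    int main(void) {
        double cores = 4.0, ghz = 3.5, bytes_per_cycle = 32.0;
        double peak = cores * ghz * 1e9 * bytes_per_cycle;  /* bytes/s */
        double dram = 30e9;                                  /* bytes/s */
        printf("peak SSE operand consumption: %.0f GB/s\n", peak / 1e9);
        printf("ratio vs measured DRAM:       %.1fx\n", peak / dram);
        return 0;
    }

That works out to roughly 448 GB/s of potential operand consumption against ~30 GB/s of DRAM, i.e. about 15x - hence "order of magnitude".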
> I know that. But I also know FMA is what you're almost always doing with a FPU, in terms of consumed clock cycles.
That is not a true generalization at all. Maybe you are doing image compositing, but fma instructions are not so heavily used that they can be thought of as the single workhorse of a processor.
> but fma instructions are not so heavily used that they can be thought of as the single workhorse of a processor.
Well, indeed, a lot of cycles are lost to moving data around, branching, and waiting for memory. But when it comes to floating point computation, FMA really is common. In floating point inner loops it's just so usual to need "x := x + a * b".
I didn't say it was the single workhorse of the computer, but for DSP-type stuff FMA rocks. Dot products, matmul, even FFTs -- a lot of FPU-heavy computation benefits directly from FMA.
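For instance, here's roughly what the FMA-heavy inner loop of a dot product looks like with the standard AVX2/FMA intrinsics (assumes n is a multiple of 8 to keep it short; compile with e.g. gcc -mavx2 -mfma):

    #include <immintrin.h>
    #include <stddef.h>

    /* Dot product whose inner loop is nothing but fused multiply-adds. */
    float dot(const float *a, const float *b, size_t n) {
        __m256 acc = _mm256_setzero_ps();
        for (size_t i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_fmadd_ps(va, vb, acc);   /* acc += va * vb */
        }
        float tmp[8], sum = 0.0f;
        _mm256_storeu_ps(tmp, acc);
        for (int i = 0; i < 8; i++) sum += tmp[i];
        return sum;
    }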
Anyway, even without FMA, you can do a lot of ops per value and still be memory bound.
It seems like we're significantly into diminishing returns with single core IPC, so AMD should largely catch up over the next decade (assuming the status quo doesn't change). In this metric, a chip 90% as fast as state of the art might be 1/10 the cost to develop by then. Or even less.
But even if Intel isn't seeing much progress in terms of single core performance, they're beating the crap out of AMD in terms of performance per watt, and by extension, absolute performance of the high end multi core parts.
Take a look at the absolute insanity that is the 18-core Haswell-EP line, compare it to the 8-module Piledriver parts, and factor in the die size of each (i.e. the cost to make them): the gap is widening, not shrinking.
I think everyone here has known for a while that the free lunch is over with respect to writing single core programs and speeding up their execution on future hardware. That's going to take a while to filter down to building programs that take advantage of large amounts of parallelism, but it's going to happen. And when it does, consumer CPUs are going to look more like server parts.
With the programs of today, you'd rather have a 4.0 GHz quad-core. But with the programs of tomorrow, a 2.0 GHz 32-core will run circles around it. Amdahl's law is ever present, but we are very, very far from reaching the limits of what is possible.
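A rough Amdahl's-law version of that comparison, under two idealized assumptions of mine (the 4.0 GHz core is exactly 2x a 2.0 GHz core, and the parallel part scales perfectly):

    #include <stdio.h>

    /* Compare a 4-core 4.0 GHz part against a 32-core 2.0 GHz part for a
       workload with parallel fraction p (serial fraction 1 - p), normalized
       to the single-threaded time on one 2.0 GHz core. */
    int main(void) {
        for (double p = 0.50; p < 0.96; p += 0.05) {
            double s = 1.0 - p;
            double quad = (s + p / 4.0) / 2.0;   /* 4 cores, 2x clock */
            double many =  s + p / 32.0;         /* 32 cores, 1x clock */
            printf("p=%.2f  quad: %.3f  32-core: %.3f  -> %s\n",
                   p, quad, many, many < quad ? "32-core wins" : "quad wins");
        }
        return 0;
    }

Under those assumptions the 32-core part only pulls ahead once somewhere around 85% of the work parallelizes; below that, the fast quad still wins. The point stands either way - it just shows how much the programs of tomorrow have to change.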
If it gave a proportional speedup for everyday computing, I believe Intel would already have an 8 core mainstream part, and maybe even higher. Right now they can heavily price discriminate against the few that care about the bump to 6-8 slower cores.
As an embedded developer, I'm pretty surprised to see good old SPI show up on the block diagram, connected to the chipset.
I knew I2C was already in PCs for sensors and whatnot, and SPI has the advantage that it can be driven at higher speeds, but I actually expected something homegrown from Intel.
The days of Moore's law are long gone. Intel's transistors haven't really been getting better for 5-6 years now. I wonder what direction the semiconductor industry will go in. These are very interesting times.
Moore's law is about transistor density, not individual thread speed. Transistor density continues to increase. Nvidia's Maxwell is supposed to have over 8 billion transistors on 28nm.
Dr. Chris Mack's argument is that Moore's law is about price per transistor. That did not decrease with Intel's last process step, and so he declared Moore's law dead back in 2014.
In 2012 Nvidia released Kepler, which had 7 billion transistors, also at 28nm. 3 years to increase transistor counts by 12.5%... Isn't that evidence that Moore's law is in fact dead?
If it were not dead, that should be 28 billion transistors in 2015. Instead you're talking about 8.
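For what it's worth, the 28 billion figure assumes doubling every 18 months; the more common two-year formulation gives a smaller target, but Maxwell falls well short of either:

    #include <stdio.h>
    #include <math.h>

    /* Expected transistor count 3 years after a 7-billion-transistor chip,
       under 24-month vs 18-month doubling. (Compile with -lm.) */
    int main(void) {
        double start = 7e9, years = 3.0;
        printf("24-month doubling: %4.1f billion\n",
               start * pow(2.0, years / 2.0) / 1e9);
        printf("18-month doubling: %4.1f billion\n",
               start * pow(2.0, years / 1.5) / 1e9);
        return 0;
    }

That's about 20 billion vs 28 billion expected, against the roughly 8 billion Maxwell actually has.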