At the risk of being off-topic, an honest stupid EE question:
What physical innovations actually drive the processor releases, or even flash storage sizes? I understand that each generation shrinks the transistors. But what tangible things were invented in 2015 to make the transistors smaller? Is it that in order to make them smaller, Intel has to run some kind of circuit optimization calculations that weren't possible the year before? Or are there advances in manufacturing (e.g. physical inventions of new molding processes, new chemical processes, something else?) Or is it simply market forces driving the prices and the specs?
For the most part, manufacturing advances drive microarchitecture advances. Smaller feature sizes mean more transistors can be stuffed in the same area. Those transistors can be used to make larger reorder buffers, more registers, more caches, better branch predictors, and more functional units. If you want to know about specific architectural changes in x86 over the years, I strongly recommend Cliff Click's talk: A Crash Course in Modern Hardware.[1]
A lot of the specifics of semiconductor manufacturing are closely-guarded secrets, but Todd Fernandez gave a glimpse in an informal talk titled Inseparable From Magic: Manufacturing Modern Computer Chips.[2]
Intel has (or at least tries to have) a 2 year cycle: in year one, you get a new chip design. In year two, you get the same chip design shrunk down to the latest process size. This gives the design team 2 years to figure out improvements and keeps the process engineers from killing themselves.
When technology doesn't keep up with marketing plans, or when it becomes too expensive to build the next process improvement, you get a stutter, and AMD gets a chance to catch up.
I don't think AMD is really dumb enough to actually want to catch up on the desktop. It seems like they are paying a certain amount of marketing lip service to desktop hardware, but their really interesting products are APUs/GPUs for ML and HPC, and their real ambitions lie there and in the parts of the server market that Intel isn't pursuing as aggressively.
Anyway, for anyone who doesn't already own a fab and have the market cap of BP, trying to compete with Intel on desktop parts is like starting a cattle ranch in your back yard to try to undercut Cargill Meat Solutions at selling leather to buggy whip manufacturers.
Disclaimer: because my university is near ASML, I get bombarded with their information... also, I'm not an EE student.
The actual technological progress is in lithography. The lithography machines are produced by ASML and used by almost all major chip producers[1]. ASML is currently building its new line of machines, based on extreme ultraviolet (EUV) lithography instead of the light source the current line uses.
What I don't know is whether those machines are already being used for Intel's 14nm process or whether the older machines are still in use (or maybe both). What I do know is that ASML and Intel are having trouble getting to 10nm, because the current machines still have to be improved; you can see this in the delay of Intel's tick-tock.
Which brings us to Intel's tick-tock: the architectural improvements to the chip itself, which they work on while waiting for the smaller transistors.
Lastly, from what I've heard the ASML machines are getting more and more complicated, especially because they are running up against the laws of physics.
To add to this: it's not just developing new lithography processes such as EUV. A large part of the recent improvements come from optimizing existing lithography processes: improving sensor accuracy, minimizing error, and improving throughput, among other things.
Advances in manufacturing. Each shrink will involve a large number of patented improvements to the etch process, layer alignment, chemistry (e.g. "high k metal gate"), sputtering, reduced-kerf diamond wire saws for the wafers themselves, and so on. You could probably find everything in Intel's patents.
That's how I understood it as well, though I was still disappointed that my expectations weren't exceeded. I am eagerly awaiting the day that new AAA games are playable at low quality settings on a MacBook Pro but I guess that's still a few years away.
AVX-512 looks kind of nuts: 32 registers of 64 bytes each, so 2KB of just registers, and apparently gate area to do eight 64-bit multiplies/divides at once. Also adds masking (only run this xor instruction on these three of the eight 64-bit words) and other stuff.
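To make the masking concrete, here's a minimal sketch using the standard AVX-512F intrinsics (the lane values and the xor key are made up; compile with something like gcc -mavx512f):

    #include <immintrin.h>
    #include <stdio.h>

    /* Sketch: xor a constant into only the low three of the eight 64-bit
       lanes of a 512-bit register; the other five lanes pass through. */
    int main(void) {
        long long buf[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        __m512i  v   = _mm512_loadu_si512(buf);
        __m512i  key = _mm512_set1_epi64(0xFF);
        __mmask8 k   = 0x07;  /* mask selects lanes 0, 1, 2 only */
        /* per lane: mask bit set ? v ^ key : v (copied from the src operand) */
        __m512i  r   = _mm512_mask_xor_epi64(v, k, v, key);
        _mm512_storeu_si512(buf, r);
        for (int i = 0; i < 8; i++) printf("%lld ", buf[i]);
        printf("\n");
        return 0;
    }

The same mask registers apply across most of the instruction set, which is what makes them interesting for vectorizing loops that contain conditionals.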
One other interesting feature for the long term, apparently in regular Skylake too, is a bounds-checking assist (MPX): instructions and registers and address-lookup hardware to make bounds checks cheaper. (The bounds check instructions are effectively NOPs on older hardware, I think.) I don't know what the economics of supporting it are, but I like anything that might lead to more code deployed with more safety belts.
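To be clear about what it's accelerating: the check itself is nothing exotic, it's the compare-and-branch that compilers or programmers already emit in software. Here's a plain-C sketch of that check (not MPX code, just the thing the bound registers and bounds-check instructions are meant to make cheaper):

    #include <stdio.h>
    #include <stdlib.h>

    /* Ordinary software bounds check -- the kind of compare-and-branch that
       MPX-style hardware support is intended to speed up and help deploy. */
    static long checked_read(const long *base, size_t len, size_t i) {
        if (i >= len) {                      /* the check itself */
            fprintf(stderr, "out-of-bounds access at index %zu\n", i);
            abort();
        }
        return base[i];
    }

    int main(void) {
        long a[4] = {10, 20, 30, 40};
        printf("%ld\n", checked_read(a, 4, 2));    /* ok */
        /* checked_read(a, 4, 9); would abort instead of reading garbage */
        return 0;
    }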
Finally, I wonder when Skylake server is coming out. The process delays threw off their usual tick-tock rhythm; I wonder if it means large Skylake server chips will come out with less than the usual delay after the top of the Broadwell server line (which isn't out yet), or, less likely, if Intel will skip large Broadwell Xeons entirely.
AVX-512 was the only interesting part of the microarchitecture, and honestly the only reason I would bother to spend the money to buy one - but it's not in these desktop parts. Keeping it off the consumer chips will, I imagine, greatly reduce the adoption rate of the AVX-512 features and capabilities. Disappointing.
Despite all the talk about Moore's law and all that, what this really means is that user experience has gone all the way down to silicon. Raw performance just isn't important to most people these days. They want a small, light, just-works experience with good power consumption and no annoying hiccups or lagginess.
Intel's trying to build the chips that can support this, not trying to enable regular consumers to fold proteins or factor large numbers.
Needs have plateaued too. Remember when transcoding a CD to MP3 was an hour-long job that might fail? Most mainstream needs have been addressed in terms of compute power. What's missing, as you said, is overall "perf": low wattage, a smooth UI, peripherals that are simple and fast enough. We'd love easy printers, but Intel is only Intel.
Heck, even my 6-year-old spare desktop can transcode high-def video on the fly. That used to be the stuff of dreams.
I think the implications are clear: it's time for software developers to start thinking about performance if their work is slow. You can't just expect consumers to go out and throw more hardware at it.
Things are messy nowadays (web, native, cross-platform, GPGPU, OpenCL) and people can't apply their brains to performance the way they did in the old days, when there was simply no other choice.
I bet a lot of cruft will vanish in a few years. OpenGL already shed a big part of it for clarity's sake. The web and native will converge (see how dynamic languages are JIT-compiled and take the low level into account). Hopefully overall efficiency will improve, for builders and customers alike.
People's priorities have started to shift toward size and battery life. People are now even willing to accept something a bit slower if it means half the size (the new MacBook, the MacBook Air, and other small laptops based on Intel's Broadwell Core M).
I realize now that these Skylake parts are not really about enthusiasts.
Think about what will make your parents' computer feel more powerful: a more fluid GUI, higher resolutions, higher framerates in everyday windowed use, and of course faster peripherals. Skylake has much better onboard graphics (which take up half the die) and a lot more I/O for USB 3.1, SSD storage, and DisplayPort output from the integrated graphics.
If Intel wanted to, they could double the cores by taking the integrated graphics out. They could double the peak flops again by including the 512-bit SIMD instructions. They've already demonstrated they can do it.
They know their real money making market and it isn't anyone here.
1. According to AnandTech [1], Intel is appealing to enthusiasts/gamers:
> To go with the launch is a new look of Intel's Core processor packaging, in part to appeal to the gaming crowd. As the gaming industry is considered one of the few remaining areas for potentially large growth in the PC industry, Intel is increasing its focus on gaming as a result.
2. The Skylake-K processors launched today are the unlocked versions geared towards overclocking, again targeting enthusiasts. Skylake-S won't be out until later this year.
This makes it all the more perplexing that they decided to "use so much die space" for integrated graphics given their target audience.
Maybe they are trying to market the first batch to enthusiasts but ultimately designed it for the average home user? Who knows. What they have right at this moment is awkward, but I have a difficult time believing that Intel doesn't have a good idea of what they are doing.
Also interesting is that they found a decrease in gaming performance at 3.0 GHz vs Haswell. Now, if Skylake could be expected to maintain higher frequencies than Devil's Canyon chips then the slight (3-5%) decrease could be overcome - but a Devil's Canyon i7 (the 4790K) has the same 4.0 GHz base frequency as the new i7-6700K, and what looks like the same turbo frequencies.
Looks like at base clock speeds it's a ~50% increase vs a 2600K. However, I have a ~20% OC, so I really need to see how far the 6700K overclocks. Still, it seems unlikely to be worth spending ~$750 on a new CPU, RAM, cooler, and motherboard.
I have a 2600K at 4.5 GHz (probably it can go faster, since I haven't put much effort into overclocking). For almost everything I do, it's fine, but some games in Dolphin still push the limits. It's really annoying when something runs perfectly well most of the time, but has occasional slight speed drops.
25% faster clock-for-clock would probably fix this in most of the games I care about, and that might be enough to get me to upgrade.
It's truly disturbing how long Sandy Bridge has spent near the top of the CPU price/performance ranks, given how old it is. It's as if Intel just stopped trying back in 2010 or so.
I guess everything interesting has to run on a GPU nowadays.
I have an i5-2500K at home and an i5-3570K at work; neither is overclocked and neither is what you could call slow. Moving to an SSD was a much bigger upgrade.
In all my years of building PCs, I can't remember a time when a nearly 5-year-old machine would still run everything I wanted to run. On the one hand that's cool; on the other, barring some shakeup, it looks like the days of must-have upgrades every year are over (for now).
Same. I was waiting for Skylake to upgrade, but it doesn't seem worth it; I'll probably wait for a 2x performance boost over the i5-2500K, which still seems a long way off. Although that's disappointing, perhaps I should be grateful I got lucky and chose the right generation the last time I upgraded.
Well, Haswell does have 2x the integer and 4x the FPU throughput. A lot of the potential in AVX/AVX2 is still untapped for now. It sure doesn't help that only a few people bother upgrading...
Then again, I'm often memory bandwidth bound even when using just SSE. If the memory can't deliver enough data to work with, there's bound to be a performance ceiling.
Be aware though that that 4x FPU number is very misleading, because it only applies if every instruction you are executing is a vectorized FMA (fused multiply-add). AVX is massively untapped, but Intel's peak flops numbers based on SIMD FMA are only true in the most technical sense.
Also, it is unlikely that you are actually memory bandwidth bound unless you are looping through linear memory on every core. It is fairly tricky to make useful software that is not memory latency bound.
I know that. But I also know FMA is what you're almost always doing with an FPU, in terms of consumed clock cycles.
So I don't really think it's that misleading. It's the go-to instruction.
> Also, it is unlikely that you are actually memory bandwidth bound unless you are looping through linear memory on every core.
Well, that's what Intel VTune says. If I'm processing stuff at a total bandwidth of 30 GB/s, I tend to believe it. Each core is dual-issue, so with SSE you can consume up to 32 bytes per cycle per core. When you have 4 cores at 3.5 GHz, that adds up. The CPU beast can consume an order of magnitude more data than DRAM can supply, even with just SSE. So you can do a lot of operations on the data and still be memory bound.
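Back-of-the-envelope, with those same numbers:

    #include <stdio.h>

    /* Rough check of the claim above: 4 cores at 3.5 GHz, each able to
       consume up to 32 bytes of SSE operands per cycle, vs ~30 GB/s of
       measured DRAM bandwidth (the VTune figure). */
    int main(void) {
        double cores = 4.0, ghz = 3.5, bytes_per_cycle = 32.0;
        double peak = cores * ghz * 1e9 * bytes_per_cycle;  /* bytes/s */
        double dram = 30e9;                                  /* bytes/s */
        printf("peak SSE operand consumption: %.0f GB/s\n", peak / 1e9);
        printf("ratio vs measured DRAM:       %.1fx\n", peak / dram);
        return 0;
    }

That works out to roughly 448 GB/s of potential operand consumption against ~30 GB/s of DRAM, i.e. about 15x - hence "order of magnitude".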
> I know that. But I also know FMA is what you're almost always doing with a FPU, in terms of consumed clock cycles.
That is not a true generalization at all. Maybe you are doing image compositing, but fma instructions are not so heavily used that they can be thought of as the single workhorse of a processor.
> but fma instructions are not so heavily used that they can be thought of as the single workhorse of a processor.
Well, indeed, a lot of cycles are lost to moving data around, branching, and waiting for memory. But when it comes to floating point computation, FMA really is common. In floating point inner loops it's just so usual to need "x := x + a * b".
I didn't say it was the single workhorse of the computer, but for DSP-type stuff FMA rocks. Dot products, matmul, even FFTs -- a lot of FPU-heavy computation benefits directly from FMA.
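For instance, here's roughly what the FMA-heavy inner loop of a dot product looks like with the standard AVX2/FMA intrinsics (assumes n is a multiple of 8 to keep it short; compile with e.g. gcc -mavx2 -mfma):

    #include <immintrin.h>
    #include <stddef.h>

    /* Dot product whose inner loop is nothing but fused multiply-adds. */
    float dot(const float *a, const float *b, size_t n) {
        __m256 acc = _mm256_setzero_ps();
        for (size_t i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);
            __m256 vb = _mm256_loadu_ps(b + i);
            acc = _mm256_fmadd_ps(va, vb, acc);   /* acc += va * vb */
        }
        float tmp[8], sum = 0.0f;
        _mm256_storeu_ps(tmp, acc);
        for (int i = 0; i < 8; i++) sum += tmp[i];
        return sum;
    }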
Anyway, even without FMA, you can do a lot of ops per value and still be memory bound.
It seems like we're significantly into diminishing returns with single core IPC, so AMD should largely catch up over the next decade (assuming the status quo doesn't change). In this metric, a chip 90% as fast as state of the art might be 1/10 the cost to develop by then. Or even less.
But even if Intel isn't seeing much progress in terms of single core performance, they're beating the crap out of AMD in terms of performance per watt, and by extension, absolute performance of the high end multi core parts.
Take a look at the absolute insanity that is the 18-core Haswell-EP line, compare it to the 8-module Piledriver parts, and factor in the die size of each (i.e. the cost to make them): the gap is widening, not shrinking.
I think everyone here has known for a while that the free lunch is over with respect to writing single core programs and speeding up their execution on future hardware. That's going to take a while to filter down to building programs that take advantage of large amounts of parallelism, but it's going to happen. And when it does, consumer CPUs are going to look more like server parts.
With the programs of today, you'd rather have a 4.0 GHz quad-core. But with the programs of tomorrow, a 2.0 GHz 32-core will run circles around it. Amdahl's law is ever present, but we are very, very far from reaching the limits of what is possible.
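A rough Amdahl's-law version of that comparison, under two idealized assumptions of mine (the 4.0 GHz core is exactly 2x a 2.0 GHz core, and the parallel part scales perfectly):

    #include <stdio.h>

    /* Compare a 4-core 4.0 GHz part against a 32-core 2.0 GHz part for a
       workload with parallel fraction p (serial fraction 1 - p), normalized
       to the single-threaded time on one 2.0 GHz core. */
    int main(void) {
        for (double p = 0.50; p < 0.96; p += 0.05) {
            double s = 1.0 - p;
            double quad = (s + p / 4.0) / 2.0;   /* 4 cores, 2x clock */
            double many =  s + p / 32.0;         /* 32 cores, 1x clock */
            printf("p=%.2f  quad: %.3f  32-core: %.3f  -> %s\n",
                   p, quad, many, many < quad ? "32-core wins" : "quad wins");
        }
        return 0;
    }

Under those assumptions the 32-core part only pulls ahead once somewhere around 85% of the work parallelizes; below that, the fast quad still wins. The point stands either way - it just shows how much the programs of tomorrow have to change.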
If it gave a proportional speedup for everyday computing, I believe Intel would already have an 8 core mainstream part, and maybe even higher. Right now they can heavily price discriminate against the few that care about the bump to 6-8 slower cores.
As an embedded developer, I'm pretty surprised to see good old SPI show up on the block diagram, connected to the chipset.
I knew I2C was already in PCs for sensors and whatnot, and SPI has the advantage that it can be driven at higher speeds, but I actually expected something homegrown from Intel.
The days of Moore's law are long gone. Intel's transistors haven't really been getting better for 5-6 years now. I wonder what direction the semiconductor industry will go in. These are very interesting times.
Moore's law is about transistor density, not individual thread speed. Transistor density continues to increase. Nvidia's Maxwell is supposed to have over 8 billion transistors on 28nm.
Dr. Chris Mack's argument is that Moore's law is about price per transistor. That did not decrease with Intel's last process step, and so he declared Moore's law dead back in 2014.
In 2012 Nvidia released Kepler, which had 7 billion transistors, also at 28nm. 3 years to increase transistor counts by 12.5%... Isn't that evidence that Moore's law is in fact dead?
If it were not dead, that should be 28 billion transistors in 2015. Instead you're talking about 8.
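For what it's worth, the 28 billion figure assumes doubling every 18 months; the more common two-year formulation gives a smaller target, but Maxwell falls well short of either:

    #include <stdio.h>
    #include <math.h>

    /* Expected transistor count 3 years after a 7-billion-transistor chip,
       under 24-month vs 18-month doubling. (Compile with -lm.) */
    int main(void) {
        double start = 7e9, years = 3.0;
        printf("24-month doubling: %4.1f billion\n",
               start * pow(2.0, years / 2.0) / 1e9);
        printf("18-month doubling: %4.1f billion\n",
               start * pow(2.0, years / 1.5) / 1e9);
        return 0;
    }

That's about 20 billion vs 28 billion expected, against the roughly 8 billion Maxwell actually has.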