New MIPS64-based Loongson processors break performance barrier

rasz_pl · on Sept 3, 2015

Just the other day someone (chinese characters name) with loongson.cn email started dropping Mplayer patches(fixes,optimisations) on official mplayer mailing list.

I dont remember ever seeing that out of Chinese SoC vendors. They usually ninja patch internally and ship half working garbage binary (Im looking at you Rockchip cocks) to selected favourite vendors.

kryptiskt · on Sept 3, 2015

> They usually ninja patch internally and ship half working garbage binary (Im looking at you Rockchip cocks) to selected favourite vendors.

And if you actually do get their source, you will likely also get your own private hell in trying to replicate their Windows XP-based build environment.

mmcco · on Sept 3, 2015

Ingenic, a Chinese SoC CPU vendor, worked with ImgTec to add MXU (their MIPS32 SIMD extension) to MPlayer in an open source fork. However, they seem to insist that you reverse engineer their quirky instruction encodings from a confusing awk script. The only alternative is to actually run all of your source files through the awk script during builds...

I started adding them to the LLVM but lost interest. The instructions are available on the Creator CI20 and the GCW Zero, as well as some new Chinese market tablets from Philips.

cgh · on Sept 3, 2015

This was the big problem with the Linksys fiasco some years back. The Chinese developers wouldn't give up the source and all Cisco/Linksys had was this binary.

creshal · on Sept 3, 2015

Interesting benchmark results: http://blog.imgtec.com/wp-content/uploads/2015/09/64-bit-CPU...

• Both MIPS and ARM have surpassed AMD in the low-power SoC area (the only where AMD is even remotely competitive right now), and are comparable to their desktop/server products

• A (2? 3?) years old Intel architecture still beats all three by a wide margin. And the recent Skylakes are again some 10% more efficient.

Someone · on Sept 3, 2015

Uninteresting, I would say. Give me performance/Watt, performance/dollar or performance, period, not performance/GHz, or delve into details explaining what makes this this CPU do more per cycle.

For example, this CPU runs at 1.5GHz, the Cortex A57 at around 2Ghz. There goes quite a bit of the difference in speed.

alexvoica · on Sept 3, 2015

I think I address some of your points in the article. I specify that overall peak power consumption for the chip is 30W - this is for an octa-core configuration. This means one CPU roughly consumes 3.5W (if you take out the coherency manager and fabric). Then I also referenced the SPEC CPU2000 performance number at 1GHz. From that you can easily calculate performance per W. If you want more technical details about the architecture, they do have an user manual here http://www.loongson.cn/uploadfile/cpumanual/Loongson3B1500_p...

I also state that a new processor will be released next year and it will run at above 2GHz which should solve the difference in frequency.

low_battery · on Sept 3, 2015

AMD will be releasing a new architecture in 2016 and will use 14nm process. Hopefully Jim Keller's team will close the gap.

https://en.wikipedia.org/wiki/Zen_(microarchitecture)

rasz_pl · on Sept 3, 2015

and their new Fury GPU will be fastest on the market!!1 enabling smooth 4K gameplay .... oh wait

AMD is great on paper, almost never delivers. Every time they managed to deliver something good (K6, K7, R700) it was sudden and without much PR, you could even say it was a total surprise. If anything AMD has a long history of sitting quiet when they actually have the goods, and producing nice slide decks before usual flops.

low_battery · on Sept 3, 2015

Why? I think their R9 nano cards are quite good. Looks like they are rushed to market, but still gives a run to gigantic nvidia titan cards a run for its money.

I remind you that It was AMD who created 64 bit x86. This is why it is originally called (and still sometimes rightfully called) AMD64. I mentioned Jim Keller because he is the co author of x86-64 instruction set specification and lead architect of K8 (opteron) processors. He went back to AMD and working on the new Architecture since 2012. It is true that last 8 years was mostly disastrous for AMD, but I wouldn't count them off so fast, especially if they can ramp up on 14nm with a high IPC design. Time will tell.

hga · on Sept 3, 2015

Doesn't deny his thesis, though? K7 out-of-order was great when Intel went on their P4 "Marketecture" boondoggle, AMD64 was great, K8 + Hypertransport plus on chip memory controllers blew Intel out of the water in pretty much every way, IA-64 couldn't compete.

But they took their eyes off the ball, bought ATI, took 4 and a half years to update their microarchtecture, and the first shipped K10h chips had a bad TLB, requiring them to stop shipping until the next stepping fixed that 6 months later.

I really liked them in that heyday, and, oh, back in the 486 DX4 days, but given their many decades losing corporate DNA (I'm sure it's still true they haven't in the long run made money for their shareholders), and their having to sell their foundries, it's doubtful they'll again be a big player.

alexvoica · on Sept 3, 2015

I think a lot of people forget that 64-bit MIPS has been around for more than two decades - much longer than Intel, AMD or ARM.

For example, the Nintendo N64 game console had a MIPS64 CPU.

hga · on Sept 3, 2015

It was probably overshadowed in most people's minds by the DEC Alpha, which was also introduced in 1992.

SixSigma · on Sept 4, 2015

But what's your point ? People don't buy CPUs based on the age of the brand.

alexvoica · on Sept 4, 2015

I'm not talking about buying, I'm talking about engineering expertise.

makomk · on Sept 3, 2015

AMD blew Intel out of the water in every possible way, except sales. They had trouble selling chips to mainstream OEMs because the OEMs had questionably-legal deals with Intel that penalized them for buying processors from anyone else. In some cases AMD literally couldn't give them away. And in the long run that killed them - without the income they simply couldn't keep up.

hga · on Sept 3, 2015

Note, when I say "blew Intel out of the water", I'm talking only about AMD64 K8.

As I understand it, this was true for "Big Iron" AMD64 sales, for one thing, AMD had a whole year's head start where they were the only ones selling it, and then Intel played catch up for a long time. For multi-chip machines, not until Intel had their own version of the Hypertransport + local memory controller architecture.

I believe what you're talking about only happened in consumer grade chips, although I know HP for instance found a way to sell AMD based consumer systems, my parents bought one in 2004-5.

rwg · on Sept 4, 2015

The ~5 years between AMD's release of the Athlon 64 and Intel's Nehalem were truly AMD's glory days on the server. We had two clusters, one with pairs of 2.2 GHz Opteron 248 CPUs in its compute nodes and another with pairs of 3.4 GHz "Nocona" Xeons in its compute nodes. The Opteron nodes completely wiped the floor with the Xeon nodes in everything we threw at them, despite the Xeons enjoying a >50% clock speed advantage and a newer manufacturing process (90 nm vs. 130 nm).

Intel's "Core 2" CPUs scrapped the Pentium 4's NetBurst architecture in favor of an evolution of the Pentium M architecture (which was, in turn, an evolution of the Pentium III architecture), and Intel was competitive with AMD on the desktop again. Nehalem brought on-die memory controllers and QPI (Intel's HyperTransport-alike) in late 2008, which made Intel the performance champion on multi-socket servers. AMD's Bulldozer architecture was dead on arrival in 2011, and AMD never recovered from that.

Maybe AMD will pull a rabbit out of their hat with Zen...

creshal · on Sept 4, 2015

> Why? I think their R9 nano cards are quite good.

Comparable performance to the GTX 970 at merely twice the price? Meh.™

low_battery · on Sept 4, 2015

Well, I don't think so. Lets see when reputable sites release their benchmarks next week.

ewindisch · on Sept 3, 2015

The ATI (AMD) graphics are already better than Nvidia for 4k graphics, especially from a price/performance ratio with the 290x2 and 390x cards. The issue is that none of the cards from either ATI nor Nvidia can do 60fps on high quality settings with any of the games that are currently benchmarked. The ATI cards do worse at all resolutions up to 4k, which markably benefits by higher memory bandwidth.

The question for consumers is: why spend >$400 on a card optimized for low-setting play at 4k? Especially when it's a marginal visual improvement over 1440p? Instead, with current-generation cards, players will rather use a lower resolution and higher settings.

cartoonfoxes · on Sept 3, 2015

> The question for consumers is: why spend >$400 on a card optimized for low-setting play at 4k? Especially when it's a marginal visual improvement over 1440p?

Playing as a marksman in Verdun (http://www.verdungame.com) is amazing in 4K.

Symmetry · on Sept 3, 2015

There are tradeoffs to be made between executing more instructions per clock cycle and increasing the clock so measuring performance per clock can be interesting analytically but it doesn't actually say anything about how good a processor is overall. Even for a specific chip lowering it's clock speed will increase it's performance per clock as cache misses become less expensive.

mcbain · on Sept 3, 2015

There's a few problems with that chart, but the one that jumps out is that there is no Intel Core i5-4660. Might be a typo, but makes you question the attention to detail.

alexvoica · on Sept 3, 2015

You're right, it is a typo. It's actually http://ark.intel.com/products/80817/Intel-Core-i5-4460-Proce...

I'll update the chart, thanks for noticing.

rasz_pl · on Sept 3, 2015

Try 8 year old Intel, anything Core2 murders Arm and Amd clock per clock.

Symmetry · on Sept 3, 2015

Both Loongson-3A2000 and 3B2000 are 4-way superscalar processors built on a 9-stage, super-pipelined architecture with in-order execution units, two floating-point units, a memory management unit, and an innovative crossbar interconnect.

Wow, that's pretty wide for an in order processor that isn't VLIW.

EDIT: All the other references to the Godson/Loongson 3 series I can find say that it's out of order in general - at least those articles that didn't repeat that phrase verbatim (I sense a press release). And you can see the reorder queue in the diagram. Unless they're doing something like the Atom did where the ALUs are in order but the AGUs are out of order, but then why have register renaming?

I found some details here[1] making it very clear this is out of order and what happens at each pipeline stage and some (pretty bad) benchmark results here[2].

[1]http://www.7-cpu.com/cpu/Loongson.html#Loongson3A

[2]http://www.7-cpu.com/

hga · on Sept 3, 2015

This article has a useful core architectural drawing: http://www.cnx-software.com/2015/09/03/loongson-introduces-m...

I found it while trying to find out how much cache they've got, looks like 64KiB I and D L1 each, 256KiB L2, and at least the previous generation per Wikipedia had 8 MiB L3. Sounds competitive if their cache management hardware is up to snuff.

rasz_pl · on Sept 3, 2015

As soon as I read 'in-order' I knew it wont be pretty IRL. It might benchmark ok, but real world code is another story. Remember IA64? Itanium was also in-order, relying heavily on compilers.

hga · on Sept 3, 2015

Well, IA64 is not just in-order, but Very Long Instruction Word (VLIW), per Wikipedia a 128 bit instruction had 3 instructions. If your compiler is not smart enough (hmmm, especially if your language is too low level, like C/C++), it sure looks like "just in time" out-of-ordering will beat VLIW in keeping your execution engines running. And surely caching makes a big difference here.

I don't know how to compare a MIPS family in-order superscalar with Intel's P6 style out-of-order. ARM out-of-order cores are supposed to be quite a bit faster than their in-order designs in IRL, aren't they?

sliken · on Sept 3, 2015

Most in order architectures approximately double their performance per clock whe they switch to out-of-order.

Examples include cortex-a57 vs a53. Alpha 21264 vs alpha 21164. As well as many of the atoms until very recently vs core2 and similar chips.

With the various arm derivatives from qualcomm, samsung, and apple a under 2 GHz in order CPU doesn't sound particularly impressive. Especially when they talk about "under 30 watts", that's intel territory, not tablets let alone smart phones.

Trying to emulate arm and beat arm on price/perf or perf/watt seems like a long shot with an in-order mips64. Similarly beating intel at price/perf or perf/watt seems exceedingly unlikely with lower IPC and a MUCH slower clock.

The only thing that makes these mips64 look good is the even slower previous generation mips.

alexvoica · on Sept 3, 2015

I agree on the out-of-/in-order commentary. But to paraphrase, with great performance comes great power consumption. http://www.anandtech.com/show/9330/exynos-7420-deep-dive/5

I perceive the situation on binary translation in a slightly different way. I think the aim here is to perhaps ensure that some legacy code written for x86/ARM also runs alongside the apps written for MIPS. So they are not trying to compete, but ensure they have a broad(er) ecosystem.

hga · on Sept 3, 2015

with great performance comes great power consumption

Heh. And I suppose that's intrinsically true, with the added shadow registers and logic to drive it all?

lallysingh · on Sept 3, 2015

So let's not forget the larger impact here:

- It's a CPU architecture who's primary R&D&Fab is outside the US, and in fact in China.

- As people move to mobile, they could just as easily to Longsoon over ARM.

- Hey, guess where there's a giant mobile market?

Separately, can these run Irix? I miss Irix.

photosinensis · on Sept 3, 2015

They might be able to run Irix. But why would you use an operating system that hasn't been updated in 9 years? It's not going to have sufficient crypto support to be useful.

That said, the smartphone market in China has been dominated by ARM chips. The real question is whether there's enough market demand for a MIPS-based architecture right now, or if the Chinese government can drive demand away from ARM and towards MIPS.

yellowapple · on Sept 3, 2015

> But why would you use an operating system that hasn't been updated in 9 years?

The same reason why people still run Apple ][s and Commodore 64s and old DOS machines: nostalgia.

I mean, if I were to try and build myself a real Jurassic Park workstation, it would be a disservice for it to be running Linux on an x86. No, I want the real deal: IRIX on MIPS, complete with an animated Nedry confronting unauthorized users with "Ah ah ah! You didn't say the magic word! Ah ah ah!".

And yes, I'm aware of jurassicsystems.com, but it just doesn't feel the same. It also doesn't have `fsn`, so I can't zoom around my filesystem in 3-D while muttering that "it's a UNIX system; I know this!".

lallysingh · on Sept 3, 2015

Just to see Irix again.

For driving demand, they could always have tariffs, etc. If they applied to companies manufacturing computers in China, that'd have a massive impact globally.

spitfire · on Sept 3, 2015

Those were my first thoughts too...

Specifically China is pulling a Toyota here. Coming in on the low end where "It doesn't matter".

hga · on Sept 3, 2015

This is "the low end of 64 bit CPUs", though, MIPS is already competitive, along with ARM, in low end 32 bit CPUs, and Intel's still in that space along with I don't know who else.

So what's worth the jump to 64 bits? And note Loongson/MIPS has competition from ARM as well (aren't their 64 bit ARM cores in iPhones now?), and maybe RISC-V, they're working on an out-of-order microarchtecture now (it's an open core, $0 IP to use).

tinco · on Sept 4, 2015

Anyone know if these CPU's require chipsets with locked down firmware? Would be really cool if we could get a modern RMS proof laptop again.

rogerbraun · on Sept 4, 2015

The most modern RMS proof laptop is the Thinkpad X200 with Libreboot. It's probably still faster than this chip.

alexvoica · on Sept 3, 2015

I don't know if people are aware of this but Richard Stallman (rms) uses a Loongson-based laptop from Lemote.

scintill76 · on Sept 3, 2015

He moved to a Thinkpad X60: https://stallman.org/stallman-computing.html

alexvoica · on Sept 3, 2015

Oh, that must have happened recently. Thanks for letting me know!