A Dive into Ray Tracing Performance on the Apple M1 (willusher.io)
105 points by tempodox on May 31, 2021 | 55 comments



For anyone looking to find real-time raytracing examples, this is not it. The author seems to be investigating static renders like you would do in Blender, offloading it to the CPU instead of the GPU. Unsurprisingly, the M1 does not support hardware-accelerated ray tracing. However, almost any CPU made in the last 10 years is capable of software-accelerated ray tracing with software like Pixar's Renderman and the open-source Cycles engine.


Octane, ProRender, and soon I believe Redshift on macOS use the Metal API for GPU-based ray tracing. They are generally faster than CPU-based ray tracing (well, maybe not ProRender, it’s basically abandoned by Maxon in favor of Redshift). So while the M1 GPU and most AMD GPUs supported by macOS don’t have dedicated hardware for accelerating ray tracing, they can still be used to accelerate it over using the CPU.

Edit: The article does compare the ray tracing capabilities of the M1 GPU to an RTX 2070. The RTX 2070 crushes it, and as the author points out, it's an obviously unfair comparison. The RTX card uses way more power, is much larger, and has hardware acceleration for ray tracing.

I am interested in seeing if Apple adds hardware acceleration for RT to future chips (or if they will support external AMD GPUs).


> The RTX card uses way more power, is much larger, and has hardware acceleration for ray tracing.

Isn't the RTX card using less power per ray according to this benchmark? I think this also applies per unit of size.


> However, almost any CPU made in the last 10 years is capable of software-accelerated ray tracing with software like Pixar's Renderman and the open-source Cycles engine.

People have been rendering on any CPU they could get their hands on for the last forty years and there is no such thing as "software accelerated ray tracing". That would be like someone saying their car does engine accelerated driving.


I propose “naturally forced induction”


> software-accelerated ray tracing

Is this a particular term/concept, or do you just mean non-hardware-accelerated ray tracing?

If the former, what does it mean for something to be "software-accelerated?"

If the latter, I apologize if this comes off as nit-picking.


When I say software-accelerated, I'm referring to anything that doesn't offload any rendering work to the GPU. It's not an established term, but I think most people familiar with the topic will grok that software rendering = no GPU offload.

Though I do understand the desire to be pedantic, because ISAs like this and their SIMD instructions can kinda blur the line between what counts as "hardware accelerated". That's why I try to avoid the term as much as possible for the sake of simplicity.


I think the parent was wondering what the word "accelerated" in "software-accelerated" could mean. Faster than what? (Faster than doing the calculations by hand? :)


The concept of "accelerating" raytracing operations on a CPU makes almost no sense.

GPUs require acceleration because the memory latency on GPUs is pretty large, caches are small, and GPUs have such huge compute that waiting around for the "next = bvh-tree->left" operation takes a huge amount of time.

By making dedicated acceleration structures and hardware, that latency can be mitigated, and the huge compute capabilities of GPUs can be unleashed.

----------

In contrast, CPUs are already really good (relatively) at pointer-dereferencing loops. CPUs have less compute and more memory-optimizations (relative to a GPU anyway).
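
To make the pointer-chasing concrete, here's a minimal, purely illustrative C++ sketch (the names and structure are hypothetical, not taken from the article or any particular renderer). Each loop iteration depends on a load from the previous node, which is exactly the pattern CPUs absorb well and GPUs need dedicated hardware to hide:

    // Illustrative BVH traversal: each step is a dependent load
    // (the "next = node->left" pattern mentioned above).
    struct Ray { float org[3]; float dir[3]; };

    struct BVHNode {
        float    bounds[6];          // AABB: min x/y/z, max x/y/z
        BVHNode* left  = nullptr;    // interior children
        BVHNode* right = nullptr;
        int      firstPrim = -1;     // leaf: range into a primitive array
        int      primCount = 0;
    };

    // Stubbed intersection tests, just so the sketch compiles.
    static bool hitAABB (const Ray&, const float*) { return true;  }
    static bool hitPrims(const Ray&, int, int)     { return false; }

    // Depth-first traversal with a small explicit stack (fixed size for the
    // sketch). The hot loop is a chain of cache-unfriendly pointer
    // dereferences, which out-of-order CPUs with big caches tolerate far
    // better than wide GPU warps do.
    bool traverse(const BVHNode* root, const Ray& ray) {
        const BVHNode* stack[64];
        int top = 0;
        if (root) stack[top++] = root;
        while (top > 0) {
            const BVHNode* node = stack[--top];      // dependent load
            if (!hitAABB(ray, node->bounds)) continue;
            if (node->primCount > 0) {               // leaf node
                if (hitPrims(ray, node->firstPrim, node->primCount))
                    return true;
            } else {                                 // interior: chase pointers
                if (node->left)  stack[top++] = node->left;
                if (node->right) stack[top++] = node->right;
            }
        }
        return false;
    }

On RTX-class hardware, the RT cores effectively run this loop (plus the box/triangle tests) in fixed-function units so the shader cores don't stall on it.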


I have seen old posts where people write their own raytracers, and that normally means it's on the CPU as well, right?

So their own raytracers are not real-time either.


I think it is a neat demonstration of how weird CPU performance has been in the recent past: on one hand the M1 has very impressive performance, but on the other hand a 7-year-old CPU puts out competitive numbers. It is also a demonstration of how much performance we have left behind as we have adopted laptops as the mainstream computing platform.


> It is also a demonstration of how much performance we have left behind as we have adopted laptops as the mainstream computing platform.

I don't know. Desktops simply tend to have a much longer lifespan, and so far every new laptop I've owned in the past 10 years has outperformed my then-current desktop PC for that precise reason (at the very least in certain aspects like performance per watt).

What I'm rather afraid of is other manufacturers adopting Apple's insane vendor lock-in and total hardware lock-down. Any failure in an M1 system results in complete data loss and a logic board replacement - RAM on SoC, SSD soldered to the board, etc. No more choice and zero upgradability - the oldest components in my current desktop are 12+ years old, but the rest was gradually upgraded (SSD, GPU, CPU, additional RAM, HDD replacements, etc.)

Such things won't be possible in the brave new world of SoCs...


When it comes to PC desktops, I do feel there might be more money to be made in preserving the modularity that lets PC enthusiasts and builders keep building their own computers. There is a massive consumer market for it.

People aren't going to be buying new desktops every 1-2 years, but many of them will buy new GPUs, or cases, or fans, etc. The best part is that we get to upgrade the components that will actually provide better performance. For instance, why would I upgrade my entire machine just for a better graphics chip if my SSD, CPU, and RAM are still in tip-top shape? Short answer: I won't.


In reality those things also change depending on your luck and how 'in sync' you are with the industry. Just upgraded your DDR3 RAM in 2013? Get ready for DDR4, which will require both new RAM and a new motherboard - and any new high-performance CPU will only be compatible with DDR4, so you're forced to upgrade lest you be stuck on an old standard. This is about to happen with DDR5 as well, by the way.


Absolutely, there are certain upgrades which require replacing several components at once, but those are few and far between. And even then you don't need a new GPU, case, case fans, storage media, etc.

DDR4, as an example, was released 7 years ago, and DDR5 is still yet to be released, let alone required for a new CPU.


Depends. I always used my desktops for so long that when it came time to replace anything, I just had to buy a new one anyway due to the cascade effect of upgrades.

There were a couple of exceptions; I just got lucky that the old parts were still around to buy.


I have had the same experience. I built a desktop telling myself I'll be able to upgrade parts when I'm no longer satisfied with the performance. Turned out that by then the sockets had changed sufficiently that it didn't make much sense.


It's a sliding scale, and the original commenter was talking about Apple's approach, which is that the entire board would need to be replaced. When it comes to PC desktops, you can still upgrade the CPU, RAM, and mobo without needing to upgrade the GPU (costly), power supply, and storage devices.

PCs have always been this way, and sometimes you get lucky and can use the same socket for a long time; other times you don't.

For me, if I wanted the latest CPU I would need to upgrade my mobo and CPU, but not my RAM, my GPU ($1500), power supply, or my SSDs. If I were on a Mac and one of these parts broke, I would need to buy all of the internals again, since they are soldered to the main board.


That is why around 2006 I went 100% into workstation class laptops + docking station, and don't miss desktops at all.


Right, and that's fine, but clearly my post was in response to the user who commented that with increased integration/soldering of parts, if one part fails you need to throw the entire thing out rather than upgrading that specific component.

In your case, if nothing fails in your machine, neither option impacts you at all. Whereas if a part does fail, you might be up for a more expensive repair bill than before.


It comes to mind to ask why Apple didn't seem to do any binning of their CPUs, e.g. use a chip capable of a higher turbo frequency in the Pro.


I think they did binning with the 7- vs 8-core GPU, but my guess is that the more intricate the binning, the more complex the whole assembly logistics become, so they probably wanted to avoid that for the initial launch.


For the Cinebench R23 score. (In case anyone reads it the wrong way.)

The Intel® Core™ i7-1165G7 is by default a 15W TDP CPU, especially if you are spreading the load across all cores (28W if you are on single-core Turbo Boost). It is on 10nm SuperFin (equivalent to ~TSMC 7nm), quad core and 8 threads. Not sure about its AVX-512 clock speed limitation; I don't have time to dig that up.

The Apple M1 is ~24W TDP, 8 cores and 8 threads, with 4 being high-efficiency cores, on TSMC 5nm.

You are looking at a ~60% higher TDP number. The results are still impressive, but they need this additional context before being consumed.
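
(For the arithmetic behind that figure, taking the TDP estimates above at face value: 24W / 15W = 1.6, i.e. roughly 60% higher.)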

I will also not be surprised if the A15 has a major GPU uArch change (something similar to the IMG B-Series).


The M1 is barely 20-25W at the wall. Actual chip peak TDP is more like 14-18W. (EDIT: this is measured from a Mac mini by AnandTech -- a laptop likely has a slightly lower TDP due to better binning.)

Intel chips have much higher peak and sustained actual power usage than their TDP indicates. At the wall, those “15W” systems are actually much closer to 40-60W on sustained loads.


> At the wall, those “15W” systems are actually much closer to 40-60W on sustained loads.

They don't. PL2 lasts at best 30 seconds; that's not sustained load. Depending on the heatsink configuration, a laptop sustains either 15W at base clock or 28W at a slightly higher clock.

The M1 Mac mini under full load is 39W, according to Apple's officially published figures [1].

Not sure why people are downvoting my original post, but I guess that is the state we are in for hardware topics on HN.

[1] https://support.apple.com/en-us/HT201897


I believe benchmarked numbers more than official statements.

https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

For comparison, the Panther Canyon i7-1165G7 NUC uses as much as 85W (70W sustained) when maxing out its load. The author even notes how they'd beefed up the cooling.

https://nucblog.net/2021/02/panther-canyon-i7-nuc-review-nuc...


Correct me if I'm wrong, but isn't this conflating CPU TDP and overall system consumption? When a system is under maximum sustained load, power consumption increases are not limited to the CPU or SoC. I suspect the figures published by Apple (6.8W and 39W) represent the absolute theoretical extremes possible by the system, not typical numbers for real-world idle and load.


It’s very hard to estimate the true power consumption of laptops since they are all primarily fed from the battery not the charger. There are almost no passthroughs anymore, even in “gaming” laptops. Only a handful of DTR laptops still employ a passthrough with a high-wattage PSU or dual-PSU setup, and I honestly haven’t seen those for the last 2-3 gens...

The last one I’ve seen that had it was an 8700K-with-dual-1080s DTR monstrosity.


> It’s very hard to estimate the true power consumption of laptops since they are all primarily fed from the battery not the charger.

That's news to me. I try to keep up to date with laptops, and all the reviewers have pretty accurate power consumption measurements. Even just using the internal reporting reveals quite a few details [1].

Even a "12W" TDP 11th gen Intel CPU can draw up to 50W in turbo mode [2].

The 11th gen "45W" laptop CPUs have a PL2 power draw of 135W [3], which, thermals and VRMs permitting, can be held indefinitely according to the specs (which is what makes Intel's whole TDP-rating useless indeed).

So yes, it's very hard indeed to estimate newer laptops' power consumption, but for entirely different reasons. It's primarily the cooling and the OEM's choice of how to implement the very loosely defined (Intel!) specs that determine power draw. If you have excellent cooling, there's nothing to stop an 11800H from drawing 135W for minutes at a time... (which can be measured, both internally and from the wall, no problem).

AMD CPUs on the other hand mostly adhere to the published power rating (give or take 20% - again, cooling permitting and configurable by the OEM).

[1] random example: https://bit.ly/34xLKpM

[2] https://www.anandtech.com/show/16084/intel-tiger-lake-review...

[3] https://www.itworldcanada.com/article/eight-core-intel-tiger...


I've never seen a laptop with a removable battery where you couldn't remove the battery and power it directly.


Can someone tell me how in CUDA you can use RTX cores? I know the main problem with GPU programming is that they hate conditionals - and I was thinking RTX would be vastly different in that regard.


I didn't realize Intel had shipped a mobile chip with AVX-512.

Also, Intel, for the love of God, sort out your product names; use a proper system rather than /dev/random Lake.


> Also, Intel for the love of God sort out your product names, use a proper system rather than /dev/random Lake

Suffixes are the main thing on Intel's systems.

U -- Ultrabook class, 15W.

M -- Previously the standard laptop class, 25W.

H / HQ / etc. -- High-performance laptop, 45W. Q is for quad-core, but that doesn't seem to be a big deal anymore.

T -- Power-optimized desktop.

No suffix -- Standard desktop.

K -- Unlocked desktop.

-----

The i5 / i7 is a marketing name, and more about MSRP than anything else. You can mostly ignore it. i7 and i9 are more expensive, while i3 and i5 are cheaper.

The last 3 digits are highly-specific codes within a family. The other digits are the generation number: 8 for 8th generation, 11 for 11th generation. For example, 2700K means 2nd generation Unlocked Desktop processor.
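
As a purely illustrative sketch (hypothetical throwaway code, not any official Intel parsing scheme), the rule above amounts to: strip the brand prefix, take the digits, treat the last three as the SKU and the remaining leading digits as the generation. Note that the G-suffix mobile parts (such as the i7-1165G7 mentioned elsewhere in the thread) don't follow this split cleanly.

    // Illustrative only: split a Core model string per the rule above
    // (last three digits = SKU, leading digits = generation).
    // Assumes a well-formed name like the examples below.
    #include <cctype>
    #include <iostream>
    #include <string>

    struct CoreModel { std::string brand, generation, sku, suffix; };

    CoreModel parseCoreModel(const std::string& name) {     // e.g. "i7-2700K"
        CoreModel m;
        const auto dash = name.find('-');
        m.brand = name.substr(0, dash);                      // "i7"
        const std::string rest = name.substr(dash + 1);      // "2700K"
        std::size_t i = 0;
        while (i < rest.size() && std::isdigit(static_cast<unsigned char>(rest[i])))
            ++i;
        const std::string digits = rest.substr(0, i);        // "2700"
        m.suffix     = rest.substr(i);                       // "K", "U", "H", ...
        m.generation = digits.substr(0, digits.size() - 3);  // "2" or "11"
        m.sku        = digits.substr(digits.size() - 3);     // "700" or "400"
        return m;
    }

    int main() {
        const char* names[] = {"i7-2700K", "i5-11400H", "i7-8550U"};
        for (const char* n : names) {
            const CoreModel m = parseCoreModel(n);
            std::cout << n << " -> gen " << m.generation
                      << ", SKU " << m.sku << ", suffix " << m.suffix << '\n';
        }
    }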

----

I personally start with the suffix: if someone's looking for battery life longer than 4 hours, I go with the U class, then I pick out an MSRP for them (usually i7, but i5 or i3 if they care more about budget).

If battery life under 2 hours is fine, I push for the H class for the higher performance (though these are usually in 17-inch laptops).

I stay up to date and make sure that I buy a current-generation, or at worst a "last generation," product if I know that whoever I'm talking to cares about price. That means I will recommend an 11th generation i7 to most people, but if they care about price, maybe a 10th generation i7 or an 11th generation i5.


It's not even worth trying to remember because they keep adding new features and tiers. The Y-series are the real low-power laptop parts. They just added a never-before-seen KB designator to the newest desktop-class 11th Gen parts. Then with the Xeons there are R, W, and M suffixes, where M doesn't mean the same thing as the Core M.


Y has been around for years, and so have the K, H/HQ, M, T, and U designations. The newest designations (F for GPU-free) haven't really changed the meaning of the old ones.

Xeon has its own naming convention that was changed maybe 5-ish years ago (whenever they went from the E3 / E5 / E7 designations to Xeon Bronze / Silver / Gold / Platinum). There are also naming schemes for the Atom line, but those aren't commonly bought by consumers.


Another one with some relevance right now, in an era where GPUs are hard to get, is "F", which signifies that the chip doesn't have an integrated GPU.


They do have an integrated GPU, but it's just (most likely broken on the die and) turned off in the final product. There's no dedicated process to create these abominations. Same with most 10/8-core chips, for example: if the process for the 10-core die results in one or two faulty cores, they just turn them off and sell it as an 8-core part. This is basically true for most mainstream CPU and GPU vendors/products.

This also results in situations like, for example, selling working 10-core chips as 8-core ones (with cores turned off) to fulfill market needs in case of shortages ... shortages of "faulty" dies. In some cases methods are available to "upgrade" your chip again. tl;dr: how to abuse chip binning (or how to void your warranty).


Yeah, I'm aware of the way they bin chips and segment the market; I was talking about "not having" the capability.

As I understand it, the ability to re-enable parts isn't really a thing any more. It was true historically that you used to be able to re-enable cores, but most modern chips have the components fused off at a hardware level.


The code names are not the problem. Basically all companies use engineering code names so that engineers can talk about unreleased products without accidentally revealing things. And since you should not be able to divine anything about the products from them, picking random geographical features is just fine.

The problem is that their actual product naming is such a massive mess that people prefer to learn the code names instead of trying to figure out what, exactly, an i7-1165G7 is and how it relates to, say, an i5-11400H.


The problem is Intel's incredible tendency to cut, slice, and dice their market into tiny slivers of segments. For example, Comet Lake has 80-ish SKUs listed on its wiki page: https://en.wikipedia.org/wiki/Comet_Lake_(microprocessor)

Combine this with the fact that they have completely different processor families for different segments concurrent with these, and there are going to be hundreds of SKUs in total on the market at any point.

I don't know if there is any sort of naming system that can salvage that into any intuitive sense. The actual product names mostly serve as keys to search ARK.


Is that Intel marketing 80 SKUs of their own volition; or is that 80 large customers each ordering a million-piece custom configuration, and then Intel figuring that, if they're going to make a part in quantity, they may as well let other people buy it if they want?


A lot of it is presumably binning [0]. They make a bunch of CPUs and check each to find the number of defective cores and the max clock speed it can run at before malfunctioning, then slap on the appropriate product number for that combination.

[0]: https://www.techspot.com/article/2039-chip-binning/


Compare/contrast to others, e.g. Qualcomm. If Qualcomm can do this without fracturing their product line into incomprehensible gibberish, then why can’t Intel?


Honestly, 80 models that differ by clock speed, core count, or target form factor bother me way less than trying to figure out the difference between Comet Lake, Cascade Lake, Ice Lake, Rocket Lake, etc.


> The problem is that their actual product naming is such a massive mess that people prefer to learn the code names instead of trying to figure out what

They actually have a guide explaining all the bits. It doesn't really matter since most of it is arbitrary and marketing (including µarch rebadging same as the GPU vendors), so even within a generation it tends to tell you very little in and of itself.


That too, but if the microarchitectures had a numeric identity, for example (even in parallel to the XYZ Lake names), the route to a better numerical ID for CPUs would be easier.


That's originally what the generation was.

Then it became inconvenient so generations and µarchs got disconnected (because marketing, and then more marketing as they had to introduce µarch refreshes because they couldn't move through their plans)


I had no idea they'd gone back to 4 numbers for some of their CPUs. What does G7 even mean?



Yes but that's the normal pattern: the historical naming was XYYY where X was the generation and YYY was the SKU. With the 10th generation, this became XXYYY, continuing the pattern.


Graphics Level 7. Basically a relative indication of the iGPU capabilities.


They've been shipping AVX-512 in client SKUs since the very first Ice Lake machines, actually (my Dell XPS 9300 Ice Lake features it). It's at the point where all client SKUs going forward will at least come with AVX-512F.


Side note: Apple is great at waiting on tech until it's ready, and then making sure everyone knows about it (through a combo of a product launch and marketing).

Retina displays, multitouch on touchscreens, and now ARM processors. The notable thing is that Apple makes the enabling incremental change (to a point where the product is good enough for mass market) seem like a step function.


The mass adoption that Apple's embrace brings is a step function for things that hit economies-of-scale thresholds (iPad lidar) and anything with a network effect (AirTag).


Whenever I see articles related to the M1, the performance is amazing.



