MathSharp, a vector and matrix library written in C# using hardware intrinsics (github.com/john-h-k)
98 points by gokhan on Sept 12, 2019 | 21 comments



Only 11 days to go until the official 3.0 release of .Net Core. We're pretty excited about all the new features that were added (such as the hardware intrinsics here). The biggest one we want to play around with is actually the new JsonSerializer. UTF8 vs UTF16 as the underlying storage type could have a huge impact on performance for some of our larger JSON contracts.

"Parsing a typical JSON payload and accessing all its members using the JsonDocument is 2-3x faster than Json.NET with little allocations for data that is reasonably sized (that is, < 1 MB)."

From: https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotne...

I'll definitely be checking this math(s) library out after I get our solutions updated from 2.2. I was playing around with some ray tracing in C#-land and something like this could come in really handy.
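
(For reference, here is a minimal sketch of the JsonDocument usage that quote is talking about - parsing UTF-8 bytes directly instead of going through a UTF-16 string first. The payload and property names are made up for illustration:)

    using System;
    using System.Text;
    using System.Text.Json;

    class JsonDocumentSample
    {
        static void Main()
        {
            // JsonDocument parses straight from UTF-8 bytes, so no UTF-16
            // string is materialized for the payload itself.
            byte[] utf8 = Encoding.UTF8.GetBytes("{\"name\":\"MathSharp\",\"stars\":98}");

            using (JsonDocument doc = JsonDocument.Parse(utf8))
            {
                JsonElement root = doc.RootElement;
                Console.WriteLine(root.GetProperty("name").GetString());
                Console.WriteLine(root.GetProperty("stars").GetInt32());
            }
        }
    }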


I took a look at the code and was wondering about the performance implications of all the if() statements in the low-level functions. For example, Vector4F DotProduct2D() has if (Sse41.IsSupported), else if (Sse3.IsSupported), else if (Sse.IsSupported), and finally a default implementation. This seems inefficient, but I don't know whether the booleans are effectively compile-time flags and the extraneous code disappears in a puff of logic.
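
(For anyone who hasn't looked at the repo, the pattern in question has roughly this shape - a simplified sketch, not MathSharp's actual implementation:)

    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;

    public static class VectorOps
    {
        // Dot product of the low two lanes; each branch uses the best
        // instruction set the current CPU supports.
        public static Vector128<float> DotProduct2D(Vector128<float> left, Vector128<float> right)
        {
            if (Sse41.IsSupported)
            {
                // Single DPPS: multiply lanes 0-1 and broadcast the sum to all lanes.
                return Sse41.DotProduct(left, right, 0x3F);
            }
            else if (Sse3.IsSupported)
            {
                Vector128<float> mul = Sse.Multiply(left, right);
                // Horizontal add folds lane 0 + lane 1 into lane 0.
                return Sse3.HorizontalAdd(mul, mul);
            }
            else if (Sse.IsSupported)
            {
                Vector128<float> mul = Sse.Multiply(left, right);
                Vector128<float> y = Sse.Shuffle(mul, mul, 0b01_01_01_01); // broadcast lane 1
                return Sse.Add(mul, y);                                    // lane 0 now holds the dot product
            }

            // Scalar software fallback for CPUs without SSE.
            float dot = left.GetElement(0) * right.GetElement(0)
                      + left.GetElement(1) * right.GetElement(1);
            return Vector128.Create(dot);
        }
    }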


Those might fall out during JIT compilation as constant expressions. Would be worth running it through ngen to see what happens. (I'd do it myself but I'm not at a PC at the moment.)

EDIT: Just realized the Intrinsics namespace was added in Core 3.0 preview. I don't have the setup to test this myself right now. I did find an online compiler, but it looked like it optimized away everything in release mode.


There was some discussion here on HN a few days ago [0] that linked to a blog post about those intrinsics [1]. Those should indeed be optimized away. The relevant section:

> The IsSupported checks are treated as runtime constants by the JIT (when optimizations are enabled) and so you do not need to cross-compile to support multiple different ISAs, platforms, or architectures. Instead, you just write your code using if-statements and the unused code paths (any code path which is not executed, due to the condition for the branch being false or an earlier branch being taken instead) are dropped from the generated code (the native assembly code generated by the JIT at runtime).

[0] https://news.ycombinator.com/item?id=20915430
[1] https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in...
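
(If you want to see this for yourself and have a 3.0 preview SDK plus BenchmarkDotNet handy, the disassembly diagnoser will dump the JIT's native output, so you can check that only the taken branch survives. A rough sketch, not tied to MathSharp:)

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;
    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;

    [DisassemblyDiagnoser]   // prints the generated native code for each benchmark
    public class IsSupportedFolding
    {
        private Vector128<float> _a = Vector128.Create(1f, 2f, 3f, 4f);
        private Vector128<float> _b = Vector128.Create(5f, 6f, 7f, 8f);

        [Benchmark]
        public Vector128<float> Multiply()
        {
            if (Sse.IsSupported)
                return Sse.Multiply(_a, _b);

            // On an SSE-capable machine this fallback never shows up in the disassembly.
            return Vector128.Create(
                _a.GetElement(0) * _b.GetElement(0),
                _a.GetElement(1) * _b.GetElement(1),
                _a.GetElement(2) * _b.GetElement(2),
                _a.GetElement(3) * _b.GetElement(3));
        }
    }

    public class Program
    {
        public static void Main() => BenchmarkRunner.Run<IsSupportedFolding>();
    }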


The JIT optimizes them away:

> The IsSupported checks are treated as runtime constants by the JIT (when optimizations are enabled) and so you do not need to cross-compile to support multiple different ISAs, platforms, or architectures. Instead, you just write your code using if-statements and the unused code paths (any code path which is not executed, due to the condition for the branch being false or an earlier branch being taken instead) are dropped from the generated code (the native assembly code generated by the JIT at runtime).

https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in...


branch predictor gonna work well on those at least, if the JIT doesn't make them go away entirely


came to say this. +1


yep, the JIT fully removes those checks when native code gen occurs :)


The following call in MatrixOperations, for instance, is about cameras - I assume this is some scene/object type of logic for 3D graphics?

public static MatrixSingle CreateBillboard(in Vector4FParam1_3 objectPosition, in Vector4FParam1_3 cameraPosition, in Vector4FParam1_3 cameraUpVector, in Vector4FParam1_3 cameraForwardVector)

That does not look like a general-purpose linear algebra library to me.


Looks like this claims to be faster than other libraries but has no benchmarks.


Yeah. Fast compared to what? BLAS? On which platform? The Readme is a bit short for something that claims to be "significantly faster than most mathematics libraries out there"


hi - lead dev of the library. there are a multitude of benchmarks in `MathSharp/samples/MathSharp.Interactive/Benchmarks`, I just haven't been able to put them into the separate project because of personal time constraints so far
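
(For anyone curious what such a comparison might look like, here is a generic BenchmarkDotNet sketch using only the BCL types - it is not the benchmark code from the samples folder, and it deliberately avoids guessing at MathSharp's own API:)

    using System.Numerics;
    using System.Runtime.Intrinsics;
    using System.Runtime.Intrinsics.X86;
    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    public class DotProductBenchmark
    {
        private Vector4 _a4 = new Vector4(1f, 2f, 3f, 4f);
        private Vector4 _b4 = new Vector4(5f, 6f, 7f, 8f);
        private Vector128<float> _a128 = Vector128.Create(1f, 2f, 3f, 4f);
        private Vector128<float> _b128 = Vector128.Create(5f, 6f, 7f, 8f);

        [Benchmark(Baseline = true)]
        public float SystemNumerics() => Vector4.Dot(_a4, _b4);

        [Benchmark]
        public float RawIntrinsics() =>
            Sse41.IsSupported
                ? Sse41.DotProduct(_a128, _b128, 0xF1).ToScalar()   // DPPS: multiply all lanes, sum into lane 0
                : Vector4.Dot(_a128.AsVector4(), _b128.AsVector4());

        public static void Main() => BenchmarkRunner.Run<DotProductBenchmark>();
    }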


Any update on this inclusion? Looks like two or three were added


Probably not what you are looking for, but maybe useful: https://www.reddit.com/r/dotnet/comments/d2t7ti/mathsharp_a_...


There is already a Microsoft library for this - System.Numerics.Vectors, see documentation at https://docs.microsoft.com/en-us/dotnet/api/system.numerics?...

Not sure how MathSharp compares in performance or usability, however.
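
(For context, the System.Numerics types already cover a fair amount of the same ground, including a CreateBillboard helper; a quick sketch using only the built-in API:)

    using System;
    using System.Numerics;

    class NumericsSample
    {
        static void Main()
        {
            var a = new Vector4(1f, 2f, 3f, 4f);
            var b = new Vector4(5f, 6f, 7f, 8f);
            Console.WriteLine(Vector4.Dot(a, b));   // 70

            // The built-in matrix type has the same kind of camera helpers.
            Matrix4x4 billboard = Matrix4x4.CreateBillboard(
                objectPosition: Vector3.Zero,
                cameraPosition: new Vector3(0f, 0f, -10f),
                cameraUpVector: Vector3.UnitY,
                cameraForwardVector: Vector3.UnitZ);
            Console.WriteLine(billboard.M41);       // translation row comes from objectPosition
        }
    }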


This is built on top of the newer hardware intrinsics capabilities added to the framework underneath a new namespace: System.Runtime.Intrinsics

https://devblogs.microsoft.com/dotnet/hardware-intrinsics-in...


that uses the old and very limited SIMD facilities of .NET


> How to use

> TODO

:(


Sorry about that!! Will get on it ASAP when I have some spare time


I suggest that you use run-time code generation to create optimized classes that do not check for intrinsic features on each function call.


This is exactly what is happening here. The JIT generates code at runtime based on the target machine's capability.



