I strongly second the recommendation! Apple's CPU optimization guide is great, and not just for their own CPUs but for anyone interested in ARM64 (ARMv8, AArch64, whatever you call it) in general. It's one of the best-written manuals I've read on this topic (admittedly only a few), with great visualizations, and it should be accessible even to someone with little low-level knowledge.
If you want to play with SIMD but are a little intimidated: Swift and C# offer convenient "platform-agnostic" SIMD abstractions, and C# also has NEON/AdvSimd intrinsics in the form of "plain" API calls, e.g. `AdvSimd.AddPairwiseWidening`, for more direct control (I'm biased on this subject: while I like Swift, using Xcode and the surrounding tooling is sad and less convenient, and the support for Linux/Windows is not there yet).
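To give a rough idea of what the "platform-agnostic" flavor looks like, here is a minimal Swift sketch of a dot product using the standard library's `SIMD4<Float>`. The `dot` function and its loop structure are my own illustration, not something from Apple's guide, and I haven't benchmarked it; the compiler decides which NEON (or SSE/AVX) instructions it actually turns into.

```swift
// Dot product of two equally sized Float arrays, four lanes at a time.
// `dot` is a hypothetical helper written for illustration only.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "inputs must have the same length")

    var acc = SIMD4<Float>(repeating: 0)  // four running partial sums
    var i = 0

    // Main loop: multiply four pairs element-wise and accumulate per lane.
    while i + 4 <= a.count {
        let va = SIMD4<Float>(a[i], a[i + 1], a[i + 2], a[i + 3])
        let vb = SIMD4<Float>(b[i], b[i + 1], b[i + 2], b[i + 3])
        acc += va * vb
        i += 4
    }

    // Horizontal add of the four lanes, then a scalar loop for the tail.
    var total = acc.sum()
    while i < a.count {
        total += a[i] * b[i]
        i += 1
    }
    return total
}

print(dot([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))
```

The same source builds on ARM64 and x86-64, which is the appeal; the C# `AdvSimd.*` calls trade that portability for control over the exact instruction.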
Unfortunately no, at least not after I read it. But I wish it had been available a year or so earlier; I would have preferred it to reading ARM's SIMD&FP documentation. It mostly helped me better understand ARM's strided SIMD loads and stores (scatter/gather) and shuffles, verify earlier data from https://dougallj.github.io/applecpu/firestorm.html, and improve my overall mental model, and it was just pleasant to browse through with all the visualizations.
(the original comment doesn't mention it, but to be specific, this is about this document: https://developer.apple.com/download/apple-silicon-cpu-optim...)