I'd recommend giving openconnect a try. It was originally meant for Cisco VPNs, but has also supported GlobalProtect for a while now. It integrates with the NetworkManager GUI if you use that.
It worked flawlessly with the GlobalProtect VPN we had to use at my last job. A few folks ended up switching to openconnect on Mac too. The official client seems to be quite bad on both platforms.
Perhaps IT at $DAYJOB has pinned to one of the few good versions of GlobalProtect™, but the only trouble I've had with it on my work-issued Mac (and this is a recent trouble) was it refusing to pop up the dialog to enter two-factor creds.
Granted, this is a right pain in the ass, because you have to restart the fucking GlobalProtect™ service to flush out the bug... but otherwise, it has been fine for years and years.
There’s something very weird about macOS multitasking in general: OneDrive and other apps that you’d expect to always stay alive in the background constantly need to be refreshed.
Blame marketing, branding and pressure for value-add. There's no reason for anyone to have a VPN client you can see. It's all IPsec, OpenVPN, or WireGuard under the covers. It should be handed off to the OS, but every corp needs to be a special little snowflake and show you their logo. (Yeah, I'm bitter; I fought with way too many crappy GUI wrappers for the OpenVPN CLI.) Once the app exists, there seems to be little pressure to make it actually good: one of them shipped an update that literally removed the necessary client, but only from the macOS releases.
(ZeroTier gets a pass though because it's an uncommon protocol and I can't remember the last time I've actually seen the app open - as it should be.)
If they're cache-friendly, they can be very fast. The papers by Phil Bagwell (may he rest in peace) are very rich, with lots of implementation detail. Check them out!
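If "they" here are Bagwell's hash array mapped tries (HAMTs, from his paper "Ideal Hash Trees"), the implementation detail that matters most for compactness and cache behaviour is the node layout: a 32-bit bitmap records which of the 32 logical slots are occupied, and only the occupied slots are stored, in a dense array indexed by the popcount of the bitmap bits below the slot. Below is a minimal, mutable sketch under that assumption; the class and method names are made up for illustration, it uses a 32-bit slice of Python's `hash()`, and full-hash collisions fall back to a small bucket list. Python won't show any cache effects, only the layout.

```python
from typing import Any, List, Tuple, Union

BITS = 5                       # hash bits consumed per level: 2**5 = 32-way fan-out
MASK = (1 << BITS) - 1
HASH_BITS = 32                 # assumption: only a 32-bit slice of hash() is used

Bucket = List[Tuple[Any, Any]]  # leaf: (key, value) pairs sharing a hash prefix


def _hash(key: Any) -> int:
    return hash(key) & 0xFFFFFFFF


def _popcount(x: int) -> int:
    return bin(x).count("1")


class Node:
    """Bitmap-indexed node: only occupied slots take up space in `entries`."""

    __slots__ = ("bitmap", "entries")

    def __init__(self) -> None:
        self.bitmap = 0                                  # bit i set <=> slot i occupied
        self.entries: List[Union["Node", Bucket]] = []   # dense, ordered by slot

    def dense_index(self, bit: int) -> int:
        # Position in the dense array = number of occupied slots below `bit`.
        return _popcount(self.bitmap & (bit - 1))


class HAMT:
    """Hypothetical, mutable HAMT sketch (no deletion, no path copying)."""

    def __init__(self) -> None:
        self.root = Node()

    def get(self, key: Any, default: Any = None) -> Any:
        h, node, shift = _hash(key), self.root, 0
        while True:
            bit = 1 << ((h >> shift) & MASK)
            if not node.bitmap & bit:
                return default
            entry = node.entries[node.dense_index(bit)]
            if isinstance(entry, Node):
                node, shift = entry, shift + BITS
                continue
            for k, v in entry:
                if k == key:
                    return v
            return default

    def set(self, key: Any, value: Any) -> None:
        h, node, shift = _hash(key), self.root, 0
        while True:
            bit = 1 << ((h >> shift) & MASK)
            pos = node.dense_index(bit)
            if not node.bitmap & bit:
                # Empty slot: mark it in the bitmap and splice a new bucket in.
                node.bitmap |= bit
                node.entries.insert(pos, [(key, value)])
                return
            entry = node.entries[pos]
            if isinstance(entry, Node):
                node, shift = entry, shift + BITS
                continue
            for i, (k, _) in enumerate(entry):
                if k == key:                             # existing key: overwrite
                    entry[i] = (key, value)
                    return
            if shift + BITS >= HASH_BITS:
                # All hash bits used up: a true collision, so share the bucket.
                entry.append((key, value))
                return
            # Different key in this slot: push its bucket one level down, retry.
            child = Node()
            child.bitmap = 1 << ((_hash(entry[0][0]) >> (shift + BITS)) & MASK)
            child.entries = [entry]
            node.entries[pos] = child
            node, shift = child, shift + BITS


t = HAMT()
for i in range(1000):
    t.set(i, i * i)
t.set(1 + 2**32, "shares a 32-bit hash with 1")  # on 64-bit CPython
print(t.get(999), t.get(1), t.get(1 + 2**32))
```

A persistent variant, as in Clojure's hash maps, would copy the nodes along the path instead of mutating them; the bitmap indexing stays the same.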
NumPy/CuPy explicitly coerce to float64. There is no documentation for why this is done, but since GEMM is used to compute the covariance (a summation over data points), it makes sense to increase the precision.
You could get away with using a double only to accumulate the sum, but it's a pain to write such a mixed-precision (slow) function in C and then wrap it in Python, especially for something like this.