HN is overloaded with AI stuff; it's hard to break through all the noise. I say this as someone very interested in AI. Even I skip some links because it's just too much.
I see it making claims about 10x efficiency, but what about tokens/second/watt? The only machines I've seen with the memory bandwidth to effectively do local inference are the M-series Arm chips in Macs.
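To make the bandwidth point concrete: local LLM decode is typically memory-bandwidth-bound, so you can sketch the ceiling with a back-of-envelope calculation. All numbers below are illustrative assumptions, not measured figures.

```python
# Back-of-envelope: every generated token streams all model weights once,
# so decode speed is roughly bounded by memory bandwidth / model size.
# The example numbers are assumptions for illustration, not benchmarks.

def decode_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed: bandwidth divided by bytes streamed per token."""
    return bandwidth_gb_s / model_size_gb

def tokens_per_sec_per_watt(tokens_per_sec: float, package_watts: float) -> float:
    """The efficiency figure in question: throughput per watt."""
    return tokens_per_sec / package_watts

# e.g. ~400 GB/s of bandwidth against a ~40 GB quantized model at ~40 W package power
tps = decode_tokens_per_sec(400, 40)
print(tps)                               # → 10.0 tokens/s ceiling
print(tokens_per_sec_per_watt(tps, 40))  # → 0.25 tokens/s/W
```

This ignores prefill, cache traffic, and compute limits, but it explains why bandwidth (not raw FLOPS) dominates the comparison for local decoding.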
Because it's not faster than the Ryzen 395's GPU. Power efficiency doesn't matter as much as TTFT (time to first token) for desktop users, especially when they're tasking their AMD box as a dedicated inference machine.
Some older, pre-395 AMD articles suggested it would be possible to use the NPU for prefill and the GPU for decoding, and that this would be faster than using either alone, but we have yet to see that (even on Windows) for any usefully sized models; just toys like LLaMA-8B.
According to Geekbench, the M5 is on average ~17% faster than the 9950X in single-thread performance and ~30% slower in multi-thread performance.
Individual benchmarks tell a fuller story. These two chips are optimized for different use cases, with Apple leaning heavily toward low-latency single-thread throughput at low sustained power usage.
I think the point of the line of questioning is to illustrate that "tools" like a code interpreter act as scratch space for models to do work in, because the reasoning/thinking process has limitations much like our own.
I'm aware. I work in ewaste recycling, and most of the machines I come across are about 10 years old. I'm also a fan of JayzTwoCents. https://m.youtube.com/watch?v=ukb5tlT4IuQ
For anyone who doesn't know yet, there is a wide variety of ONVIF-supporting cameras that you can set up with a local NVR running Frigate. You can block internet access to the cameras so they can't create outbound connections, and only inbound connections to the video streams are allowed.
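For a concrete starting point, a minimal Frigate camera entry for an ONVIF/RTSP camera can look like the sketch below (the IP address, credentials, and stream path are placeholders; check your camera's documentation for the actual RTSP URL):

```yaml
# Illustrative Frigate config fragment for one camera.
# 192.168.20.10 and the stream path are placeholder values.
cameras:
  front_door:
    ffmpeg:
      inputs:
        - path: rtsp://user:pass@192.168.20.10:554/h264
          roles:
            - detect
            - record
```

The outbound-blocking part is done on your router or firewall (e.g. a deny rule for the camera VLAN toward the WAN), not in Frigate itself.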
Tailscale has a free tier and is a good option for remotely accessing your network and cameras.
As a former user of Zulip at a previous company, thank you for this software; I enjoyed using it. Maybe I'll set up a private instance for friends and family so I can enjoy it once again.
There's a difference between carrying ten pounds a short distance for a short time and carrying an extra two pounds over twenty hours of travel, across multiple connecting international flights in a single day. And it's not just an extra two pounds: it's an additional proprietary power cord, extra bulk, more mass moving in and out from under an airliner seat. It all adds up, especially when you're sleep deprived and physically exhausted.
Any amount of weight is annoying after that long, but if the extra laptop weight drops to 10% of your 25-pound bag, it's even less likely to be the deciding factor between "portable" and "barely portable".
Is AMD's CUDA compatibility layer, which transparently compiles existing CUDA code just fine, insufficient or buggy somehow? Or are you just stuck in the mindshare game and haven't reevaluated whether the AMD situation has changed this year?
I haven't checked out AMD's compatibility layer and know nothing about it. I tried to get vkFFT working alongside cuFFT for a specific computation, but couldn't get it working right; crickets on the GH issue I posted.
I use Vulkan for graphics, but Vulkan compute is a mess.
I'm not stuck in mindshare, and this isn't a political thing. I am just trying to get the job done, and have observed that no alternative has stepped up to nvidia's CUDA from a usability perspective.
> have observed that no alternative has stepped up to nvidia's CUDA from a usability perspective.
I'm saying this is a mindshare thing if you haven't evaluated ROCm / HIP. HIPify can convert CUDA source to HIP automatically, and HIP's syntax is very similar to CUDA's.
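To give a sense of how mechanical that conversion is, here's a toy sketch of the kind of renaming HIPify performs. The real tools (hipify-perl / hipify-clang) also handle headers, launch configuration, and many more APIs; the dict below is a tiny illustrative subset, not the actual tool.

```python
# Toy illustration: much of CUDA-to-HIP translation is renaming cuda*
# runtime calls to their hip* equivalents (these mappings are real HIP
# API names; the translator itself is a simplified stand-in).
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def hipify_line(line: str) -> str:
    """Apply the mechanical renames to one line of CUDA source."""
    for cuda_name, hip_name in CUDA_TO_HIP.items():
        line = line.replace(cuda_name, hip_name)
    return line

print(hipify_line("cudaMalloc(&d_x, n * sizeof(float));"))
# → hipMalloc(&d_x, n * sizeof(float));
```

Kernel code itself (the `__global__` functions and `<<<grid, block>>>` launches) largely carries over unchanged, which is why ported codebases stay readable.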
I posted it here the same day I found and started using it, to almost no reaction.
[0] https://github.com/FastFlowLM https://fastflowlm.com/ https://huggingface.co/FastFlowLM