> it's expensive as hell to do a professional benchmark against accepted standards. It's several days' work, very expensive rental of several pieces of $10K hardware, etc
When people casually ask for benchmarks in comments, they’re not looking for in-depth comparisons across all of the alternatives.
They just want to see “Running Model X with quantization Y I get Z tokens per second”.
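Something like this one-file sketch is the whole ask — a minimal example using the llama-cpp-python bindings (the model file and settings here are placeholders, not recommendations):

    # Casual tokens-per-second check, nothing rigorous.
    # Model path and settings are placeholders.
    import time
    from llama_cpp import Llama

    llm = Llama(model_path="model-Q4_K_M.gguf", n_ctx=2048, verbose=False)

    start = time.perf_counter()
    out = llm("Explain quantization in one paragraph.", max_tokens=256)
    elapsed = time.perf_counter() - start

    # elapsed includes prompt eval, so this slightly undercounts
    # pure generation speed -- fine for a casual number.
    n = out["usage"]["completion_tokens"]
    print(f"{n} tokens in {elapsed:.2f}s -> {n / elapsed:.1f} tok/s")

Run that, state the hardware, model, and quant, and you've answered the question people are actually asking.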
> That's why I know how to benchmark M2 Ultra and M4 Macs, as they are the second best chip a.t.m. that we need to compete against.
Macs are great for being able to fit models into RAM within a budget and run them locally, but I don’t understand how you’re concluding that a Mac is the “second best option” to your $30K machine unless you’re deliberately excluding all of the systems that hobbyists commonly build under $30K which greatly outperform Mac hardware.
>They just want to see “Running Model X with quantization Y I get Z tokens per second”.
Influencers on YouTube will give them that [1], but it's meaningless. If a benchmark is not part of an in-depth comparison, then it doesn't mean anything and can't tell you what hardware will run this software best.
These shallow benchmarks influencers post on YouTube and Twitter are not just meaningless, they also take days to browse through. And they are influencers: they are meant to influence you, and are therefore not honest or reliable.
>but I don’t understand how you’re concluding that a Mac is the “second best option” to your $30K machine
I conclude that if you can't afford to develop custom chips, then in certain cases a cluster of M4 Mac Minis will be the fastest and cheapest option. Cerebras wafers or NVIDIA GPUs have always been too expensive compared to custom chips or Mac Mini clusters, independent of the specific software workload.
I also meant to say that a cluster of $599 Mac Minis will outperform a $6500 M2 Ultra Mac Studio with 192GB, at half the price for higher performance and more DRAM, but only if you utilize the M4 Mac Minis' aggregated 100 Gbps networking.
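For anyone who wants to sanity-check that kind of cluster math, here is a trivial back-of-envelope helper; the per-unit RAM and price figures are assumptions (I've plugged in the base $599 config), so adjust them to whichever Mini configuration you're actually pricing:

    # Back-of-envelope cluster sizing. Figures are assumptions based on the
    # list prices above; ignores networking overhead and per-node compute.
    import math

    def cluster_cost(target_ram_gb, ram_per_unit_gb, unit_price):
        """Units needed, and total cost, to reach a target amount of pooled RAM."""
        units = math.ceil(target_ram_gb / ram_per_unit_gb)
        return units, units * unit_price

    # Hypothetical: base-config Minis vs. a $6500 192GB M2 Ultra Studio.
    units, total = cluster_cost(192, ram_per_unit_gb=16, unit_price=599)
    print(f"{units} Minis, ${total} total, vs. $6500 for one Studio")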