I feel as though you are measuring tokens/s wrong, or have a serious bottleneck somewhere. On my i5-10210u (no dedicated graphics, at standard clock speeds), I get ~6 tokens/s on phi4-mini, a 4b model. That means my laptop CPU with a power draw of 15 watts, that was released 6 years ago, is performing better than a 5090.
> The 5090 is 10x faster but only 6-8x the price
I don't buy into this argument. A B580 can be bought at MSRP for 250$. A RTX 5090 from my local Microcenter is around 3250$. That puts it at around 1/13th the price.
Power costs can also be a significant factor if you choose to self-host, and I wouldn't want to risk system integrity for 3x the power draw, 13x the price, a melting connector, and Nvidia's terrible driver support.
EDIT: You can get an RTX 5090 for around 2500$. I doubt it will ever reach MSRP though.
> The 5090 is 10x faster but only 6-8x the price
I don't buy into this argument. A B580 can be bought at MSRP for 250$. A RTX 5090 from my local Microcenter is around 3250$. That puts it at around 1/13th the price.
Power costs can also be a significant factor if you choose to self-host, and I wouldn't want to risk system integrity for 3x the power draw, 13x the price, a melting connector, and Nvidia's terrible driver support.
EDIT: You can get an RTX 5090 for around 2500$. I doubt it will ever reach MSRP though.