Well thinking about it a bit more, it would be so cool if you could
A) somehow continuously interact with the running model, ambient-computing style. Say, have the thing observe you as you work, letting it store memories.
B) allow it to process those memories when it chooses to / whenever it's not getting any external input / when it is "sleeping", and
C) (this is probably very difficult) have it change its own weights somehow based on whatever it does in A+B.
THAT, in a privacy-friendly self-hosted package, I'd pay serious money for.
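To make the A/B idea concrete, here's a minimal sketch of the observe-then-consolidate loop. Everything here is hypothetical scaffolding: in a real system the string-join "summary" would be a call into your local model, and step C (weight updates) is left out entirely.

```python
import queue
import time

observations = queue.Queue()
long_term_memory = []  # consolidated summaries

def observe(event: str) -> None:
    """(A) Ambient capture: push raw observations as they happen."""
    observations.put((time.time(), event))

def consolidate(idle_seconds: float = 1.0) -> None:
    """(B) 'Sleep' phase: once no new input arrives for `idle_seconds`,
    compress the buffered observations into a durable memory."""
    batch = []
    while True:
        try:
            batch.append(observations.get(timeout=idle_seconds))
        except queue.Empty:
            if batch:
                # Stand-in for an LLM summarization pass (and, someday, C).
                summary = f"{len(batch)} events: " + "; ".join(e for _, e in batch)
                long_term_memory.append(summary)
            break

observe("user opened editor")
observe("user ran tests")
consolidate(idle_seconds=0.1)
print(long_term_memory)
```

The point of the queue-with-timeout shape is that "sleeping" falls out naturally: consolidation only fires when the input stream goes quiet.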
I imagine it could solve crimes if it watched millions of hours of security footage… scary thought. Possibly it could arrest us before we even commit a crime, through prediction, like that Black Mirror episode.
Oh, you're thinking of "Hated in the Nation"? More relevant would possibly be "Minority Report" (set in 2054) and Hulu's "Class of '09", in which the FBI starts deploying a crime prediction AI in their version of 2025.
Quite scary. As the meme has it, it seems that we're getting ready to create the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus.
Considering the power of three letter agencies in the USA and the complete unhingedness of the new administration, I would not trust anything to a contract.
Sure, I'm certain there's a possibility, but unless you have air-gapped your local instance and locked down your local network securely, it doesn't really matter.
It's cool to run things locally, and it will get better as time goes on, but for most use cases I don't find it worth it. Everyone is different, and folks who like the idea of a secure local network can run it locally.
Even a badly operated on-prem system has the advantage that if someone breaks in, they are taking a risk of getting caught. Whereas with Azure, the TLAs could just hoover up everything there without much risk of customers finding out (assuming they can gag MS). Given the reporting about the NSA's "collect everything" modus operandi, this doesn't seem very far-fetched.
They are alluding to it not being secure against state actors. Distrust of government isn't novel to this discussion, so it should come as no surprise on HN. There is also a general fear of censorship, which should be directed more at the base model owners than at cloud providers. I still think doing this in the cloud makes more sense initially, but I see the appeal of a home model that is decoupled from the wider ecosystem.
You could absolutely install 2 kW of solar for probably around $2-4k, and then at worst it turns your daytime usage into $0. I'd also be surprised if this was pulling 1 kW in reality; I'd want to see an actual measurement of what it realistically draws at the wall.
It could have zero marginal cost, right? In particular, if you over-provisioned your solar installation already anyway, most of the time it should be producing more energy than you need.
No need for a battery, and the battery is by far your largest cost. This could 100% just fall back to grid power; it's not backup power, it's reducing usage.
Not sure about where you are, but where I am, a 2 kW system plus li-ion batteries costs about two months of the average salary here (not a tech salary, the overall average); to put it into perspective, that converts to about $1,550 USD. Panels are maybe 20% of that cost: you can add 4 kW of panels for $450 USD where I am.
So for less than the price of that PC, I could do 2 kW of solar with li-ion batteries while overspeccing the panels by double. That's without cheaping out on components; you can absolutely go lower if you do. Installation would be maybe another $500-600 USD here, though likely much higher depending on region. Also, for perspective, we pay about $0.30 USD per kWh for electricity, so this would pay for itself in savings within one to two years.
By the time it needs replacing, which is 5-7 years for the components I just priced, it would have fully offset the cost of running.
Again I am lucky and we effectively get 80-100% output year round even with cloud cover, you might be pretty far north and that doesn't apply.
TLDR: it depends, but if you're in the right region and this setup generates even some income for you, the cost of going solar is negative; it would actually not make financial sense not to do it, considering a $2K USD box was in your budget.
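A rough sanity check of the payback claim, using the figures quoted above (the ~5 equivalent full-sun hours per day is my assumption for a sunny region, not a number from the thread):

```python
# All inputs are the commenter's estimates or labeled assumptions.
system_cost = 1550 + 550      # USD: panels + batteries, plus mid-range install
price_per_kwh = 0.30          # USD, the quoted local electricity rate
daily_yield_kwh = 2 * 5       # 2 kW array * ~5 full-sun hours/day (assumption)

annual_savings = daily_yield_kwh * 365 * price_per_kwh
payback_years = system_cost / annual_savings
print(f"annual savings: ${annual_savings:.0f}, payback: {payback_years:.1f} years")
```

With those inputs the payback lands just under two years, consistent with the "between a year and two" claim.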
Privacy, for me, is a necessary feature for something like this.
And I think your math is off: $0.20 per kWh at 1 kW is $145 a month. I pay $0.06 per kWh. I've got what, 7 or 8 computers running right now, and my electric bill for that and everything else is around $100 a month, at least until I start using AC. I don't think the power usage of something like this would be significant enough for me to even shut it off when I wasn't using it.
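The per-month arithmetic behind those two rates, spelled out (30-day month):

```python
def monthly_cost(kw: float, rate_per_kwh: float, days: int = 30) -> float:
    """Cost of a constant draw of `kw` kilowatts over `days` days."""
    return kw * 24 * days * rate_per_kwh

print(monthly_cost(1.0, 0.20))  # the $0.20/kWh figure upthread: ~$144/month
print(monthly_cost(1.0, 0.06))  # the commenter's $0.06/kWh winter rate
```

At $0.06/kWh a continuous 1 kW draw is around $43 a month, which is why it barely registers on a $100 bill.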
Anyway, we'll find out, just ordered the motherboard.
That is, like, insanely cheap. In Europe I'd expect prices between $0.15 and $0.25 per kWh. $0.06 sounds like you live next to some solar farm or large hydro installation? Is that the total price, including transmission?
Two things. First, it's the winter rate; the summer rate is higher, when people run their AC and the energy company has to use higher-cost sources. Second, it's a tiered rate: the first x kWh is billed at a higher rate, and once you pass that threshold the rate drops. But I'm above the tier cutoff every month no matter what, so my marginal winter rate is around $0.06.
This gets you the (arguably) most powerful AI in the world running completely privately, under your control, for around $2,000. There are many use cases where you wouldn't want to send your prompts and data to a third party. A lot of businesses have data export policies where you're simply not allowed to use company data anywhere but internal services. This is actually insanely useful.
How is it that cloud LLMs can be so much cheaper? Especially given that local compute, RAM, and storage are often orders of magnitude cheaper than cloud.
Is it possible that this is an AI bubble subsidy where we are actually getting it below cost?
Of course for conventional compute cloud markup is ludicrous, so maybe this is just cloud economy of scale with a much smaller markup.
1. Economies of scale. Cloud providers are using clusters in the tens of thousands of GPUs. I think they are able to run inference much more efficiently than you would be able to in a single cluster just built for your needs.
2. As you mentioned, they are selling at a loss. OpenAI is hugely unprofitable, and they reportedly lose money on every query.
I think batch processing of many requests is cheaper. Once a layer of the model's weights is loaded into cache, you can push many prompts through it. Running it locally, you don't get that benefit.
They are specifically referring to usage of APIs where you just pay by the token, not by compute. In this case, you aren’t paying for capacity at all, just usage.
"Sharing between users" doesn't make it cheaper. It makes it more expensive due to the inherent inefficiencies of switching user contexts. (Unless your sales people are doing some underhanded market segmentation trickery, of course.)
No, batched inference can work very well. Depending on architecture, you can get 100x or even more tokens out of the system if you feed it multiple requests in parallel.
Of course that doesn't map well to an individual chatting with a chat bot. It does map well to something like "hey, laptop, summarize these 10,000 documents."
Yes, and people do that. Some people get thousands of tokens per second that way, with affordable setups (e.g. 4x 3090). I was addressing the GP, who said there are no economies of scale in having multiple users.
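The batching argument boils down to this: the weights of a layer are streamed from memory once per forward pass regardless of batch size, so B prompts in flight amortize that memory traffic B ways. A toy illustration (one dense layer; real transformers add attention and per-request KV caches, which this deliberately ignores):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                              # toy hidden dimension
W = rng.standard_normal((d, d)).astype(np.float32)   # one "layer" of weights

def forward(batch: np.ndarray) -> np.ndarray:
    # W is read from memory exactly once here, whether `batch` holds
    # 1 prompt or 64 -- that's the economy of batched inference.
    return np.maximum(batch @ W, 0.0)

single = forward(rng.standard_normal((1, d)).astype(np.float32))
batched = forward(rng.standard_normal((64, d)).astype(np.float32))
print(single.shape, batched.shape)
```

Since decoding is usually memory-bandwidth bound, the batched call costs little more wall-clock time than the single one, which is where the "100x or more tokens" figure comes from.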
All that, but also because they have those GPUs with crazy amounts of RAM and crazy bandwidth?
So the TPS is that much higher, but in terms of power, I'd guess those boards run in the same ballpark as consumer GPUs?
Most people value privacy, but they’re practical about it.
The odds of a cloud server leaking my information are non-zero, but very small. A government entity could theoretically get to it, but they'd be bored to tears because I have nothing of interest to them. So practically speaking, the threat surface of cloud hosting is an acceptable tradeoff for the speed and ease of use.
Running things at home is fun, but the hosted solutions are so much faster when you actually want to get work done. If you’re doing some secret sensitive work or have contract obligations then I could understand running it locally. For most people, trying to secure your LLM interactions from the government isn’t a priority because the government isn’t even interested.
Legally, the government could come and take your home server too. People like to have fantasies about destroying the server during a raid or encrypting things, but practically speaking they’ll get to it or lock you up if they want it.
What about privacy from enriching other entities through contributions to their models, with thoughts conceived in your own mind? A non-standard way of thinking about privacy, sure. But I look forward to being able to improve an offline model of my own with my own thoughts and intellect, rather than giving it away to OpenAI/Microsoft/Google/Apple/DeepSeek/whoever.
Privacy is a relatively new concept, and the idea that individuals are entitled to complete privacy is newer and more radical still.
I am as pro-privacy as they come, but let’s not pretend that government and corporate surveillance is some wild new thing that just appeared. Read Horace’s Satires for insight into how non-private private correspondence often was in Ancient Rome.
It's a bit of both. Village societies don't have a lot of privacy. But they also don't make it possible for powerful individuals to datamine personal information of millions.
Most of us have more privacy than 200 years ago in some ways, and much less privacy in other ways.
I think privacy should be the whole point. There's always a price to pay.
I'm optimistic that soon you'll be able to get better speeds with less hardware.
I think you may be underestimating future enshittification (e.g. it's going to be trivially easy for cloud suppliers to cram ads into all the chat responses at will).
This rig does >4 tok/s, which is ~15-20 ktok/hr, or $0.04/hr when purchased through a provider.
You're probably spending $0.20/hr on power (1 kW) alone.
Cool achievement, but to me it doesn't make a lot of sense (privacy aside).
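Spelling out the comparison in the two comments above, using only the figures quoted there ($0.04/hr of API tokens vs. a claimed 1 kW draw at $0.20/kWh):

```python
# Figures quoted upthread; nothing here is measured.
tok_per_s = 4
tok_per_hr = tok_per_s * 3600        # 14,400 tok/hr, i.e. the "~15 ktok/hr"
api_cost_per_hr = 0.04               # USD to buy the same tokens from a provider
power_cost_per_hr = 1.0 * 0.20       # USD: 1 kW draw at $0.20/kWh

print(tok_per_hr)
# Local electricity alone costs ~5x what the API charges for the same output.
print(power_cost_per_hr / api_cost_per_hr)
```

Note the comparison is sensitive to both assumptions: at the $0.06/kWh rate mentioned upthread, or with a measured wall draw well under 1 kW, the gap shrinks considerably.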