
Online, R1 costs what, $2/MTok?

This rig does >4 tok/s, which is ~15-20 ktok/hr, or $0.04/hr when purchased through a provider.

You're probably spending $0.20/hr on power (1 kW) alone.
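Back-of-the-envelope, if anyone wants to check those figures (a rough sketch; the 1 kW draw and $0.20/kWh rate are assumptions, not measurements):

  # rough cost check: hosted value of the output vs. power cost of the rig
  tok_per_hr = 4 * 3600                        # >4 tok/s -> ~15-20 ktok/hr
  hosted_usd_per_hr = tok_per_hr / 1e6 * 2.0   # at $2/MTok: ~$0.03-0.04/hr
  power_usd_per_hr = 1.0 * 0.20                # 1 kW at $0.20/kWh: $0.20/hr
  print(hosted_usd_per_hr, power_usd_per_hr)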

Cool achievement, but to me it doesn't make a lot of sense (besides privacy...)



> Cool achievement, but to me it doesn't make a lot of sense (besides privacy...)

I would argue that is enough and that this is awesome. It's been a long time since I wanted to do a tech hack this much.


Well thinking about it a bit more, it would be so cool if you could

A) somehow continuously interact with the running model, ambient-computing style. Say have the thing observe you as you work, letting it store memories.

B) allow it to process those memories when it chooses to / whenever it's not getting any external input / when it is "sleeping", and

C) (this is probably very difficult) have it change its own weights somehow based on whatever it does in A+B.

THAT, in a privacy-friendly self-hosted package, I'd pay serious money for.


I imagine it could solve crimes if it watched millions of hours of security footage… scary thought. Possibly it could arrest us before we even commit a crime through prediction, like that Black Mirror episode.


Oh, you're thinking of "Hated in the Nation"? More relevant would possibly be "Minority Report" (set in 2054) and Hulu's "Class of '09", in which the FBI starts deploying a crime prediction AI in their version of 2025.

Quite scary. As the meme has it, it seems that we're getting ready to create the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus.


> doesn't make a lot of sense (besides privacy...)

Privacy is worth a lot, though.


What privacy benefit do you get running this locally vs renting a baremetal GPU and running it there?

Wouldn't that be much more cost-effective?

Especially when you inevitably want to run a better / different model in the near future that would benefit from different hardware?

You can get similar Tok/sec on a single RTX 4090 - which you can rent for <$1/hr.


But at a totally different quant; you're crazy if you think you can run the entire R1 model on a single 4090, come on, man. Apples and oranges.


Definitely, but when you can run this in places like Azure with tight contracts, it makes little sense except for the ultra-paranoid.


Considering the power of three letter agencies in the USA and the complete unhingedness of the new administration, I would not trust anything to a contract.


Sure, I am certain there is a possibility, but unless you have airgapped your local instance and locked down your local network securely, it does not really matter.

It's cool to run things locally, and it will get better as time goes on, but for most use cases I don't find it worth it. Everyone is different, and folks who like the idea of keeping things on a secure local network can run it locally.


Even a badly operated on-prem system has the advantage that if someone breaks in, they are taking a risk of getting caught. Whereas with Azure the TLAs could just hoover up everything there without high risk of customers finding out (assuming they can gag MS). Given the reporting about NSA's "collect everything" modus operandi this doesn't seem very far fetched.


Hmm, do we still have to pretend that this is some sort of conspiracy theory? Really? After Snowden? It doesn't "seem very far fetched", it's a fact.


It's less "possibility" and more "certainty."


can we even trust the hardware?


The hardware can be airgapped.


Well you don't need to worry unless you are already on the list.


These days getting on a list may require as little as "is trans" or "has immigrant parents."


or "your competitor donated to Musk/Trump campaign"


?


no, just being speculative... about that.


[flagged]


What does that even mean? Shame these newer accounts post such low-intelligence reaction replies.

For most use cases, you can consider GCP/AWS/Azure secure.


They are alluding to it not being secure against state actors. The distrust in government isn't novel to this discussion, so it should come as no surprise on HN. There is also a general fear of censorship, which should be directed more toward the base model owners than toward cloud providers. I still think doing this in the cloud makes more sense initially, but I see the appeal of a home model that is decoupled from the wider ecosystem.


You could absolutely install 2 kW of solar for probably around $2-4k, and then at worst it turns your daytime usage into $0. I also would be surprised if this was pulling 1 kW in reality; I would want to see an actual measurement of what it is realistically pulling at the wall.

I believe it was an 850 W PSU on the spec sheet?


Quick note that solar power doesn't have zero cost.


It could have zero marginal cost, right? In particular, if you over-provisioned your solar installation already anyway, most of the time it should be producing more energy than you need.


At that point you’re still paying the opportunity cost by losing out on selling your surplus.


And in winter, depending on the region, it might generate 0 kW.


Or, in my case, currently 32.9 W.


Don’t worry, you can charge an iPhone!


Marginal cost $0, sure, but 2 kW of solar + inverter + battery + install costs more than this rig.


No need for a battery, and the battery is by far your largest cost. This could 100% just fall back to grid power; it's not backup power, it's reducing usage.

Not sure about where you are, but where I am a 2 kW system plus li-ion batteries is about two months of the average salary here (not a tech salary, the average one); to put it into perspective, converted to USD that is about $1,550. Panels are maybe 20% of that cost; you can add 4 kW of panels for $450 where I am.

So for less than the price of that PC I would be able to do 2 kW of solar with li-ion batteries, overspeccing the panels by double. That's without cheaping out on components; you can absolutely get lower than that if you do. Installation will be maybe another $500-600 here, likely much higher depending on region. Also, to put it into perspective, we pay about $0.30 per kWh for electricity, and this would pay for itself in a year or two in savings.

By the time it needs to be replaced, which is 5-7 years for the stuff I just got pricing on, it would have 100% offset the cost of running.

Again, I am lucky and we effectively get 80-100% output year round even with cloud cover; you might be pretty far north, where that doesn't apply.

TLDR: it depends, but if you are in the right region and this setup generates even some income for you, the cost to go solar is negative; it would actually not make financial sense not to do it, considering a $2k box was in your budget.
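A rough payback sketch with the numbers above (the ~15 kWh/day of usable generation is my own assumption, not something measured):

  # rough payback sketch; 15 kWh/day of usable generation is an assumption
  install_usd = 1550 + 450 + 550           # system + extra panels + install
  savings_per_year = 15 * 365 * 0.30       # ~$1,640/yr at $0.30/kWh
  print(install_usd / savings_per_year)    # ~1.6 years to break even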


I'm a big fan of solar and batteries. Sounds like you live in a very suitable place for it!


Privacy, for me, is a necessary feature for something like this.

And I think your math is off: $0.20 per kWh at 1 kW is $145 a month. I pay $0.06 per kWh. I've got what, 7 or 8 computers running right now, and my electric bill for that and everything else is around $100 a month, at least until I start using AC. I don't think the power usage of something like this would be significant enough for me to even shut it off when I wasn't using it.
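Spelling out the monthly numbers at both rates, assuming a constant 1 kW draw (a worst case):

  # monthly cost of a constant 1 kW draw
  kwh_per_month = 1.0 * 24 * 30       # 720 kWh
  print(kwh_per_month * 0.20)         # ~$144/month at $0.20/kWh
  print(kwh_per_month * 0.06)         # ~$43/month at $0.06/kWh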

Anyway, we'll find out, just ordered the motherboard.


> I pay $0.06 per kWh

That is like, insanely cheap. In Europe I'd expect prices between $0.15 - 0.25 per kWh. $0.06 sounds like you live next to some solar farm or large hydro installation? Is that a total price, with transfer?


So, one thing: it's the winter rate; the summer rate is higher, when people run their AC and the energy company has to use higher-cost sources. Second, it's a tiered rate: the first x kWh are billed at a higher rate, then once you reach that amount it's a lower rate. But I'm already above the tier cutoff every month no matter what, so my marginal winter rate is around $0.06.


Depends on where you live. The average in San Francisco is $0.29 per kWh.


This gets you the (arguably) most powerful AI in the world running completely privately, under your control, for around $2,000. There are many use cases where you wouldn't want to send your prompts and data to a 3rd party. A lot of businesses have a data export policy where you are just not allowed to use company data anywhere but internal services. This is actually insanely useful.


How is it that cloud LLMs can be so much cheaper? Especially given that local compute, RAM, and storage are often orders of magnitude cheaper than cloud.

Is it possible that this is an AI bubble subsidy where we are actually getting it below cost?

Of course for conventional compute cloud markup is ludicrous, so maybe this is just cloud economy of scale with a much smaller markup.


My guess is two things:

1. Economies of scale. Cloud providers are using clusters in the tens of thousands of GPUs. I think they are able to run inference much more efficiently there than you could on a small setup built just for your own needs.

2. As you mentioned, they are selling at a loss. OpenAI is hugely unprofitable, and they reportedly lose money on every query.


The purchase price for an H100 is dramatically lower when you buy a few thousand at a time.


I think batch processing of many requests is cheaper. As each layer of the model is loaded into cache, you can put through many prompts. Running it locally you don't have that benefit.
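A toy illustration of the effect (plain numpy, nowhere near a real inference stack; the layer size and batch size are made up): one matmul over a batch streams the weights through memory once, while decoding prompts one at a time re-reads them for every prompt.

  # toy sketch: one weight read amortized over a batch vs. per-prompt reads
  import time
  import numpy as np

  d = 4096
  W = np.random.randn(d, d).astype(np.float32)         # one "layer" of weights
  prompts = np.random.randn(32, d).astype(np.float32)  # activations for 32 prompts

  t0 = time.time()
  _ = prompts @ W                   # batched: weights streamed through once
  t_batched = time.time() - t0

  t0 = time.time()
  for p in prompts:                 # serial: weights re-read for each prompt
      _ = p @ W
  t_serial = time.time() - t0

  print(f"batched {t_batched:.4f}s vs one-at-a-time {t_serial:.4f}s")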


> Especially given that local compute, RAM, and storage are often orders of magnitude cheaper than cloud

He uses old, much less efficient GPUs.

He also did not select his living location based on electricity prices, unlike the cloud providers.


It's cheaper because you are unlikely to run your local AI at top capacity 24/7 so you have unused capacity which you are paying for.


The calculation above shows the hosted option is cheaper even if you run the local AI 24/7.


They are specifically referring to usage of APIs where you just pay by the token, not by compute. In this case, you aren’t paying for capacity at all, just usage.


It is shared between users and better utilized and optimized.


"Sharing between users" doesn't make it cheaper. It makes it more expensive due to the inherent inefficiencies of switching user contexts. (Unless your sales people are doing some underhanded market segmentation trickery, of course.)


No, batched inference can work very well. Depending on architecture, you can get 100x or even more tokens out of the system if you feed it multiple requests in parallel.


Couldn't you do this locally just the same?

Of course that doesn't map well to an individual chatting with a chat bot. It does map well to something like "hey, laptop, summarize these 10,000 documents."


Yes, and people do that. Some people get thousands of tokens per second that way, with affordable setups (eg 4x 3090). I was addressing GP who said there is no economies of scale to having multiple users.


Isn't that just because they can get massive discounts on hardware buying in bulk (for lack of a proper term) + absorb losses?


All that, but also because they have those GPUs with crazy amounts of RAM and crazy bandwidth? So the TPS is that much higher, but in terms of power, I guess those boards run in the same ballpark of power used by consumer GPUs?


How would it use 1 kW? Socket SP3 tops out at 280 W, and the system in the article has an 850 W PSU, so I'm not sure what I'm missing.


I assume that the parent just rounded 850W up to 1kW, no?


Yeah, I was vigorously waving hands. Even at 200 W and 10 cents/kWh, you'd need to run this a LONG time to break even.
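For a sense of scale, a sketch using the thread's rough figures (~$2,000 of hardware, ~20 ktok/hr of output, $2/MTok hosted; all assumptions from upthread):

  # break-even sketch vs. a hosted API, using the thread's rough figures
  hardware_usd = 2000
  hosted_usd_per_hr = 20_000 / 1e6 * 2.0       # ~$0.04/hr of API-equivalent output
  power_usd_per_hr = 0.200 * 0.10              # 200 W at $0.10/kWh = $0.02/hr
  savings_per_hr = hosted_usd_per_hr - power_usd_per_hr
  print(hardware_usd / savings_per_hr / 8760)  # ~11 years of 24/7 use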


The point is running locally, not efficiently


> You're probably spending $0.20/hr on power (1 kW) alone.

For those that aren't following: that means you're spending ~$10/MTok on power alone (compared to $2/MTok hosted).
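Spelled out with the same assumptions as the top comment:

  # $0.20/hr of power spread over ~20,000 tok/hr
  print(0.20 / (20_000 / 1e6))    # = $10 per million tokens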


"besides privacy"

lol.

Yeah, just besides that one little thing. We really are a beaten down society aren't we.


Most people value privacy, but they’re practical about it.

The odds of a cloud server leaking my information are non-zero, but very small. A government entity could theoretically get to it, but they would be bored to tears because I have nothing of interest to them. So practically speaking, the threat surface of cloud hosting is an acceptable tradeoff for the speed and ease of use.

Running things at home is fun, but the hosted solutions are so much faster when you actually want to get work done. If you’re doing some secret sensitive work or have contract obligations then I could understand running it locally. For most people, trying to secure your LLM interactions from the government isn’t a priority because the government isn’t even interested.

Legally, the government could come and take your home server too. People like to have fantasies about destroying the server during a raid or encrypting things, but practically speaking they’ll get to it or lock you up if they want it.


What about privacy from enriching other entities through contributions to their models, with thoughts conceived in your own mind? A non-standard way of thinking about privacy, sure. But I look forward to the ability to improve an offline model of my own with my own thoughts and intellect, rather than giving it away to OpenAI/Microsoft/Google/Apple/DeepSeek/whoever.


If the odds are so small, how come there are numerous password dumps? Your credentials may well be in them.


There is something about this comment that is so petty that I had to re-read it. Nice dunk, I guess.


Privacy is a relatively new concept, and the idea that individuals are entitled to complete privacy is a very new and radical concept.

I am as pro-privacy as they come, but let’s not pretend that government and corporate surveillance is some wild new thing that just appeared. Read Horace’s Satires for insight into how non-private private correspondence often was in Ancient Rome.


It's a bit of both. Village societies don't have a lot of privacy. But they also don't make it possible for powerful individuals to datamine personal information of millions.

Most of us have more privacy than 200 years ago in some ways, and much less privacy in other ways.


I think the main point of a local model is privacy, setting aside hobby and tinkering.


I think privacy should be the whole point. There's always a price to pay. I'm optimistic that soon you'll be able to get better speeds with less hardware.


> (besides privacy...)

that's the whole point of local models


The system idles at 60 W and hits 260 W when running.


I think you may be underestimating future enshittification? (e.g., it's going to be trivially easy for the cloud suppliers to cram ads into all the chat responses at will).



