
Definitely something in this realm; they call the models "preview" at a bunch of different points in the paper.

What I'm really hoping for is a double punch like with V3 -> R1


Comparing only on SOTA scores (ignoring price etc.) is like choosing your daily driver by looking at who makes the fastest sports car...


The constant improvement of SOTA is the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because a bunch of the funding and loans for the inference hardware only exist because of the promises that continuous training makes (or tries to make).


Not really. SOTA vs. non-SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat".

It's like a car vs. a kick scooter.


It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.


OK, so we're at the point where Opus 4.5 is not SOTA. By that definition... yes, you are right.


I mean it's almost half a year, I think that counts?


Time-wise you are correct.


> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I've been using GPT-3 back then for basic scripts; does that count? Or only Claude Code?

I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...)


This. I have been doing my side-hustle code with opencode and the 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there.


Copilot is a bad harness that perverts the productivity of models like GPT 5.5.


Tell me more please!


Not really. The current SOTAs are already at the point where they can do that. The following models will start to surpass the daily-work level. It's a diminishing-returns situation, just like anything else in tech.


If you found a rare 9000 card with 200+ GB of VRAM, sure.


I think the general question is whether they'll release it at all; I haven't yet read anything stating that they will.


Well let me introduce people to a few brand new concepts:

https://en.wikipedia.org/wiki/Capitalism

https://en.wikipedia.org/wiki/Race_to_the_bottom

https://en.wikipedia.org/wiki/Arms_race

Of course they'll release it once they can de-risk it sufficiently and/or a competitor gets close enough on their tail, whichever comes first.


One can see the impact of this cultural wave on people above ~40 pretty heavily.

Hand in hand with the whole "Atomkraft? Nein danke" campaign. (https://en.wikipedia.org/wiki/Nuclear_Power%3F_No_Thanks)


The major selling point of the tinyboxes is that you're able to run them in your office without any hassle.

I used to own a Dell PowerEdge for my home office, but those fans, even on the minimal setting, kept me up at night.


Yes, but in practice land ownership is only zero-sum in places like Europe, where every square kilometer has 300 years of documented ownership etc., or in other high-density areas.

Asia, Africa & the Americas have so much unused space that isn't as inhospitable as central Australia.


Where in Asia do you have in mind? A few things I know offhand: Sri Lanka has a higher population density than Britain, Japan's is much higher than that, and Java has nearly the population of Russia in an area smaller than England (just England, not Britain or the UK). India and China are big, but have huge populations.

There is lots of "unused space" in places like Alaska or Siberia or deserts or mountains, but land is not a fungible commodity. Unused space is unused for a reason. In practice, almost all ownership of land is a zero sum game.


I think the author might argue that simply becoming more efficient at creating a rent-seeking mechanism is not beneficial. No matter how well motivated you are to improve your zero-sum-game skills, it's still zero-sum.

Or something like that.


You can already buy A100/H100s on eBay. While it might not ever be economical to run these at home (cost of electricity), it's plenty fun.
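
Quick napkin math on the electricity, with hypothetical numbers (~700 W board power under load, $0.30/kWh residential rate, hobbyist duty cycle; adjust for your card and local prices):

    board_power_kw = 0.7   # H100 SXM is rated around 700 W; PCIe cards draw less
    price_per_kwh = 0.30   # assumed residential rate in USD
    hours_per_day = 8      # hobbyist duty cycle, not a datacenter

    daily = board_power_kw * hours_per_day * price_per_kwh
    print(f"~${daily:.2f}/day, ~${daily * 30:.2f}/month")
    # ~$1.68/day, ~$50.40/month -- hobby money, not datacenter economics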


Cool idea, but kinda sad that it has to go through a cloud provider. I feel like there's a possibility, with an accelerator board (a Coral TPU or something), to make this into a totally local thing maybe? The longer wait time is surely not an issue considering how many people still use Polaroids.


We were looking to add on-device styles with the Raspberry Pi in order to keep the device cost low, though a Coral TPU would make this easier. The OnnxStream library appears to be able to do SD1.5 generation in 10 minutes on a Pi Zero, so with some optimization and reduced image resolution, img2img may be possible on the Pi in ~1 minute. We were also looking at style-transfer models, which are much more lightweight and could run fast on a Pi (https://github.com/tyui592/AdaIN_Pytorch/tree/master). Eventually our goal is to make this both on-device and relatively cheap.
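
For reference, the core operation in that linked repo (AdaIN) is tiny, which is why it's so much lighter than diffusion. A rough PyTorch sketch, not the repo's exact code; it assumes (N, C, H, W) feature maps from an encoder like VGG:

    import torch

    def adain(content, style, eps=1e-5):
        # Re-style content features by matching their per-channel
        # mean/std to the style features (Huang & Belongie, 2017).
        n, c = content.shape[:2]
        cf = content.view(n, c, -1)
        sf = style.view(n, c, -1)
        c_mean, c_std = cf.mean(-1, keepdim=True), cf.std(-1, keepdim=True) + eps
        s_mean, s_std = sf.mean(-1, keepdim=True), sf.std(-1, keepdim=True)
        return ((cf - c_mean) / c_std * s_std + s_mean).view_as(content)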


We were looking into OnnxStream (https://github.com/vitoplantamura/OnnxStream) and modifying it to support img2img (the basic idea is sketched below). We got pretty close, but yeah, the capabilities for running diffusion models on a Raspberry Pi are quite limited lol.

Alternatively we could use compute from your iPhone, but that adds additional dependencies on external hardware, which I don't quite like. We could use a Jetson, but then the power draw is quite high. I agree with you that on-device inference is the holy grail, but the best approach is something we are still trying to figure out.
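
To give an idea of what img2img adds on top of txt2img: encode the photo to latents, noise them partway along the schedule, then denoise as usual. In diffusers-style Python (just an illustration; the actual port would be in OnnxStream's C++, and the file names and strength are made up):

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32)
    init = Image.open("polaroid.jpg").convert("RGB").resize((512, 512))

    # strength sets how far into the noise schedule the latents start:
    # near 0.0 returns the input almost untouched, 1.0 is effectively txt2img
    out = pipe(prompt="watercolor style", image=init,
               strength=0.5, num_inference_steps=20).images[0]
    out.save("styled.png")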

