The constant improvement of SOTA is the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because much of the funding and loans for the inference hardware only exists because of the promises that continuous training makes (or tries to keep).
It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.
> "can I get my coding work actually done today" vs. "this can do customer support chat"
I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code?
This. I have been writing my side-hustle code with OpenCode and the 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there.
Not really. The current SOTA models are already at the point where they can do that. The models that follow will start to surpass the daily-work level. It's a diminishing-returns situation, just like anything else in tech.
Yes, but in practice land ownership is only zero-sum in places like Europe, where every square kilometer has 300 years of documented ownership, or in other high-density areas.
Asia, Africa, and the Americas have plenty of unused space that isn't as inhospitable as central Australia.
Where in Asia did you have in mind? A few things I know offhand: Sri Lanka has a higher population density than Britain, Japan's is much higher than that, and Java has nearly the population of Russia in an area smaller than England (just England, not Britain or the UK). India and China are big, but have huge populations.
There is lots of "unused space" in places like Alaska or Siberia or deserts or mountains, but land is not a fungible commodity. Unused space is unused for a reason. In practice, almost all ownership of land is a zero sum game.
I think the author might argue that simply becoming more efficient at creating a rent-seeking mechanism is not beneficial. No matter how well motivated you are to improve your zero-sum game skills, it's still zero-sum.
Cool idea, but kinda sad that it has to go through a cloud provider. I feel like there's a possibility, with an accelerator board (a Coral TPU or something), to make this into a totally local thing?
The longer waiting time is surely not an issue, considering how many people still use Polaroids.
We were looking to add on-device styles with the Raspberry Pi in order to keep the device cost low, though a Coral TPU would make this easier. The OnnxStream library appears to be able to do SD1.5 generation in 10 minutes on a Pi Zero, so with some optimization and a reduced image resolution, img2img may be possible on the Pi in ~1 minute. We were also looking at style transfer models, which are much more lightweight and could run fast on a Pi (https://github.com/tyui592/AdaIN_Pytorch/tree/master). Eventually our goal is to make this both on-device and relatively cheap.
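For context on why AdaIN-style transfer is so much lighter than diffusion: the core operation is just a per-channel statistics swap on encoder feature maps, with no iterative sampling. Here is a minimal NumPy sketch of that operation (illustrative only, not the linked repo's code; the function name and shapes are my own):

```python
import numpy as np

def adain(content_feat: np.ndarray, style_feat: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive Instance Normalization: re-scale the content features so their
    per-channel mean/std match the style features' statistics.
    Both inputs are feature maps with shape (channels, height, width)."""
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)
    # Normalize content to zero mean / unit std, then apply style statistics.
    normalized = (content_feat - c_mean) / (c_std + eps)
    return normalized * s_std + s_mean

# Toy check: after AdaIN, the output carries the style's channel statistics.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))
style = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
out = adain(content, style)
```

In the full model this sits between a frozen VGG-style encoder and a learned decoder, so at inference time it's a single forward pass, which is why it's plausible on a Pi where diffusion isn't.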
We were looking into OnnxStream (https://github.com/vitoplantamura/OnnxStream) and modifying it to support img2img. We got pretty close, but yeah, the capability to run diffusion models on a Raspberry Pi is quite limited lol.
Alternatively we could use compute from your iPhone, but that adds a dependency on external hardware that I don't quite like. We could use a Jetson, but then power draw is quite high. I agree with you that on-device inference is the holy grail, but the best approach is something we are still trying to figure out.
What I'm really hoping for is a one-two punch like with V3 -> R1.