"In some instances, AI data centres are powered completely by renewable energy. Unfortunately, unless the data centre builds new renewable energy sources to supply 100% of its power, this still results in an increase in fossil fuel usage. Why? Because if the data centre is using pre-existing sources for renewable energy, it is taking that energy away from other consumers that need it. To pick up the slack, we must generate more energy, and most of that generation is done using coal or natural gas."
-- strangely I had never actually thought about this
I get that these data center companies are in a race right now and speed is the name of the game. But if they really want long-term acceptance for this technology, I think they will need to basically be fully self-sufficient with renewable energy, in a completely isolated power grid, so that we don't have to deal with the above issue.
Does anyone know if there is another printer manufacturer with an equivalent to the Bambu A1S and its custom AMS system? I don't think people realize how good that printer and AMS combination is (the AMS for the X1C pales in comparison), and I'd love to support another company, but I haven't really seen another bed slinger with the simple center-rotating AMS-style system the A1S uses. For context, I run a business selling 3D printed parts for old film cameras, and the A1S is a workhorse.
The main thing keeping me from making the multi-material jump is the waste. I have a couple Vorons and would love to be able to print with different materials at the same time, but the waste with the current solutions is so egregious.
I don't know what the A1S is (did you mean the A1 or the P1S?) but the Snapmaker U1 is on my wishlist. More expensive than either but eliminates the AMS waste by using multiple toolheads. Open firmware, most of Bambu's convenience features.
It's been a while and I can't recall the other big one. I know some engineers from one of them went on to work for Clone Robotics, which seems to be doing interesting stuff with other types of actuators.
I'm particularly interested in it being REALLY fast - do you have any rough tok/s numbers for the flash model? I'm excited for unsloth to drop some quants that I can try and run locally, but really curious how it's been performing speed wise. In general I actually over-index on speed over intelligence. I'd rather a model make mistakes quickly and correct in a follow-up than take forever to get a slightly better initial result.
Take a look at the Time column in https://gertlabs.com/?mode=oneshot_coding -- this is the total time to complete a solution for a reasonably complex problem end-to-end (you would have to divide the average submission size by that time to estimate tok/s). It's fast in the sense that most of the smart, recent Chinese releases are quite slow, especially the DeepSeek Pro variant. Opus 4.7 is also quite fast.
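Back-of-the-envelope only -- a sketch assuming the Time column is wall-clock seconds and the submission size is a token count (both assumptions on my part):

    // Rough throughput estimate from a leaderboard-style Time column.
    // Ignores reasoning/thinking tokens, so treat it as a lower bound.
    function estimateTokensPerSecond(submissionTokens: number, totalSeconds: number): number {
      return submissionTokens / totalSeconds;
    }

    // e.g. a ~4,000-token submission that took 200s is ~20 tok/s (made-up numbers)
    console.log(estimateTokensPerSecond(4000, 200));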
If pure speed is most important for your use case, GPT-5.3 Chat is the fastest model we've tested and it's still reasonably smart. Not meant for agentic tool usage / long context, though.
So it might be more useful for business applications or non-engineering usage where you don't need exceptional intelligence but do want fast, cheap responses.
This black box approach that large frontier labs have adopted is going to drive people away. Changing fundamental behavior like this without notifying customers, and only explaining what happened retroactively, is exactly why people will move to self-hosting their own models. You can't build pipelines, workflows and products on a base that is just randomly shifting beneath you.
In what ways has fetch never caught up to axios? I have not encountered a situation where I could not use fetch in the last 5 years so I'm just curious what killer features axios has that are still missing in fetch (I certainly remember using axios many moons ago).
Simple examples are interceptors and error handling.
Fetch is one of those things I keep trying to use, but then sorely regret doing so because it's a bit rubbish.
You're probably reinventing axios functionality, badly, in your code.
It's especially useful when you want consistent behaviour across a large codebase, say you want to detect 401s from your API and redirect to a login page. But you don't want to write that on every page.
Now you can do monkey patching shenanigans, or make your own version of fetch like myCompanyFetch and enforce everyone uses it in your linter, or some other rubbish solution.
Or you can just use axios and an interceptor. Clean, elegant.
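Something like this sketch (the baseURL and redirect path are made up, but interceptors.response.use is the actual axios API):

    import axios from "axios";

    // One shared client for the whole app instead of a hand-rolled myCompanyFetch.
    export const api = axios.create({ baseURL: "/api" });

    // Response interceptor: any 401 from any call sends the user to the login
    // page, without every page having to check for it.
    api.interceptors.response.use(
      (response) => response,
      (error) => {
        if (error.response?.status === 401) {
          window.location.assign("/login");
        }
        return Promise.reject(error);
      }
    );

    // Usage anywhere in the codebase:
    // const { data } = await api.get("/me");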
And every project gets to a size where you need that functionality, or it was a toy and who cares what you use.
ourFetch is more likely to be buggy, unmaintained, and undocumented, and nobody knows it well because the guy who wrote it left the org 2 years ago, so you end up wasting time reading and maintaining it yourself.
Axios is something where you get most of that work done for you by the community for free, and a lot of people know it. As long as you don’t get pwned due to it. Oh and you will actually find community packages that integrate with it, vs ourFetch, which again, nobody knows or even cares that it exists.
Applies to web frameworks, databases and other types of software and dependencies - if you work with brilliant people, you might succeed rolling your own, but for most people taking something battle tested, something off the shelf is a pretty sane way to go about it.
In this case it’s a relatively small dependency so it’s not the end of the world, but it’s the exact same principle.
I think the ideal model would let you depend on upstream code, but also review ALL of the actual code changes when pulling in new dependency versions (with a nice UI), and vendor things and branch off with a single command whenever you need to -- so you don't have to maintain it yourself by default, but it's trivial when you want to.
It's actually surprising that the whole shadcn approach hasn't gotten more popular in front-end development, or anywhere else for that matter: focusing on making code much easier to maintain and to compile/deploy, with less complexity along the way.
It's the difference between using a SQL library and some person on your team writing their own SQL library and everyone having to use it. There's a vast gulf between the two, professionally speaking.
People dissing axios probably suffer from other NIH problems too.
It's interesting that, of the large inference providers, Google has one of the most inconvenient policies around model deprecation. They deprecate models exactly 1 year after releasing them and force you to move onto their next generation of models. I had assumed that, because they are using their own silicon, they would actually be able to offer better stability, but the opposite seems to be true. Their rate limiting is also much stricter than OpenAI's, for example. I wonder how much of this is related to the TPUs vs just strange policy decisions.
It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...
Consistency: new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior, and then find that the new model performs worse on a specific task, or just behaves differently and needs prompt adjustments. They can also fundamentally change default model settings between releases; for example, the Gemini 2.5 models behaved completely differently with regard to temperature settings than previous models. It creates a moving target that you constantly have to adjust and rework, instead of a platform that you, and by extension your users, can rely on. Other providers have much longer deprecation windows, so they must at least understand this frustration.
> Consistency: new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior
If this is a deal breaker, then self-hosting is the only solution. Due to the hardware premium, all models hosted by third parties will be deprecated to make room for newer, better, and more efficient models.
Sure, but Google also leaves little to no overlap between models and often will leave models in preview mode (which many companies cannot use in production for legal reasons) - right up until the point that the previous model is deprecated.
The point is that if you want to build a platform that customers can rely on based on their own schedules of feature development, you need to support models for longer periods of time. For example, OpenAI is still offering older models like GPT-4, which was released in 2023 - this gives customers plenty of time to test, experiment and eventually migrate to a newer model if it makes sense.
If you're trying to run repeatable workflows, stability from not changing the model can outweigh the benefits of a smarter new model.
The cost can also change dramatically: on top of the higher token costs for Gemini Pro ($1.25/mtok input for 2.5 versus $2/mtok input for 3.1), the newer release also tokenizes images and PDF pages less efficiently by default (>2x token usage per image/page), so you end up paying far more per request on the newer model.
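Rough math, with made-up per-page token counts just to show how the two factors compound:

    // Illustrative only: input price per million tokens and assumed tokens per
    // PDF page (the per-page counts are invented, not measured values).
    const oldModel = { pricePerMTok: 1.25, tokensPerPage: 250 };
    const newModel = { pricePerMTok: 2.0, tokensPerPage: 550 }; // >2x tokens/page

    const costPerPage = (m: { pricePerMTok: number; tokensPerPage: number }) =>
      (m.tokensPerPage / 1_000_000) * m.pricePerMTok;

    // Higher price *and* more tokens per page multiply together.
    console.log(costPerPage(newModel) / costPerPage(oldModel)); // ~3.5x per page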
These are somewhat niche concerns that don't apply to most chat or agentic coding use cases, but they're very real and account for some portion of the traffic that still flows to older Gemini releases.
I use LMStudio to host and run GLM 4.7 Flash as a coding agent. I use it with the Pi coding agent, but also with the Zed editor agent integrations. I've used the Qwen models in the past, but have consistently come back to GLM 4.7 because of its capabilities. I often use Qwen or Gemma models for their vision capabilities. For example, I'll finish an ML training run, take a photo of the graphs and visualizations of the run metrics, and ask the model what I might look at tweaking to improve subsequent training runs. Qwen 3.5 0.8b is pretty awesome for really small and quick vision tasks like "Give me a JSON representation of the cards on this page".
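For anyone curious, that kind of vision call is just a chat completion against LM Studio's local OpenAI-compatible server -- a sketch assuming the default port (1234), a loaded vision-capable model, and placeholder model/file names:

    import { readFileSync } from "node:fs";

    // Read the photo and send it as a data URL, OpenAI vision-style.
    const imageB64 = readFileSync("cards.png").toString("base64");

    const res = await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "qwen3.5-0.8b", // whatever vision model you have loaded
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: "Give me a JSON representation of the cards on this page." },
              { type: "image_url", image_url: { url: `data:image/png;base64,${imageB64}` } },
            ],
          },
        ],
        temperature: 0,
      }),
    });

    const data = await res.json();
    console.log(data.choices[0].message.content);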
I'm crossing my fingers they release a flash version of this. GLM 4.7 Flash is the main model I use locally for agentic coding work, it's pretty incredible. Didn't find anything in the release about it - but hoping it's on the horizon.