Short version: A Qwen-2.5 7B model that has been turned into a diffusion model.
A couple of notable things: first, that you can do this at all (left-to-right model -> out-of-order diffusion via finetuning), which is really interesting. Second, the final version beats the original by a small margin on some benchmarks. Third, it’s in the ballpark of Gemini Diffusion, although not competitive, which is to be expected for any 7B-parameter model.
A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.
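For intuition on the parallelism point, here's a toy sketch of masked-diffusion decoding, where several positions get committed per pass instead of one token per pass as in left-to-right decoding. The "predictions" below are placeholders rather than Apple's actual setup; the sketch only illustrates the step count.

    # Toy illustration: masked-diffusion decoding fills several masked
    # positions per pass; left-to-right decoding would need 16 passes here.
    # The "predictions" are placeholders -- no real model involved.
    import random

    MASK = "<mask>"

    def denoise_step(tokens, fill_fraction=0.25):
        """Commit a fraction of the remaining masked positions in one pass."""
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        k = max(1, int(len(masked) * fill_fraction))
        for i in random.sample(masked, min(k, len(masked))):
            tokens[i] = f"tok{i}"  # stand-in for the predicted token
        return tokens

    seq = [MASK] * 16
    steps = 0
    while MASK in seq:
        seq = denoise_step(seq)
        steps += 1
    print(f"filled 16 positions in {steps} passes")  # vs. 16 left-to-right

In practice each pass is still a full forward pass over the sequence, so the win is fewer sequential steps, not less total compute.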
Overall, interesting. At some point these local models will get good enough for ‘real work’ and they will be slotted in at API providers rapidly. Apple’s game is on-device; I think we’ll see descendants of these start shipping with Xcode in the next year as just part of the coding experience.
Without having tried it, what keeps surprising me is how apparently very different architectures (and in other cases training data) lead to very similar outcomes. I'd expect results to vary a lot more.
I would expect a lot of attempts to fail, and those tend not to be published, or gather less attention. So if we have reached a local optimum, any technique that gets close to the current benchmarks is worth publishing as soon as results reach that point. All the ones that are too distant are discarded. In the end, all the papers you see are close to the current status quo.
It's possible that some of these new architectures / optimizations would allow us to go beyond the current benchmark scores, but probably with more training data, and money. But to get money you need to show results, which is what you see today. Scaling remains king; maybe one of these techniques is the 2025 "attention" paper, but even that one needed a lot of scaling to go from the 2017 version to ChatGPT.
It doesn't look like it got pushed that much, unfortunately. The article says they only added 20k examples to fine-tune at the end, but maybe the ceiling is much higher for diffusion?
But yeah, RWKV also ends up in a similar performance area at similar sizes - I wish someone would finally start using it at scale...
The data might be the limiting factor of current transformer architectures, but there's no reason to believe it's a general limiting factor of any language model (e.g. human brains are "trained" on orders of magnitude less data and still generally perform better than any model available today).
When we look at the small models suitable for running locally, by far the best programming model is DeepSeek-R1-0528-Qwen3-8B. In real-world usage it is quite comparable even to much bigger models.
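If anyone wants to kick the tires locally, a minimal sketch with llama-cpp-python looks roughly like this; the GGUF filename and quant level are assumptions, substitute whichever community quant you actually downloaded:

    # Minimal local-inference sketch (assumes llama-cpp-python is installed
    # and a GGUF quant of DeepSeek-R1-0528-Qwen3-8B is on disk).
    from llama_cpp import Llama

    llm = Llama(
        model_path="DeepSeek-R1-0528-Qwen3-8B-Q4_K_M.gguf",  # assumed filename
        n_ctx=8192,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": "Write a Python function that parses ISO 8601 dates."}],
        max_tokens=512,
    )
    print(out["choices"][0]["message"]["content"])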
> A diffusion model comes with a lot of benefits in terms of parallelization and therefore speed; to my mind the architecture is a better fit for coding than strict left to right generation.
I had a similar notion and am excited to see this research being done. My experience of writing code is that the structure of the whole system influences each individual part, which has always felt like a better match for a diffusion type model.
I suspect this is a 7B model because it’s an experiment, but I do like seeing Apple playing with smaller models - I think Google’s “no moat” memo is still fundamentally correct, either via better architectures or Moore’s law, and it seems like Apple thinks the same.
The "no moat" memo is way more complex than Google admitting an uncomfortable truth. The benefit massively from having seemingly internal documents leaked about how the play field is fair.
> to my mind the architecture is a better fit for coding
We have to see if it produces better results. Humans have a planning phase, followed by a part-by-part implementation phase. This is reasonably well emulated by plan/architect + codegen tools.
It's delusional to think that most software projects can be planned in advance beyond "there will be a beginning, a middle, and an end". People do it, but their efforts are in my experience generally ignored once implementation gets underway.
Planning in software isn’t about following the plan but mapping a viable route to avoid predictable issues. You’re always going to know more about a project as you build it and you should keep updating that plan.
That’s true at the project level. But surely when you sit down to actually work for a couple hours you think about what you are going to do, and then mostly do that.
In my experience it’s more fractal. Any subgoal, however small, may run into its own planning/thinking and then doing sequence, or even have you reconsider the higher-level plan. Of course, it somewhat depends on how run-of-the-mill the overall task is.
They predict more than just the second half of a word you are typing, but at the end of the day they're still just predicting what a human would have typed.
Most of the "magic" of large models are really just function calls, so as long as the small models have access to the same functions they work well. They fixed the "how many R's in Strawberry" issue by offloading the question to a function, not spending a godly amount of money/energy on training another model.
Oops, sorry, "tools". Gotta maintain the grift that these statistics-based lossy-text-compression bar tricks are "thinking".
Why can’t I backup an iOS device to a local NAS in the way I can use Time Machine, for example? (Rhetorical question; the answer is obviously that they want to sell more iCloud storage for that all-important services revenue).
The number of people that would buy a NAS over just spending $5/month for storage is well below one percent, and if you combine that with the requirement of not having a PC/Mac you may well end up in the hundreds…
There aren’t that many people who are willing to own a device from a company but not trust that company with their data.
I'm willing to bet that more people would back up their Android device if Google provided a first-party tool for user-friendly backups of Android devices to local computers.
Finder -> Locations -> Your iPhone -> Backup all the data on this iPhone to your Mac.
Once you have done this you can find the backup in "Manage Backups", right click on an entry and select "Show in Finder". From there you can copy it to your NAS.
Not as smooth as a Time Machine backup but it is possible.
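If you want to script that last copy step, something like this works on macOS, where Finder keeps backups under MobileSync; the NAS mount point is an assumption, point it at wherever your share is mounted:

    # Copy Finder-made iPhone backups to a mounted NAS share.
    # /Volumes/nas-backups/iphone is an assumed mount point -- adjust it.
    import shutil
    from pathlib import Path

    BACKUP_DIR = Path.home() / "Library/Application Support/MobileSync/Backup"
    NAS_MOUNT = Path("/Volumes/nas-backups/iphone")

    NAS_MOUNT.mkdir(parents=True, exist_ok=True)
    for backup in BACKUP_DIR.iterdir():
        if backup.is_dir():
            # dirs_exist_ok lets repeated runs refresh an existing copy
            shutil.copytree(backup, NAS_MOUNT / backup.name, dirs_exist_ok=True)
            print(f"copied {backup.name}")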
That’s a guide on how to backup an iPhone to a NAS using a computer.
Unsurprisingly, a reasonably capable general-purpose OS supports network file systems in a way transparent to applications, but that doesn’t help people using only an iOS device.
Did you actually read what you linked, or did you just paste in a random link from a search engine?
There are two methods presented: one only backs up the camera roll; the other requires plugging into a computer and manually clicking around, at which point you might as well use the first party backup built into Finder (or iTunes on Windows? Is that still a thing?), no random third party application needed. I also highly doubt their “backup every single content” claim.
It’s also a sneaky marketing article for that third party application, following the common SEO practice of giving you a half-assed solution capturing a frequent search term (in this case, “backup iPhone to Synology”), then plugging their own questionable thing as the better solution.
> I think Apple will ultimately destroy the data center
I think EVs destroying Ultra Large Container ships had better odds, and both are extremely unlikely. Data-center advantages Apple won't be able to overcome: compute density, cooling, cheap power, physical security to protect the software, scale + bandwidth, lower costs to customers from using contract manufacturers and/or commodity hardware.
There is no universe where large enterprises ditch their geo-located racks. Let alone hyperscalers, especially now that they are scrounging for energy, reneging on pledges on renewables, and paying big bucks to bring nuclear power stations online.
It’s easy to imagine a universe where the hyperscalers are in a bubble and they will eventually find a limit to adding classical compute and we will hit peak datacenter and shrink from there.
Not without fundamentally changing the way they think about computing and there seems to be zero willingness among their leadership to do that. In fact they seem to want to move things into the data center. That's why I'm shorting them.