What is the practical latency difference you see between on-device and, say, Whisper run remotely in streaming mode over the internet? Comparable? It seems that network latency would be mostly negligible (assuming reasonable internet/cell coverage), or at least compensated for by the higher-end hardware on the other side?
If you run a smaller distil-whisper variant AND you optimize the decoder to run on the Apple Neural Engine, you can get latency down to ~300ms without any backend infra.
The issue is that the smaller models tend to suck, which is why the fine-tuning is valuable.
My hypothesis is that you can distill a giant model like Gemini down into a tiny distil-whisper model, but it depends on the machine you are running, which is why local AI is a PITA.
Yeah, sorry, that was unclear on my part. I chunk at the endpoint level; Whisper itself obviously processes 30s windows. The memory/latency thing I was referring to is more about processing longer files end to end through the pipeline, not a single Whisper pass. My FastAPI wrapper just splits the audio and runs chunks sequentially, so total wall time scales linearly with file length, nothing fancy.
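Roughly what that wrapper does, as a minimal sketch (assuming the openai-whisper package; the chunk size and model name are just illustrative):

    import whisper  # openai-whisper package

    CHUNK_SECONDS = 30
    SAMPLE_RATE = 16_000  # whisper.load_audio resamples everything to 16 kHz

    model = whisper.load_model("small")          # model size is illustrative
    audio = whisper.load_audio("long_file.wav")  # mono float32 array

    # Split into 30 s chunks and transcribe them one after another,
    # so total wall time grows linearly with file length.
    texts = []
    for start in range(0, len(audio), CHUNK_SECONDS * SAMPLE_RATE):
        chunk = audio[start:start + CHUNK_SECONDS * SAMPLE_RATE]
        result = model.transcribe(chunk, fp16=False)
        texts.append(result["text"].strip())

    print(" ".join(texts))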
Wondering the same. It certainly can run beyond 30 seconds, but at some point I believe the output should degrade.
Plus, you could do actual batch inference instead. Or, if you must carry forward the context, you could still do it linearly, but the memory usage shouldn't just explode.
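A sketch of what batched inference could look like with the Hugging Face ASR pipeline, assuming the transformers package (the model name is just an example): chunk_length_s splits the long audio internally and batch_size runs those chunks through the model together.

    from transformers import pipeline

    # Chunks the audio into 30 s windows internally and runs them
    # through the model in batches instead of one at a time.
    asr = pipeline(
        "automatic-speech-recognition",
        model="distil-whisper/distil-small.en",  # illustrative model id
        chunk_length_s=30,
        batch_size=8,
    )
    print(asr("long_file.wav")["text"])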
Excellent work still; your repo is much more robust and fleshed out, and I am just beelining straight to an audio LoRA without really knowing what I'm doing, as this is my first time attempting a ~real ML training project.
Definitely interested in swapping notes if you are, though. Probably the biggest thing that came out of this exercise for us was realizing that Apple actually has some really powerful local inference/data processing tools available; they're just marketed much more towards application developers, so a lot of them fly under the radar.
We just published https://github.com/accretional/macos-vision to make Apple's local OCR, image segmentation, foreground masking, facial analysis, classification, and video tracking functionality accessible to anybody via CLI, and hopefully more common in ML and data workloads. Hopefully you or someone else can get some use out of it. I definitely will from yours!
Here’s the trick: use Gemini Pro deep research to create “Advanced Hacker’s Field Guide for X” where X is the problem that you are trying to solve. Ask for all the known issues, common bugs, unintuitive patterns, etc. Get very detailed if you want.
Then feed that to Claude / Codex / Cursor. Basically, create a cheat sheet for your AI agents.
Memory usage increases quadratically with sequence length. Therefore, using shorter sequences during fine-tuning can prevent memory explosions. On my 64GB RAM machine, I'm limited to input sequences of about 2,000 tokens, considering my average output for the fine-tuning task is around 1,000 tokens (~3k tokens total).
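For intuition, a rough back-of-the-envelope sketch of why the attention score matrices blow up (the layer/head counts and dtype are illustrative assumptions, and this ignores fused kernels that avoid materializing the full matrix):

    # Rough memory for materialized attention score matrices during training,
    # assuming activations for every layer are kept around for backprop.
    def attn_scores_gb(seq_len, layers=32, heads=32, bytes_per_val=2, batch=1):
        # one (seq_len x seq_len) matrix per head per layer
        return batch * layers * heads * seq_len**2 * bytes_per_val / 1024**3

    print(attn_scores_gb(2_000))  # ~ 7.6 GB
    print(attn_scores_gb(4_000))  # ~30.5 GB; 2x the length costs 4x the memory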
Ah, that makes sense; quadratic scaling is brutal. So with 96GB I'd probably get somewhere around 4-5k total sequence length before hitting the wall, which is still pretty limiting for anything multimodal. Do you do any gradient checkpointing, or is that not worth the speed tradeoff at these sizes?
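For reference, if the model goes through Hugging Face transformers, gradient checkpointing is usually just a flag; a minimal sketch:

    from transformers import TrainingArguments

    # Recompute activations during the backward pass instead of storing them,
    # trading extra compute for a much smaller activation memory footprint.
    args = TrainingArguments(
        output_dir="out",
        gradient_checkpointing=True,
    )
    # or directly on an already-loaded model:
    # model.gradient_checkpointing_enable()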
Shouldn't FlashAttention address the quadratic increase in memory footprint w.r.t. sequence length during fine-tuning/training? I'm also pretty sure the quadratic blow-up does not apply to pure inference, due to how KV-caching works.
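Assuming the model is loaded through Hugging Face transformers and the flash-attn package is installed, enabling it is typically just a loading flag; a minimal sketch (the model id is illustrative):

    import torch
    from transformers import AutoModelForCausalLM

    # FlashAttention computes attention in tiles, so the full
    # seq_len x seq_len score matrix is never materialized in memory.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.1-8B",  # illustrative model id
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )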
We've had no WW3 (so far) and no one here needs to worry about being drafted into a war. Gatling might have thought his gun would reduce the number of war fatalities, but Oppenheimer thought he would end the world. Both were wrong.
Alternative take: Inventors are bad at predicting the downstream societal effects of their inventions.
Let's assume a nuclear exchange happens at some point during a war. There is a very high chance that this will cause an escalation leading to a nuclear apocalypse.
Since this result is presumably inevitable given enough time, it's more like nukes prevented another major world war and stole a form of peace from the future, temporarily. That peace debt might be repaid with the end of everything.
Funny how the unintentional close calls become more sparse with time. I wonder if that’s because humanity got better at dealing with the responsibility or because the oopsies haven’t been declassified yet.
Nuclear weapons traded a high probability of a major war for a low probability of an apocalyptic war.
My question is, how low is that probability, exactly? Because the tradeoff looks very different if it’s one in a million per year, versus one in a hundred per year.
My assessment, looking at the history and the close calls, is that it’s more like one in a hundred.
I haven't heard a peep about conscription; can you provide a source? There was some vague national service proposal for school leavers a couple of years ago, but that was it.
Unlike Apple's formal "developer evangelist" and several others I contacted, the guy actually took the time to talk to us, and I was/am grateful for that. He's a cog in a very large corporate machine. Apple is Apple. He's not the CEO. He was doing his job and did me a favor. I am grateful to him.
> Why would someone say "You're a mosquito. Apple will just stomp on you and you will not exist.", it makes zero sense to me given the context laid out here.
I'm telling you what I was told. It's a true story. I was there. It happened to me.