I'm working on something tangential to this: using cloud coding agents to bring the workflow to mobile. The breakthrough for me was realizing that the IDE isn't needed anymore, and cloud repos + sandboxes open up the ability to continue working from anywhere. mouse.dev
I run the Gemini Live API over a mesh-hosted, managed WebRTC cloud. It works fantastically, and I've been running it for 2 years. You can try WebSockets, handle ephemeral keys, etc., but when you speak with people running voice agents at scale in this space, many of the issues are already solved by WebRTC and Pipecat, and by the many resources this space has poured into what are now solved problems. It certainly feels like overkill, and it probably is, but once a connection is established, it's pretty magical. The startup time and buffering have been solved for quicker voice connections too: https://github.com/pipecat-ai/pipecat-examples/tree/main/ins... (video is harder)
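For anyone curious what that looks like in practice, here's a rough sketch of a Pipecat pipeline fronting Gemini Live over a WebRTC transport. The module paths and constructor params are from Pipecat's examples and shift between versions, so treat the exact names as assumptions rather than a pinned API:

    import asyncio
    import os

    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask
    from pipecat.services.gemini_multimodal_live.gemini import (
        GeminiMultimodalLiveLLMService,
    )
    from pipecat.transports.services.daily import DailyParams, DailyTransport

    async def main():
        # WebRTC transport (Daily-hosted here; other Pipecat transports work too)
        transport = DailyTransport(
            os.environ["DAILY_ROOM_URL"],
            None,                      # token; None for public rooms
            "voice-bot",
            DailyParams(audio_in_enabled=True, audio_out_enabled=True),
        )

        # Speech-to-speech Gemini Live service: no separate STT/TTS stages needed
        llm = GeminiMultimodalLiveLLMService(api_key=os.environ["GOOGLE_API_KEY"])

        # Frames flow: transport input -> model -> transport output
        pipeline = Pipeline([transport.input(), llm, transport.output()])
        await PipelineRunner().run(PipelineTask(pipeline))

    if __name__ == "__main__":
        asyncio.run(main())

The point of the framework is everything you don't see here: reconnection, jitter buffering, interruption handling, all the solved problems mentioned above.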
There are a lot of extremely smart people who have come back to WebRTC time and time again because it continues to solve problems other methods and protocols can't. That said, QUIC is certainly interesting going forward, but I primarily stream voice + vision at 1 fps, so WebRTC just makes sense, and WebSockets fail and are insecure at scale for this use case (see https://www.daily.co/videosaurus/websockets-and-webrtc/). Also, just listen to Sean in this thread; dude knows what's up.
Ocean-2, a full-scale prototype deployed off the Washington coast, built by the Panthalassa team of inventors, programmers, welders, physicists, and engineers.
Before starting CNN, Ted Turner captained the sailing yacht Courageous to an America's Cup victory, 4-0 over the Australians in Newport, RI, during what was arguably sailing's heyday.
Oh my god I finally get a very specific Harvey Birdman joke as a result of this factoid. Fuck me, Phil Ken Sebben as a parody of Ted Turner kinda works.
That I was aware of. I'm more familiar with his media and wildlife conservation efforts than his business acumen or sports achievements. Captain Planet, Turner Classic Movies, Hanna-Barbera, Cartoon Network, etc.
I wish I had known about Pipecat a lot sooner. I found out about it a few weeks back, and since Gemma 4 launched, I've been building my own entirely local voice assistant using Gemma 4 + Kokoro TTS + Whisper from scratch - https://github.com/pncnmnp/strawberry.
Looks like everyone is building one of these. I have my own little version that uses streaming STT; it can actually be too fast in some cases. I also have a little ring buffer grabbing audio from before the wake word detection fires, so it can hear "Hey Jarvis, turn on the lights" without a deliberate pause: https://github.com/jaggederest/pronghorn/
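In case the pre-roll trick is unclear, here's a minimal sketch of the idea. The chunk size, pre-roll length, and detector/STT hooks are placeholders, not pronghorn's actual code:

    from collections import deque

    SAMPLE_RATE = 16_000
    CHUNK = 512                       # samples per audio callback
    PRE_ROLL_SECONDS = 1.5            # how much pre-wake-word audio to keep

    ring = deque(maxlen=int(SAMPLE_RATE * PRE_ROLL_SECONDS / CHUNK))

    def detect_wake_word(chunk) -> bool:
        return False                  # placeholder: plug in a real detector

    def start_transcription(chunks):
        pass                          # placeholder: hand off to streaming STT

    def on_audio_chunk(chunk):
        """Called for every chunk coming off the mic stream."""
        ring.append(chunk)
        if detect_wake_word(chunk):
            # Prepend the buffered audio so speech that overlapped the wake
            # word ("...turn on the lights") isn't lost.
            start_transcription(list(ring))
            ring.clear()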
The whole setup works on my M2 MacBook Pro with 16 GB RAM. I use Gemma 4B via LiteRT-LM.
I've found that LiteRT-LM has a much lower DRAM footprint than Ollama. I've also made tons of optimizations in the code. For example, you can do quite a bit with a 16k context window for a voice assistant while keeping a good footprint, so I keep track of token usage and perform an auto-compaction after a while. I use sub-agents and only do deep-think calls with them, so the context window is separated out. In a multi-turn conversation, if Gemma 4 directly processes audio input, the KV cache fills up within a few turns, so I channel it all via Whisper.
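As a sketch of what that token tracking plus auto-compaction can look like (the threshold, the keep-last-N tail, and the summarize step are my assumptions, not strawberry's exact implementation):

    CONTEXT_WINDOW = 16_384
    COMPACT_AT = int(CONTEXT_WINDOW * 0.75)    # compact before the window fills

    history = []                               # [{"role": ..., "text": ...}]
    used_tokens = 0

    def count_tokens(text: str) -> int:
        return max(1, len(text) // 4)          # crude stand-in for a tokenizer

    def summarize(turns) -> str:
        # Placeholder: in practice this would be an LLM call over older turns.
        return "Earlier conversation: " + " | ".join(t["text"][:40] for t in turns)

    def compact():
        """Replace older turns with a summary, keeping the recent tail verbatim."""
        global used_tokens
        keep = history[-4:]
        history[:] = [{"role": "system", "text": summarize(history[:-4])}, *keep]
        used_tokens = sum(count_tokens(t["text"]) for t in history)

    def add_turn(role: str, text: str):
        global used_tokens
        history.append({"role": role, "text": text})
        used_tokens += count_tokens(text)
        if used_tokens > COMPACT_AT:
            compact()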
Also, by far the biggest optimization is the 3-stage producer-consumer architecture. LiteRT-LM streams tokens, and I split them into sentences. A synthesizer thread then converts each sentence to audio via Kokoro TTS, and the main thread plays the audio chunks sequentially. There's also a parallel barge-in monitor thread. https://github.com/pncnmnp/strawberry/blob/main/main.py#L446
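A toy version of that 3-stage pipeline, with queues between the stages and a barge-in event (the TTS and playback calls are stubs; see main.py above for the real thing):

    import queue
    import re
    import threading

    sentences: "queue.Queue[str | None]" = queue.Queue()
    audio: "queue.Queue[bytes | None]" = queue.Queue()
    barge_in = threading.Event()      # set by a parallel mic-monitor thread

    def tts(sentence: str) -> bytes:  # stub for Kokoro TTS
        return sentence.encode()

    def play(chunk: bytes):           # stub for the audio sink
        pass

    def producer(token_iter):
        """Stage 1: accumulate streamed tokens, emit complete sentences."""
        buf = ""
        for tok in token_iter:
            buf += tok
            parts = re.split(r"(?<=[.!?])\s+", buf)
            for sent in parts[:-1]:
                sentences.put(sent)
            buf = parts[-1]
        if buf.strip():
            sentences.put(buf)
        sentences.put(None)           # end-of-stream sentinel

    def synthesizer():
        """Stage 2: turn each sentence into audio as soon as it's ready."""
        while (sent := sentences.get()) is not None:
            audio.put(tts(sent))
        audio.put(None)

    def player():
        """Stage 3: play chunks in order; bail out immediately on barge-in."""
        while (chunk := audio.get()) is not None and not barge_in.is_set():
            play(chunk)

    synth = threading.Thread(target=synthesizer)
    out = threading.Thread(target=player)
    synth.start(); out.start()
    producer(iter(["Hel", "lo there. ", "How can ", "I help?"]))
    synth.join(); out.join()

The win is latency: the first sentence starts playing while the model is still generating the rest of the response.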
I did not want to use openWakeWord or Picovoice because they had limitations on which wake word you could choose. The alternative was to train a model of my own. So I created my own wake word detection pipeline using Whisper Tiny, and it works surprisingly well: https://github.com/pncnmnp/strawberry/blob/main/main.py#L143...
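One plausible shape of a Whisper-based wake word check is to transcribe a short rolling window and fuzzy-match the wake phrase against it. That's roughly the idea, though the threshold and phrase below are my placeholders, not strawberry's exact code:

    from difflib import SequenceMatcher

    import whisper                    # openai-whisper

    model = whisper.load_model("tiny.en")
    WAKE_PHRASE = "hey strawberry"    # placeholder phrase
    THRESHOLD = 0.8

    def heard_wake_word(wav_path: str) -> bool:
        text = model.transcribe(wav_path, fp16=False)["text"].lower()
        words = text.split()
        n = len(WAKE_PHRASE.split())
        # Slide an n-word window over the transcript and fuzzy-match it,
        # so minor mis-transcriptions ("hey straw berry") still trigger.
        return any(
            SequenceMatcher(None, " ".join(words[i:i + n]), WAKE_PHRASE).ratio()
            >= THRESHOLD
            for i in range(max(1, len(words) - n + 1))
        )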
I'm using the MacBook's built-in microphones for this, though, and I haven't fully tested it with other microphones. I've been ironing out the rough edges on a daily basis. I should write a quick blog on this too.
Check out [0]. You can do 'Voice AI' on small/cheap hardware. It's the most fun you can have in the space ATM :) It's been a while, but I posted a demo here [1].
If you like Pipecat's focus on speed, you might also try out our open-source project, which comes with all the batteries included (knowledge base, telephony/SIP, variables, BYOK for any LLM/STT/TTS, speech-to-speech, etc.).
Agreed. OpenCode is a strong base, and with a couple of modifications it can become a very effective harness. For my side project mouse.dev, I've been combining parts from OpenCode, Claude Code, and Hermes to build a cloud agent architecture that works well from mobile.
I haven't run formal evals, but I improved the experience for my own needs, and it feels noticeably better with these modifications:
- Claude-style subagents
- an MCP layer for higher-level tools
- Cursor-style control-plane modes like Ask, Plan, Debug, and Build
The MCP layer lets the harness use things like GitHub file/code read, PR creation, web search/fetch, structured user questions, plan-mode switching, user skills, and subagents.
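For a sense of what that layer looks like, here's a hedged sketch using the MCP Python SDK's FastMCP. The tool names mirror the capabilities just listed, but the bodies are illustrative stubs, not mouse.dev's actual implementation:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("harness-tools")

    @mcp.tool()
    def ask_user_question(question: str, options: list[str]) -> str:
        """Surface a structured question to the user and wait for an answer."""
        # Illustrative stub; a real harness would route this to the mobile UI.
        return options[0] if options else "ok"

    @mcp.tool()
    def enter_plan_mode(goal: str) -> str:
        """Switch the agent into plan mode for the given goal."""
        return f"plan mode entered for: {goal}"

    @mcp.tool()
    def spawn_subagent(task: str) -> str:
        """Kick off a subagent with its own context window."""
        return f"subagent started: {task}"

    if __name__ == "__main__":
        mcp.run()                      # serves the tools over stdio by default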
So the improvement is mostly from better UI/UX orchestration and tool access. There are some things from Hermes that are interesting as well.
Most of my focus has been on applying this stack to sandboxed cloud agents so you can properly code and work from mobile devices.
I can't definitively say that the stack is better or worse than Claude Code; it's more just tuned for my use case, I guess.
Kudos! Cool idea. I'm on the same path you are; you're just one step ahead. For mouse.dev, what are you using for the cloud agent sandbox piece? I haven't moved my agents to the cloud yet (for on-the-go mobile enablement). Would Islo be a competitor to mouse?
Cool! I've been mostly building for what coding from an iPhone can look like. The cloud agent sandbox portion is definitely not polished yet but is working well so far. I looked at Daytona, E2B, Modal, etc., but decided to roll my own with Fly.io, with a TTL on agent creation. Mouse uses per-thread sandboxes (not shared-container multi-workspace) and Postgres for agent history, etc.
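Rough shape of the per-thread sandbox creation against the Fly Machines API. The endpoint and auto_destroy flag are Fly's; the app name, image, and the timeout-as-TTL trick are just assumptions about one way to do it, not mouse's actual code:

    import os

    import requests

    FLY_API = "https://api.machines.dev/v1"
    APP = "mouse-sandboxes"          # hypothetical app name
    HEADERS = {"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"}

    def create_sandbox(thread_id: str, ttl_seconds: int = 3600) -> dict:
        config = {
            "image": "registry.fly.io/agent-sandbox:latest",  # hypothetical image
            "auto_destroy": True,     # destroy the machine when its process exits
            "env": {"THREAD_ID": thread_id},
            # Enforce the TTL inside the sandbox: exit (and thus auto-destroy)
            # after ttl_seconds unless the agent finishes sooner.
            "init": {"cmd": ["sh", "-c", f"timeout {ttl_seconds} /agent/run"]},
        }
        resp = requests.post(
            f"{FLY_API}/apps/{APP}/machines",
            headers=HEADERS,
            json={"name": f"thread-{thread_id}", "config": config},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()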
I'll have to look more at Islo. I definitely think it's a growing space with a lot of opportunity for those who participate and solve problems.
I'm a Claude Code Web fan and a rather heavy user, so I was interested in your product. However, I couldn't find an answer on the website: which parts did you find so good that you ported them?
Nothing groundbreaking, but I'll do a blog writeup on the architecture if it would be helpful for people. My focus has been on mobile.
The main pieces I've integrated for mouse.dev, inspired by Claude/Cursor, were plan mode, agent questions, subagents, pre/post hooks, context compaction, repo-local skills, and permission modes. So mostly tools like enter_plan_mode, ask_user_question, and spawn_subagent, plus .mouse/skills and .mouse/plans.
One nice feature is continuity. If you're working on desktop and save a plan to .mouse/plans, you can pick it up later on mobile with cloud agents, or do the reverse: plan something from your phone, then review and build it when you're back at your desk. That was my initial goal with this project, because I've found the plan/act loop so helpful.
Mouse Cloud Agents is mostly an OpenCode-based harness, but everything routes through our MCP/event system so it’s mobile-first and provider-agnostic.
I intentionally skipped a lot of IDE and Claude Code style desktop features. The bet is that this new style of coding is becoming less “edit files in an IDE” and more “steer a capable coding chatbot.”
Would love to hear from anyone reading who's iterating on harness architecture; it's been really fun to work on.