Hacker News | past | comments | ask | show | jobs | submit | krackers's comments

>background computer use

How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.


I remember trying to build something like this 6 years ago[0]. There are some interesting APIs for injecting click/keystroke events directly into Cocoa, and other APIs for reading framebuffers for apps that aren't in the foreground.

In particular there was some prior art that I found for doing it from the OpenQwaQ project, which was a GPLv2 3D virtual world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.

If I recall correctly, it worked well for native apps, but didn't work well for Chromium/Electron apps because they would use an API for grabbing the global mouse position rather than reading coordinates from events.

[0]: https://github.com/antimatter15/microtask/blob/master/cocoa/... [1]: https://github.com/OpenFora/openqwaq/blob/189d6b0da1fb136118...


Probably accessibility APIs

Which specific ones though allow you to send input to a window without raising it? People have been trying to do "focus follows mouse [without auto raise]" for a long time on mac, and the synthetic event equivalent to command+click is the only discovered method I'm aware of, e.g. used in https://github.com/sbmpost/AutoRaise

There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent`, but there were plenty of bugs with that, and I haven't seen anyone else build on it. The modern equivalent is presumably `CGEventPostToPSN`/`CGEventPostToPid`, which seems like a good candidate; perhaps the Sky team they acquired knows the right private APIs to use to get this working.

Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.

[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-... [2] https://www.macscripter.net/t/keystroke-to-background-app-as... [3] https://github.com/socsieng/sendkeys


Citrix

/s


>See what the federal government spent with your tax dollars.

Is thinking of it in this sense actually accurate? I always assumed that since every government has embraced MMT, they can spend whatever they want simply by printing it out of thin air. Then taxation could be understood as the only crude knob to "destroy money"; it also has the effect of forcing USD to be the primary national currency (e.g. owning bitcoin won't do you any good if you ultimately need to pay taxes in USD).


> since every government has embraced MMT

This is doing a lot of imaginary heavy lifting.


If good writing were easy, then "LLM slop writing" wouldn't be a thing.

Not at all. LLM slop exists exactly because writing is easy, but figuring out what to write is hard.

> just cancelled IntelliJ for a thousand engineers

IntelliJ can't cost more than the AI provider subscriptions, and it will actually handle large refactors without breaking your codebase.


But if you take away their IDEs, they’ll be forced to use the AI! What could possibly go wrong with this plan?

>just people seem more willing to take the trap door ideas

It's mainly due to pressure from above. People who want to do a good job and are allowed the time to will be fastidious with or without AI. But now AI provides a shortcut and band-aid where things can be papered over or products can be launched quickly. "Ship fast and then iterate" doesn't work when you're building on shaky foundations, but good luck convincing people of that.


I think you probably meant this, but when used with RL it's usually KL(π || π_ref), which has high loss when the in-training policy π produces output that's unlikely in the reference. But yeah as you noted, I guess this also means that there is no penalty if π _does not_ produce output in π_ref, which leads to a form of mode-collapse.

This collapse in variety matches what some studies have shown: the "sloppification" is not present in the base model, and is only introduced during the RL phase.
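To make the asymmetry concrete, here's a toy sketch over discrete distributions (my own illustration, not from the thread): KL(π || π_ref) barely penalizes π for dropping a mode the reference has, but blows up when π puts mass where the reference has essentially none.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) over a shared discrete support; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

pi_ref = [0.5, 0.5, 0.0]        # reference: two modes, never emits outcome 3
pi_collapsed = [1.0, 0.0, 0.0]  # dropped one reference mode entirely
pi_spurious = [0.4, 0.4, 0.2]   # puts mass on an outcome the ref never emits

print(kl(pi_collapsed, pi_ref))  # ~0.69: collapsing onto one mode is cheap
print(kl(pi_spurious, pi_ref))   # ~5.0: off-reference mass is punished hard
```

So under this direction of the KL, the cheapest way for the in-training policy to stay "safe" is to retreat to a subset of the reference's modes, which is exactly the variety collapse described above.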


>MCP exposes capabilities and Skills may shape how capabilities are used.

This is my understanding as well. What most people seem to ultimately be debating is "dedicated tool calls" (which is what MCP boils down to) versus a stateful environment that admits a single uber-tool (bash) that can compose things via scripting.

I guess this is what riles people up, like emacs vs vim. Some people see perfectly good CLI tools lying around and don't see why they need to basically reimplement a client against an API. Others closer to the API provider side imagine it cleaner to expose a tailored, slimmed-down surface. Devs that just use Claude Code on a laptop think anything other than CLI orchestration is overcomplicating it, while others on the enterprise side need a more fine-grained permission model and don't want to spin up an entire sandbox env just to run bash.

It's also not either-or. You can "compose" regular tool calls as well, even without something as heavyweight as an entire Linux env. For instance you could have all tools exposed as FFI in QuickJS or something. The agent can invoke and compose tools by writing and executing JS programs. How well this works depends on the post-training of the model though; if agents are RL'd to emit individual tool calls via

    <tool>{"myTool": {"arg1": 1}}</tool>
    <tool>{"myTool": {"arg1": 2}}</tool>
tokens, then they're probably not going to be as successful shoving entire JS scripts in there like

    <tool>
        const resp1 = myTool(1);
        const resp2 = myTool(2);
        console.log(resp1, resp2);
    </tool>
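As a sketch of the "single uber-tool" side of that trade-off (in Python rather than QuickJS, and with a hypothetical `my_tool` standing in for an MCP-style capability), the host exposes registered tools as the only names an agent-emitted script can see:

```python
def run_agent_script(script, tools):
    """Execute an agent-emitted script with the registered tools as its only
    globals. A real sandbox (QuickJS etc.) would also cap time and memory."""
    outputs = []
    env = {"__builtins__": {}}       # no open(), no __import__, etc.
    env.update(tools)
    env["emit"] = outputs.append     # the script reports results via emit()
    exec(script, env)
    return outputs

# Hypothetical tool, standing in for a capability a server might expose
tools = {"my_tool": lambda arg1: {"arg1": arg1, "ok": True}}

script = """
resp1 = my_tool(1)
resp2 = my_tool(2)
emit([resp1, resp2])
"""
print(run_agent_script(script, tools))
```

The composition (sequencing, intermediate variables) lives in the script rather than in N round-trips of individual tool-call tokens, which is the whole point of the bash/scripting camp.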

See this video, beautiful explanation that doesn't already assume familiarity with entropy https://www.youtube.com/watch?v=ErfnhcEV1O8

There's a more thorough version of this at https://www.youtube.com/watch?v=RSaJAAqSAMw and the end-result doesn't look as tacky

if you want to do this, there is a better technique than shown in this video.

get a single-cut fine file, maybe with a little more weight than the one in the video. single cut file has diagonal slots and allows firm and continuous contact with the piece. most files are double cut, have two sets of slots and look like bumpy diamonds. they remove more material but tend to bounce.

use long even strokes with firm pressure, only during the fore stroke. watch out for roll-off, where you unconsciously change the angle or pressure of the file as you're at the end of the stroke.

you can make a pretty even-looking chamfer that way.


I've been thinking of just using sandpaper stuck to a block of wood, though I imagine that might be slower.

Heck, a little part of me is tempted to try the smallest radius round-over router bit I have in a trim router, but the odds of that going horribly wrong are just way too high.


Or get a Dremel.

There's absolutely no way you get a good result with a Dremel.

How is this different from a standard tool-call agentic loop, or subagents?

Each stack frame has its own isolated context. This pushes the token pressure down the stack. The top level conversation can go on for days in this arrangement. There is no need for summarization or other tricks.
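A minimal sketch of that shape (toy stand-in for the model; all names are mine, not from the project): each recursive call builds its own fresh context list, and the parent only ever appends the child's return value, never the child's transcript.

```python
def call_frame(task, model, depth=0):
    """One 'stack frame': a fresh, isolated context per call. Only the
    child's final result propagates up, so parent context stays small."""
    context = [("task", task)]
    while True:
        action = model(context, depth)
        if action[0] == "recurse":
            result = call_frame(action[1], model, depth + 1)
            context.append(("result", result))  # result only, no transcript
        else:
            return action[1]

def toy_model(context, depth):
    """Stand-in for an LLM: sums numbers directly, delegates each nested
    tuple to a child frame one at a time."""
    task = context[0][1]
    results = [r for kind, r in context if kind == "result"]
    pending = [x for x in task if isinstance(x, tuple)]
    if len(results) < len(pending):
        return ("recurse", pending[len(results)])
    total = sum(x for x in task if not isinstance(x, tuple)) + sum(results)
    return ("return", total)

print(call_frame((1, (2, 3), (4, (5,))), toy_model))  # 15
```

The recursion here is real (a host-language call stack), which is the "symbolic"/out-of-band part; nothing about the nesting has to fit in any single frame's token budget.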

Is this related to the paper on Recursive Language Models? I remember it mentioned something similar about "symbolic recursion", but the way you describe it makes it sound too simple, why is there an entire paper about it?

The RLM paper did inspire me to try it. This is where the term comes from. "Symbolic" should be taken to mean "deterministic" or "out of band" in this context. A lot of other recursive LLM schemes rely on the recursion being in the token stream (i.e., "make believe you have a call stack and work through this problem recursively"). Clearly this pales in comparison to actual recursion with a real stack.

This is just subagents.
