How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.
I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.
I remember trying to build something like this 6 years ago[0]. There are some interesting APIs for injecting click/keystroke events directly into Cocoa, and other APIs for reading framebuffers for apps that aren't in the foreground.
In particular there was some prior art that I found for doing it from the OpenQwaQ project, which was a GPLv2 3D virtual world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.
If I recall correctly, it worked well for native apps, but didn't work well for Chromium/Electron apps because they would use an API for grabbing the global mouse position rather than reading coordinates from events.
Which specific ones though allow you to send input to a window without raising it? People have been trying to do "focus follows mouse [without auto raise]" for a long time on mac, and the synthetic event equivalent to command+click is the only discovered method I'm aware of, e.g. used in https://github.com/sbmpost/AutoRaise
There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent`, but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`. That seems like a good candidate; perhaps the Sky team they acquired knows the right private APIs to use to get this working.
Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.
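For the curious, here's a minimal sketch of the `CGEventPostToPid` route via pyobjc (my own untested guess, macOS only; `post_key_to_pid` is an invented name, and whether a given app actually honors events delivered this way without being focused is exactly the open question in this thread):

```python
# Sketch: post a keystroke to a background app by PID without raising its
# window. Requires macOS with pyobjc installed ("pip install pyobjc").
# CGEventCreateKeyboardEvent and CGEventPostToPid are real CoreGraphics
# calls; app-by-app behavior varies, as discussed above.
import sys

def post_key_to_pid(pid: int, keycode: int) -> None:
    from Quartz import CGEventCreateKeyboardEvent, CGEventPostToPid
    down = CGEventCreateKeyboardEvent(None, keycode, True)   # key-down event
    up = CGEventCreateKeyboardEvent(None, keycode, False)    # key-up event
    CGEventPostToPid(pid, down)
    CGEventPostToPid(pid, up)

if __name__ == "__main__":
    if sys.platform != "darwin":
        sys.exit("macOS only")
    post_key_to_pid(int(sys.argv[1]), 0)  # keycode 0 = 'a' on ANSI layouts
```

Note this only targets a process, not a specific window within it, which may be part of why prior attempts hit bugs.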
>See what the federal government spent with your tax dollars.
Is thinking of it in this sense actually accurate? I always assumed that since every government has embraced MMT, they can spend whatever they want simply by printing money out of thin air. Then taxation can be understood as the only crude knob to "destroy money", and it also has the effect of forcing USD to be the primary national currency (e.g. owning bitcoin won't do you any good if you ultimately need to pay taxes in USD).
>just people seem more willing to take the trap door ideas
It's mainly due to pressure from above. People who want to do a good job, and are allowed the time to, will be fastidious with or without AI. But now AI provides a shortcut and a band-aid: things can be papered over, and products can be launched quickly. "Ship fast and then iterate" doesn't work when you're building on shaky foundations, but good luck convincing people of that.
I think you probably meant this, but when used with RL it's usually KL(π || π_ref), which has high loss when the in-training policy π produces output that's unlikely under the reference. But yeah, as you noted, this also means there is no penalty if π _does not_ produce output that π_ref would, which leads to a form of mode collapse.
This collapse in variety matches what I've seen some studies show: the "sloppification" is not present in the base model and is only introduced during the RL phase.
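A toy numeric illustration of that asymmetry (my own sketch, not from any of the papers mentioned): KL(π || π_ref) only penalizes probability mass that π actually places, so a policy that collapses onto one token the reference likes pays only a modest, finite penalty, while the tokens it abandoned contribute nothing.

```python
import math

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

pi_ref = [0.5, 0.3, 0.2]        # reference policy spreads mass over 3 tokens
pi_collapsed = [1.0, 0.0, 0.0]  # in-training policy collapsed onto token 0

# Finite penalty even though pi now ignores tokens 1 and 2 entirely:
print(kl(pi_collapsed, pi_ref))  # log(1/0.5) ≈ 0.693

# The reverse direction, KL(pi_ref || pi_collapsed), would be infinite:
# pi_ref places mass on tokens to which pi_collapsed assigns probability 0.
```

So under the forward direction used in RLHF, dropping whole modes of the reference distribution is cheap, which lines up with the "sloppification appears during RL" observation above.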
>MCP exposes capabilities and Skills may shape how capabilities are used.
This is my understanding as well. What most people seem to ultimately be debating is "dedicated tool calls" (which is what MCP boils down to) versus a stateful environment that admits a single uber-tool (bash) that can compose things via scripting.
I guess this is what riles people up, like emacs vs vim. Some people see perfectly good CLI tools lying around and don't see why they need to basically reimplement a client against an API. Others closer to the API provider side imagine it cleaner to expose a tailored, slimmed-down surface. Devs that just use claude code on a laptop think anything other than CLI orchestration is overcomplicating it, while others on the enterprise side need a more fine-grained permission model and don't want to spin up an entire sandbox env just to run bash.
It's also not either/or. You can "compose" regular tool calls as well, even without something as heavyweight as an entire Linux env. For instance you could have all tools exposed as FFI in QuickJS or something, and the agent can invoke and compose tools by writing and executing JS programs. How well this works depends on the post-training of the model though; if agents are RL'd to emit individual tool calls in a structured format, composing them in code may be out of distribution for them.
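A toy sketch of that middle ground, in Python rather than QuickJS for brevity (all names here — `search`, `word_count`, `run_agent_script` — are invented for illustration): tools are plain functions, and the agent emits a small script that chains them in one shot.

```python
# Hypothetical "compose tool calls via a script" setup, no Linux sandbox:
# tools are ordinary functions exposed to an agent-authored program.

def search(query: str) -> list[str]:
    return [f"result for {query}"]   # stub tool

def word_count(text: str) -> int:
    return len(text.split())         # stub tool

TOOLS = {"search": search, "word_count": word_count}

def run_agent_script(script: str):
    # Execute the agent-authored script with only the tools in scope.
    # (A real system would sandbox this far more carefully.)
    scope = {"__builtins__": {}, **TOOLS}
    exec(script, scope)
    return scope.get("result")

# The agent composes two tools in one round trip instead of two:
out = run_agent_script("result = word_count(search('macos events')[0])")
print(out)  # 4
```

The trade-off is the one above: a model post-trained to emit one structured tool call per turn may be worse at writing these little composition programs than at emitting the calls directly.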
if you want to do this, there is a better technique than the one shown in this video.
get a single-cut fine file, maybe with a little more weight than the one in the video. a single-cut file has one set of diagonal teeth and allows firm, continuous contact with the piece. most files are double cut: they have two crossing sets of teeth and look like bumpy diamonds. they remove more material but tend to bounce.
use long, even strokes with firm pressure, only during the forward stroke. watch out for roll-off, where you unconsciously change the angle or pressure of the file as you reach the end of the stroke.
you can make a pretty even-looking chamfer that way.
I've been thinking of just using sandpaper stuck to a block of wood, though I imagine that might be slower.
Heck, a little part of me is tempted to try the smallest radius round-over router bit I have in a trim router, but the odds of that going horribly wrong are just way too high.
Each stack frame has its own isolated context. This pushes the token pressure down the stack. The top level conversation can go on for days in this arrangement. There is no need for summarization or other tricks.
Is this related to the paper on Recursive Language Models? I remember it mentioned something similar about "symbolic recursion", but the way you describe it makes it sound too simple, why is there an entire paper about it?
The RLM paper did inspire me to try it. That is where the term comes from. "Symbolic" should be taken to mean "deterministic" or "out of band" in this context. A lot of other recursive LLM schemes rely on the recursion being in the token stream (i.e., "make believe you have a call stack and work through this problem recursively"). Clearly this pales in comparison to actual recursion with a real stack.