What about hybrid automations or human-in-the-loop flows? We have automations where the human starts by logging in, then hands over to the agent. Some parts may even be Puppeteer automated. This also means the session may be long running, typically for months at a time and the agent needs to notify the human again if they get logged out. None of the existing browser automation platforms I have tried make this easy or cost effective, so we are currently trying to build our own. Would love to consider Simplex if this is solved.
Could I ask why the flow starts with a human logging in? Is it because you're using their credentials and/or have some sensitivity around storing their credentials? Or is it something to do with 2FA (we handle 2FA)? Or are you just storing the session data after they log in so you can re-use it for those few months you mentioned?
Re: Puppeteer automation as part of the script -- we have a feature we wrote for one of our customers that we didn't promote to production where you can define a deterministic action in the dashboard that allows you to paste in JavaScript, but we're likely not to push that to prod anytime soon. Could you explain your reasoning for wanting to use Puppeteer still? We've generally seen customers fully switch over to Simplex instead of relying on their original Puppeteer/Playwright scripts -- since we have action caching, the underlying script (click on div locator with this div id, etc.) is pretty similar to what you'd get using Playwright.
Security conscious domain. We do automations on behalf of our clients and they don't want credentials stored. "Handling" 2FA automatically is completely unacceptable, it breaks the entire point of the 2FA security model. Besides, login sometimes involves out-of-band 2FA methods including phone number.
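For what it's worth, the "human logs in, agent takes over, agent pings the human when the session dies" loop can be sketched in a few lines. This is only a minimal sketch, not how Simplex or any other platform does it: `SessionWatchdog`, `is_logged_in` and `notify_human` are hypothetical names, and the two callables are hooks you would supply yourself (for example a Puppeteer or Playwright probe of an authenticated page, and an email or chat notifier). No credentials are stored or replayed anywhere.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SessionWatchdog:
    """Polls a login check and asks a human to log in again when it fails.

    Both callables are hypothetical hooks supplied by the caller; this class
    never sees or stores credentials.
    """
    is_logged_in: Callable[[], bool]     # e.g. probe a page only visible when authenticated
    notify_human: Callable[[str], None]  # e.g. send an email or chat message
    waiting_for_human: bool = False

    def tick(self) -> str:
        """Run one check and return the state the agent should act on."""
        if self.is_logged_in():
            # Session is valid (possibly because the human just logged back in).
            self.waiting_for_human = False
            return "agent_may_run"
        if not self.waiting_for_human:
            # Notify once per logout instead of on every poll.
            self.notify_human("Session expired: please log in again.")
            self.waiting_for_human = True
        return "paused_for_login"
```

You would call `tick()` on a timer for as long as the session is supposed to live; the agent only acts while it returns `"agent_may_run"`, and 2FA stays entirely with the human.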
And not just any video data, they specifically mentioned screen recordings for agentic computer uses. A very specific kind of video. My guess is they have a partnership with someone like Rewind.ai
I wonder if this would enable the truly "serverless" application I've been thinking about. Imagine shipping a whole Rails/Laravel/Wordpress app to the user to be run in their browser with sqlite. Technically you would only need a CDN to distribute the app.
It's not clear to me what "container" and "pairing" mean in this context. What if my application is not dockerized? Can Claude Code execute tests by itself in the context of the container when not paired? This requires all the dependencies, database, etc. - do they all share the same database? Running full containerized applications with many versions of Postgres at the same time sounds very heavy for a dev laptop. But if you don't isolate the database across parallel agents that means you have to worry about database conflicts, which sounds nasty.
In general I'm not even sure if the extra cognitive overload of agent multiplexing would save me time in the long run. I think I still prefer to work on one task at a time for the sake of quality and thoroughness.
However the feature I was most looking forward to is a mobile integration to check the agent status while away from keyboard, from my phone.
Then claude runs in a container created from our default image, and any code it executes will run in that container as well.
> Can Claude Code execute tests by itself in the context of the container when not paired?
Yup! It can do whatever you tell it. The "pairing" is purely optional -- it's just there in case you want to directly edit the agent's code from your IDE.
> Do they all share the same database?
We support custom docker containers, so you should be able to configure it however you want (eg, to have separate databases, or to share a database, depending on what you want)
> Running full containerized applications with many versions of Postgres at the same time sounds very heavy for a dev laptop
Yeah -- it's not quite as bad if you run a single containerized Postgres and they each connect to a different database within that instance, but it's still a good point.
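As a rough illustration of the "one Postgres, one database per agent" idea, here is a minimal sketch. It is not something Sculptor provides: the `agent_database_url` helper, the `agent_` prefix, and the default host, port and user are all assumptions made up for the example.

```python
import re


def agent_database_url(agent_id: str, host: str = "localhost",
                       port: int = 5432, user: str = "postgres") -> str:
    """Build a connection URL pointing at a database reserved for one agent.

    Hypothetical helper: every agent talks to the same Postgres instance
    but gets its own database, so parallel test runs do not collide.
    """
    # Keep only lowercase letters, digits and underscores in the name.
    safe = re.sub(r"[^a-z0-9_]", "_", agent_id.lower())
    # Postgres truncates identifiers beyond 63 bytes, so cap the name here.
    db_name = ("agent_" + safe)[:63]
    return f"postgresql://{user}@{host}:{port}/{db_name}"


print(agent_database_url("Fix-Login#2"))
# postgresql://postgres@localhost:5432/agent_fix_login_2
```

Each container would get its own URL through an environment variable, and the databases themselves would still have to be created (and cleaned up) separately.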
One of the features on our roadmap (that I'm very excited about) is the ability to use fully remote containers (which definitely gets rid of this "heaviness", though it can get a bit expensive if you're not careful)
> the feature I was most looking forward to is a mobile integration to check the agent status while away from keyboard, from my phone.
in this context, the container contains the running claude instance, and pairing synchronizes its worktree with your local worktree.
under sculptor, claude code CAN execute tests by itself when not paired. that will also work for non-dockerized applications.
sharing a postgres across containers may require a bit of manual tweaking, but we support the devcontainer spec, so if you can configure e.g. your network appropriately that way, you can use a shared database as you like!
regarding multiplexing: the cognitive overhead is real. we are investigating mechanisms for reducing it. more on that later.
regarding mobile integration: we also want that! more on that later.
People who want the Good Stuff in life acquire taste and expertise or rely on the opinion of trusted people who have taste and expertise. It's always been like this. Otherwise acquiring the Good Stuff is a matter of random chance.
This is not just for LLM code. This is for any code that is written by anyone except yourself. A new engineer at Google, for example, cannot hit the ground running and make significant changes to the Google algorithm without months of "comprehension debt" to pay off.
However, code that is well-designed by humans tends to be easier to understand than LLM spaghetti.
> However, code that is well-designed by humans tends to be easier to understand than LLM spaghetti.
Additionally you may have institutional knowledge accessible. I can ask a human and they can explain what they did. I can ask an LLM, too and they will give me a plausible-sounding explanation of what they did.
I can't speak for others, but if you ask me about code I wrote >6 months ago, you'll also be stuck with a plausible-sounding explanation. I'll have a better answer than the LLM, but it will be because I am better at generating plausible-sounding explanations for my behavior, not because I can remember my thought processes for months.
This is where stuff like git history often comes in handy. I cannot always reliably explain why some code was the way it is when looking at a single diff of my own from years ago, but give me the history of that file and the issue tracker where I can look up references from commits and see the comments etc, and I can reconstruct it with very high degree of certainty.
There might also be a high level design page about the feature, or jira tickets you can find through git commit messages, or an architectural decision record that this new engineer could look over even if you forgot. The LLM doesn't have that
The weights won't have that by default, true, that's not how they were built.
But if you're a developer and can program things, there is nothing stopping you from letting LLMs have access to those details, if you feel like that's missing.
I guess that's why they call LLMs "programmable weights": you can definitely add a bunch of information to the context so they can use it when needed.
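To make that concrete, here is a minimal sketch of handing a file's git history to a model as context. It assumes `git` is on the PATH and the code runs inside a repository; `file_history` and `build_prompt` are hypothetical helper names, and what you do with the resulting prompt string depends on whichever LLM client you use.

```python
import subprocess
from typing import List


def file_history(path: str, limit: int = 20) -> List[str]:
    """Return one line per commit that touched `path`, newest first."""
    out = subprocess.run(
        ["git", "log", "--follow", "-n", str(limit), "--date=short",
         "--format=%h %ad %s", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def build_prompt(question: str, path: str, history: List[str]) -> str:
    """Combine a question with commit history into one prompt string."""
    lines = [f"File: {path}", "Commit history (newest first):"]
    lines += [f"- {entry}" for entry in history] or ["- (no history found)"]
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)
```

Something like `build_prompt("Why is this retry loop here?", path, file_history(path))` gives the model the same commit trail a human would dig through. Issue tracker comments or design docs could be appended the same way.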
> But for asking a clarifying question during a training class?
LLMs can barely do 2+2, humans don't even understand the weights if they see them. LLMs can have all the access they want to their own weights and they won't be able to explain their thinking.
If we want to see what the actual impact on Apple’s market share would be if they pulled out of the EU market, consider Russia as a case study. Since Apple pulled out of Russia in 2022, the market share of iOS devices has remained steady and even slightly increased. Russia is comparatively poor. Apple would lose nothing if they pulled out of the EU because consumers would simply travel to other countries to buy.
Here’s a comparison of iOS share in Russia (mobile OS) between August 2025 vs August 2021, using StatCounter data:
• August 2025: iOS ~ 31.97 %
• August 2021: iOS ~ 27.52 %
The unstable driver issue is exactly why Apple should not make devices interoperable as the EU demands. If they comply with the EU demand and then the device performance is bad due to crappy third party integrations, consumers will blame Apple. If consumers can buy Apple devices everywhere except the EU they will blame the EU, not Apple.
Few die hards? We have an actual case study: Russia. Apple does not sell in Russia anymore but market share has increased.
Here’s a comparison of iOS share in Russia (mobile OS) between August 2025 vs August 2023, using StatCounter data:
• August 2025: iOS ~ 31.97 %
• August 2023: iOS ~ 27 % (approximately) — StatCounter’s historical data shows iOS had around 26-28 % share in Russia in mid-2023
So between August 2023 and August 2025, iOS’s share in Russia appears to have increased by about 4–6 percentage points (from ~27% → ~32%).
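For anyone who wants to check the arithmetic, here is a small sketch that separates the percentage-point change from the relative change. It uses the two StatCounter figures quoted in this thread (27.52% for August 2021 and 31.97% for August 2025); `share_change` is just a hypothetical helper name.

```python
from typing import Tuple


def share_change(before: float, after: float) -> Tuple[float, float]:
    """Return (percentage-point change, relative change in percent)."""
    points = after - before            # difference between the two shares
    relative = points / before * 100   # growth relative to the starting share
    return points, relative


# Figures as quoted in the thread: Aug 2021 vs Aug 2025.
points, relative = share_change(27.52, 31.97)
print(f"{points:.2f} percentage points, {relative:.1f}% relative")
# 4.45 percentage points, 16.2% relative
```

This says nothing about absolute sales: a bigger share of a market can still mean fewer devices sold if the market itself shrank.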
As you can see, Apple is willing to chase money despite sanctions. If they were serious, there would be 0% iOS devices in Russia, because we both know that iPhones can be remotely locked.
Now compare the tiny Russian market to the European market. Apple is making an obviously empty bluff.
The Russian market is not tiny, it is literally the largest economy in Europe with the highest GDP per capita for 2025 when adjusted for PPP. It also has the highest population in Europe.
If they can pull out of Russia and lose nothing why can’t they pull out of the EU?
Despite Apple’s exit from Russia after the 2022 invasion, StatCounter data shows iOS share actually rose from ~27.5% in Aug 2021 to ~32% in 2025 in Russia.
The Russian market has been less than 1% of Apple's revenue
The European market is 25% of Apple's revenue (7% is just App Store revenue)
The Russian market is tiny compared to Europe, and they were forced to leave via sanctions or risk being sanctioned themselves, so it is easy for Apple to say it had to leave that market (even though they did not really leave, otherwise Russians would not be able to, e.g., pay for development licenses). Leaving Europe would be entirely Apple's decision, without anybody forcing their hand.
> But EU consumers will be very angry at Brussels.
Why would they be angry at Brussels? If Apple decides to pull out of the EU market that's on them. EU citizens will be a lot less angry at Brussels than Apple's shareholders will be.
EU consumers will travel to other countries or buy through second hand markets, which is what happens in Russia now. Apple will lose little revenue. Apple can easily spin this as a win to shareholders, showing how much competitive advantage was preserved by not having to build to the EU's interoperability demands.
In the end the EU consumer gains nothing from all of this but loses their beloved Apple devices.
Well, then Apple should do it, tomorrow. For sure it won't even register on their annual report then.
> In the end the EU consumer gains nothing from all of this but loses their beloved Apple devices.
I don't think I have an emotional connection with my phone, it serves to call people, message them and to perform other useful functions and if it does not it gets replaced. I'm happy with it, it's a solid piece of gear and it has served me well. But emotional connections with brands or pieces of easily replaceable hardware are unhealthy.
Can you have an "anti-"emotional connection to a brand? The iPhone for me is missing a critical feature, which is ability to run the software I choose even if it didn't come from the App Store. Which means that brand is dead to me until that situation changes.
Not particularly happy with Google for other reasons either. There are some days I want to go back to the days of Windows Mobile ROM kitchens and PalmOS. At least it wasn't such a monoculture back then.
Yes, that's a problem, but this is akin to all of the other ways in which things are no longer properly sold but come with all kinds of strings attached. My computers are mine, and I determine what is being run on them. I realize that puts me in a - small - minority but I prefer to own things than to rent them. I don't want an ongoing relationship with vendors beyond the initial transaction and possibly warranty issues.
This informs a lot of my choices. It's the reason my car is old, it's the reason my computer is running Linux, it is the reason why I don't wear branded apparel and it helped me decide where to bank. But I fear that it is a losing battle.
The monoculture that you refer to creates choke points and legislators love those. It gives an illusion of control, but actually it is just a massive security risk.
Press releases are for the public, they are essentially holding their own users hostage, and users that don't see this for what it is deserve all the misery they will get. But you're not going to be able to pretend that Apple isn't solely responsible for following the law in those places where they want to make money. If they don't want to comply with the law that is on them, not on the legislators. And it's high time that these companies learned they are not larger than the nation states where they operate, no matter how rich they are.
> obviously they’re going to try and talk it out first with the EU
Will they? They were never willing to talk it out to begin with. And now they completely walked out:
--- start quote ---
“Apple has simply contested every little bit of the DMA [Digital Markets Act] since its entry into application,” said Commission spokesperson Thomas Regnier. “This undermines the company’s narrative of wanting to be fully cooperative with the Commission.”
...
“Results of this positive engagement? After two months, Apple came back and asked us to scrap everything,” he said.
--- end quote ---
It's a game of leverage. Apple is stoking the fire to make consumers angry, hoping the EU will pull back. The EU is pushing forward, hoping that there will not be enough popular opinion to sway its decision. In this case, I think the EU has far bigger leverage.