> Isn’t one agent fast enough? Why lose accuracy over +- one week to write a compiler?
My thinking as well. IMO it is because with a single agent you wait longer for results. You basically want to shorten the loops to improve the system. It hints that most of what we see comes down to the challenge of seeding a good context so that the agent can succeed over many iterations.
Very much the same experience. But it does not say much about the project setup and its influence on session success. In narrowly scoped projects it works really well, especially when tests are easy to execute. I found that this approach melts down when facing enterprise software with large repositories and unconventional layouts. Then you need to do a bunch of context management upfront and write verbose instructions for evaluations. But we know that what it really needs is a refactor, that's all.
And the post touches on the next type of problem: how to plan far enough ahead to utilise agents while you are away. It is a difficult problem, but IMO we’re going in the direction of having some sort of shared “templated plans”/workflows and budgeted/throttled task execution to achieve that. It is like you want to give it a little world to explore so that it does not stop early, like a little game to play, then you come back in the morning and check how far it went.
Buy local is a well-known tactic used globally, in places big and small. Another observation: calling it nationalistic is odd given that it involves multiple nationalities. The US has protectionist policies and so does the EU; there is nothing new here. The odd thing is that something so small triggers the person.
It is good to have a dedicated location to find these. The problem is that when buying services you want a sufficiently large company, so that it does not fall apart or get acquired and run into the ground, and we have a few. Also, putting a country flag on the service is cringe; it might even seem odd to some because it implies a specific language/culture. We all just want to buy from a proper business staffed with pros, one that does not resell AWS services.
Did a bit of soul searching and manually optimised to 1087 but I give up. What is the number we are chasing here? IMO I would not join a company giving such a vague problem because you can feel really bad afterwards, especially if this does not open a door to the next stage of the interview. As an alternative we could all instead focus on a real kernel and improve it :)
Author of the take-home here: That's quite a good cycle count, substantially better than Claude's, you should email it to performance-recruiting@anthropic.com.
IMO the idea of providing more in OSS usually stems from various third parties who use that code in production but do not really contribute back to it. The only sensible thing the person publishing code online needs to do is to protect their copyright and add a license. This weird idea that you somehow become responsible for the code, to the point that you need to patch every vulnerability and bug and now also identify the use of AI, is wrong on so many levels. For the record I’ve been publishing OSS for years.
Personally, as someone who is bought into the Apple ecosystem, I find this worrying. I am aware of how PCC is supposed to work (which is the likely target platform), but a deal with Google, of all companies, sends a bad signal to consumers who are privacy focussed. If such a feature is baked in without a way to switch it off, my next device will not be an iPhone, MacBook, or iPad.
You can already run quantized models without much friction, and people also have dedicated apps for that. It changes very little for people, because everyone who wanted to do it has already solved it and those who did not don't care. It is a marginal gain for consumers, a feature to brag about for Apple, and a big gain for Google. Users would also need to change existing habits, which is undoubtedly hard to do.
When my wife picked up her replacement passport, they had a scanning machine in the waiting room so you could check the NFC details before leaving the premises. (It also happily told me what my blue one reported over NFC.)
Building a startup that depends on a specific Google product is crazy. There is a risk of them changing APIs, offerings, or features with no real way for you to complain; they are not a great B2B company.
I hope you just use the API and can switch easily to any other provider.