Iterating on LLM agents involves testing on production(-like) data. The most accurate way to tell whether your agent is performing well is to watch it work in production.
You want the best results you can get from a prompt, so you use features like prompt management and A/B testing to see which version of your prompt performs better (i.e. is the better fit for the model you are using) in production.
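To make the A/B part concrete, here is a minimal sketch of how it could look, not how any particular prompt management tool does it. The prompt texts, `assign_variant`, and `ABResults` are all made-up names for illustration, and "success" stands in for whatever metric you actually track (thumbs up, task completed, etc.).

```python
import hashlib
from collections import defaultdict

# Hypothetical prompt versions; the names and texts are illustrative only.
PROMPTS = {
    "A": "You are a concise assistant. Answer briefly.",
    "B": "You are a helpful assistant. Explain your reasoning step by step.",
}


def assign_variant(user_id: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user so they always get the same prompt version."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


class ABResults:
    """Counts trials and successes per prompt version."""

    def __init__(self):
        self.trials = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, variant: str, success: bool) -> None:
        self.trials[variant] += 1
        if success:
            self.successes[variant] += 1

    def success_rate(self, variant: str) -> float:
        n = self.trials[variant]
        return self.successes[variant] / n if n else 0.0


# Usage: pick the prompt for a request, then record how the run went.
results = ABResults()
variant = assign_variant("user-123")
system_prompt = PROMPTS[variant]
results.record(variant, success=True)
```

Hashing the user ID instead of picking randomly per request keeps each user on one version, which makes the comparison less noisy. With real traffic you would also want enough samples per version before calling a winner.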
I have colleagues who are annoyed that I use Firefox, because in their world everything Chrome does is standard and browsers like Safari and Firefox are annoying outliers. It doesn't matter whether something Chrome implements is _actually_ standard, or how faithfully non-Chrome browsers follow the spec: they see it as a chore to support the spec rather than just the Chrome browser.
So, the "Why not use Chrome instead of Safari?" attitude certainly happens.
Here is the neat part about Ruby: your autocomplete barely works and your IDE can only guess what you want, instead of relying on a good language service…
> Balatro was one of the biggest games of last year, and I'm sure the tinkerability was a big catalyst to that
Not sure I agree on that point. Balatro is a great game and the mainstream success is warranted, but my gut tells me that the technical implementation was not the catalyst for that. Sure, Lua’s portability could have led to the cross-platform popularity, but a mainstream gamer does not tinker with and mod Balatro at all.
What a joke of a company. They have the internet in the palm of their hands, and yet let vibe coding ambitions ruin their empire.
Time for everyone to drop this company and move on to better solutions (until those better solutions rot from the inside out, just like their predecessor did).