atarus's comments

We track failure modes directly in production instead of relying on simulation, so if a failure mode suddenly starts popping up too often, we can alert in a timely manner. With the approach of going from simulation to monitoring, I worry the feedback might be delayed.

Doing it in production also lets you run simulations by replaying those production conversations, ensuring you catch regressions.
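A rough sketch of the alerting part (all names here are hypothetical, not our actual implementation): count failure modes observed in production over a rolling window and flag any mode that crosses a threshold.

```python
from collections import Counter, deque
import time


class FailureModeMonitor:
    """Track labeled failures in a rolling time window and flag spikes."""

    def __init__(self, window_seconds=3600, threshold=5):
        self.window = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, failure_mode) pairs, oldest first

    def record(self, failure_mode, now=None):
        now = time.time() if now is None else now
        self.events.append((now, failure_mode))
        self._evict(now)

    def _evict(self, now):
        # Drop events that fell out of the rolling window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def spiking_modes(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        counts = Counter(mode for _, mode in self.events)
        return [m for m, c in counts.items() if c >= self.threshold]


monitor = FailureModeMonitor(window_seconds=3600, threshold=3)
for _ in range(3):
    monitor.record("wrong_tool_call", now=100.0)
print(monitor.spiking_modes(now=100.0))  # ['wrong_tool_call']
```

The same recorded events double as the replay corpus: each `(timestamp, failure_mode)` can point back at the production conversation you re-run as a simulation.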


This comes from our architecture. Since we are aware of the agent's context, our test agents know about incomplete flows, and the assertions are per session.

If we miss some cases, there's always a feedback loop to help you improve your test suite.
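To illustrate what "assertions are per session" could look like (a hypothetical sketch, not our actual schema): each test session carries its own transcript and flow state, and only the checks defined for that session are evaluated against it.

```python
# One recorded test session, including an incomplete flow.
session = {
    "transcript": [
        {"role": "user", "text": "I want to reschedule my appointment"},
        {"role": "agent", "text": "Sure, what new date works for you?"},
    ],
    "flow": "reschedule",
    "completed": False,
}


def run_session_assertions(session, assertions):
    """Return the names of the assertions that failed for one session."""
    return [name for name, check in assertions if not check(session)]


# Assertions scoped to this session only.
assertions = [
    ("agent_asked_for_date", lambda s: any(
        "date" in turn["text"].lower()
        for turn in s["transcript"] if turn["role"] == "agent")),
    ("flow_is_reschedule", lambda s: s["flow"] == "reschedule"),
]

print(run_session_assertions(session, assertions))  # []
```

Because the checks never assume the flow finished, an incomplete session can still pass its own assertions.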


Yes, we already support knowledge base integrations for BigQuery and plan to expand the set of connectors. For now, you can always drop in knowledge files directly.

Moreover, we even generate scenarios from the knowledge base.


Training is overkill at this point, imo. I have seen agents work quite well with a feedback loop, some tools, and prompt optimisation. When you say training, do you mean fine-tuning the models?


Nope - just use a memory layer with a model routing system.

https://github.com/rush86999/atom/blob/main/docs/EPISODIC_ME...


Memory is usually slow, and I haven't seen many voice agents leverage it, at least. Are you building in text modality, or audio as well?


That's actually interesting. Does it depend on the user to create the HTTP endpoints for /speak and /transcript?

One of our learnings has been to make it easy to plug into existing frameworks, for example livekit, pipecat, etc.

Happy to talk if you can reach out to me on linkedin - https://www.linkedin.com/in/tarush-agarwal/


Just sent a connection invitation on LinkedIn. This was actually designed to allow e2e automation using playwright-mcp at a previous startup I worked at that built voice-based job interview agents. The HTTP endpoints are provided by a daemon sitting in the background, listening to all input to the virtual mic, transcribing it, and storing it. The agent can hit /speak and /transcript through an MCP.

We had built Livekit Agents-specific solutions by injecting text responses, but felt that wasn't enough since we wanted to test the whole thing end to end, so I hacked together a way to do a virtual mic/speaker. It was designed to close the dev-test-debug loop so that Claude Code can develop on its own rather than relying on a human to test it.
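For anyone curious what the daemon's HTTP surface might look like, here's a minimal sketch (hypothetical, not the actual code): POST /speak accepts text to play into the virtual speaker, and GET /transcript returns everything transcribed from the virtual mic so far.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Lines transcribed from the virtual mic; the real daemon's transcriber
# would append to this as audio comes in.
TRANSCRIPT = []


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path == "/speak":
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # The real daemon would synthesize payload["text"] out to the
            # virtual speaker; here we just acknowledge the request.
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(b'{"status": "queued"}')
        else:
            self.send_response(404)
            self.end_headers()

    def do_GET(self):
        if self.path == "/transcript":
            body = json.dumps({"transcript": TRANSCRIPT}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def serve(port=8089):
    # The real daemon would run this forever alongside the transcriber.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

An MCP tool wrapping these two endpoints is then enough for an agent like Claude Code to drive a full speak-listen-assert loop on its own.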


To clarify, you are using the "fast brain, slow brain" pattern? Maybe an example would help.

Broadly speaking, we see people experiment with this architecture quite often, with a great deal of success. Another approach is an agent orchestrator architecture with an intent-recognition agent that routes to different sub-agents.

Obviously there are endless cases possible in production, and the best approach is to build your evals using that data.
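The orchestrator pattern above can be sketched in a few lines. This toy version uses keyword matching for intent recognition; a real system would use an LLM or a trained classifier, and the intents and sub-agents here are made up for illustration.

```python
# Hypothetical intents and their trigger keywords.
INTENT_KEYWORDS = {
    "billing": ["invoice", "charge", "refund"],
    "scheduling": ["appointment", "reschedule", "book"],
}

# Sub-agents, stubbed as functions; a fallback handles unrecognized intents.
SUB_AGENTS = {
    "billing": lambda msg: "[billing agent] handling: " + msg,
    "scheduling": lambda msg: "[scheduling agent] handling: " + msg,
    "fallback": lambda msg: "[general agent] handling: " + msg,
}


def route(message):
    """Recognize the intent of a message and dispatch to a sub-agent."""
    text = message.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return SUB_AGENTS[intent](message)
    return SUB_AGENTS["fallback"](message)


print(route("I need a refund for last month"))
# [billing agent] handling: I need a refund for last month
```

The messages that land in the fallback branch in production are exactly the ones worth turning into new eval cases.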


Yes, we do support integrations with different chat agent providers, and also SMS/WhatsApp agents where you can just drop in the agent's number.

Let us know how your agent can be connected, and we can advise on the best way to test it.


Looks great! So excited about this! We have been using Gemma models since Gemma 1.0, and they are so far ahead of the curve!

