> For example, write a simple pipeline where Claude would scrutinize OpenAI's answers and vice versa.
I'm working on a naive approach to identifying errors in LLM responses, which I describe at https://news.ycombinator.com/item?id=42313401#42313990, and which could be used to scrutinize responses. It's written in JavaScript, but you'll be able to create a new chat by calling an HTTP endpoint.
I'm hoping to have the system in place in a couple of weeks.
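The cross-scrutiny pipeline from the quoted suggestion could be sketched like this. The `askA`/`askB` functions are hypothetical stand-ins for whatever calls the real endpoints (e.g. a `fetch()` against each provider's chat API, or the HTTP endpoint mentioned above); they're injected so the pipeline itself is testable without any network access:

```javascript
// Minimal cross-scrutiny sketch: two models answer the same question,
// then each one reviews the other's answer. askA and askB are assumed
// to be async (prompt) => responseText functions wrapping real APIs.
async function crossScrutinize(askA, askB, question) {
  // Each model answers the question independently.
  const [answerA, answerB] = await Promise.all([
    askA(question),
    askB(question),
  ]);

  // Each model then critiques the other's answer.
  const critiquePrompt = (q, a) =>
    `Question: ${q}\nProposed answer: ${a}\n` +
    `List any factual or logical errors, or reply "OK".`;
  const [critiqueOfB, critiqueOfA] = await Promise.all([
    askA(critiquePrompt(question, answerB)),
    askB(critiquePrompt(question, answerA)),
  ]);

  return { answerA, answerB, critiqueOfA, critiqueOfB };
}
```

Swapping the injected functions for real API wrappers would give the "Claude scrutinizes OpenAI and vice versa" loop without changing the pipeline code.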