I have a dozen or so fairly random prompts, based on things I’m very knowledgeable and passionate about, that I feed into every new model so I can compare the outputs (rough harness sketched below). A couple are directly coding-related, a couple are “write a few paragraphs explaining <technical thing>”, and the rest are purely about non-computer hobbies, etc.
I’ve found it way more useful for me personally than any of the “formal” benchmarks, as I don’t really care how a model scores on someone else’s tests; I very much do care how well it handles my day-to-day tasks.
It’s like listening to someone in the media talk about a topic you’re passionate about, and you pick up on all the little bits and pieces that aren’t right. It’s a gut feel and very unscientific but it works.
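For the curious, the harness doesn’t need to be fancy. Here’s a minimal Python sketch, assuming an OpenAI-compatible API with OPENAI_API_KEY set; the prompt directory, model names, and output layout are all placeholders:

```python
import pathlib

from openai import OpenAI  # assumes an OpenAI-compatible endpoint

client = OpenAI()

# Hypothetical layout: one personal prompt per text file in ./my_prompts/
PROMPTS = sorted(pathlib.Path("my_prompts").glob("*.txt"))
MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder model names

for prompt_file in PROMPTS:
    prompt = prompt_file.read_text()
    for model in MODELS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content or ""
        # Write each answer to outputs/<prompt>.<model>.txt so the
        # results can be eyeballed side by side, gut-feel style.
        out = pathlib.Path("outputs") / f"{prompt_file.stem}.{model}.txt"
        out.parent.mkdir(exist_ok=True)
        out.write_text(answer)
```

When a new model drops, add its name to MODELS, rerun, and skim the new outputs against the old ones.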
> It’s like listening to someone in the media talk about a topic you’re passionate about, and you pick up on all the little bits and pieces that aren’t right. It’s a gut feel and very unscientific but it works.
I coined the “Murray Gell-Mann test” for this sort of AI evaluation, after Michael Crichton’s Gell-Mann Amnesia effect.