
What do you consider 'not in its training data'?

I just asked Claude a question I am pretty sure was not in its training data.

* https://i.imgur.com/XjvImeT.jpeg



It is immediately wrong in Step 1. A newborn does not have a 2:1 height-to-width ratio, and certainly not a 25cm width (what does that even mean? Shoulder to shoulder?).

This is a perfect example of where not knowing the “domain” leads you astray. As far as I know “newborn width” is not something typically measured, so Claude is pulling something out of thin air.

Indeed you are showing that something not in the training data leads to failure.


Babies also aren't rectangles. You could lay a row shoulder to shoulder, then lay another row upside down relative to the first, and their heads would fit between the heads of the first row, saving space.

Edit: it also doesn't account for the fact that the moon is more or less a sphere, not a flat plane.
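
For reference, here is a rough sketch of the kind of estimate being argued over. It assumes the question was roughly "how many newborns would it take to cover the Moon's surface" (a guess from the replies, since the screenshot is not reproduced here). The 25 cm width and 2:1 height:width ratio are the figures criticized above, treated as a rectangular footprint; the function names are made up for illustration, and the Moon's mean radius of about 1,737.4 km is the only outside fact.

```python
import math

# Mean radius of the Moon in metres (about 1,737.4 km).
MOON_RADIUS_M = 1_737_400

def sphere_area(radius_m: float) -> float:
    """Surface area of a sphere, 4*pi*r^2, in square metres."""
    return 4 * math.pi * radius_m ** 2

def count_rectangles(area_m2: float, width_m: float, length_m: float) -> float:
    """Naive tiling: total area divided by one rectangular footprint.

    Ignores curvature, terrain, and the head-to-toe packing trick,
    so it is an order-of-magnitude figure at best.
    """
    return area_m2 / (width_m * length_m)

# Assumed footprint: 0.25 m wide, 2:1 ratio, so 0.50 m long.
moon_area = sphere_area(MOON_RADIUS_M)   # roughly 3.8e13 m^2
babies = count_rectangles(moon_area, 0.25, 0.50)

print(f"Moon surface area: {moon_area:.3e} m^2")
print(f"Rectangular 'babies' needed: {babies:.3e}")
```

The point is only that the answer scales directly with the assumed footprint: halve the made-up 25 cm width and the count doubles, so a bad Step 1 carries straight through to the result.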



I guess coming up with a truly original question is tougher than it seems. Any ideas?


Ask them what's the airspeed velocity of a laden astronaut riding a horse on the moon...

Edit: couldn't resist, and dammit!!

Response: Ah, I see what you're doing! Since the Moon has no atmosphere, there’s technically no air to create any kind of airspeed velocity. So, the answer is... zero miles per hour. Unless, of course, you're asking about the speed of the horse itself! In that case, we’d just have to know how fast the astronaut can gallop without any atmosphere to slow them down.

But really, it’s all about the fun of imagining a moon-riding astronaut, isn’t it?


An African horse or a European horse?


Did you actually check the math? LLMs are usually terrible at math because, as I mentioned in another comment, they are language models, not calculators. Hopefully that changes as LLMs start using other apps like calculators to get their results. I am not sure whether Claude does that already or whether it's still in development.


Claude has access to an analysis frame that accepts JavaScript, which it can use for calculations.



