
Left and right are considered opposites, but semantically they’re extremely similar. Both refer to directions relative to some particular point and orientation. Compared to, say, the meaning of “backpack,” their meanings are nearly identical. And in the training data, “A left X” and “B right Y” will tend to have very similar As and Bs, and Xs and Ys. No surprise LLMs struggle.

I imagine this is also why it’s so hard to get an LLM to not do something by specifically telling it not to do that thing. “X” and “not X” are very similar.
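A toy illustration of that closeness: under a crude bag-of-words representation (a deliberate simplification; real LLM embeddings are contextual, but the surface overlap is the point), an instruction and its negation differ by only a token or two, so their vectors are nearly parallel:

```python
from collections import Counter
import math

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

# Opposite instructions, yet their surface forms overlap almost entirely.
sim = bow_cosine("please use recursion here",
                 "please do not use recursion here")
print(round(sim, 3))  # high similarity despite opposite meaning
```

The two instructions mean opposite things but score above 0.8 similarity, because "not" is the only distinguishing content.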



Image encodings often don't capture positional information very well.
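A minimal sketch of why position matters: if an image is split into patches and the patch features are pooled in an order-invariant way (e.g. mean pooling) with no positional encoding, a horizontally mirrored image produces exactly the same global encoding, so "left" and "right" are literally indistinguishable. This is a toy NumPy illustration, not how any particular vision encoder works:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))       # toy grayscale "image"
flipped = img[:, ::-1]         # horizontal mirror

def patch_features(im, p=4):
    """Per-patch mean intensity for non-overlapping p×p patches."""
    h, w = im.shape
    return [im[i:i+p, j:j+p].mean()
            for i in range(0, h, p)
            for j in range(0, w, p)]

# Mirroring only permutes the patches (each patch's mean is unchanged),
# so the multisets of patch features are identical...
assert sorted(patch_features(img)) == sorted(patch_features(flipped))

# ...and any order-invariant pooling yields the same global encoding.
assert np.isclose(np.mean(patch_features(img)),
                  np.mean(patch_features(flipped)))
```

Positional encodings exist precisely to break this symmetry, but how faithfully left/right information survives pooling and projection layers is another question.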


A lot of pictures on the web are flipped horizontally because of cameras, mirrors, you name it. It's usually trivial for humans to infer the directions involved; I wonder if LLMs can do it as well.


Recently I scanned thousands of family photos, but I didn't have a good way to orient them correctly before scanning. I figured I could "fix it in post".

If you upload an incorrectly oriented image to Google Photos, it will automatically detect that and suggest the right way up (even with no EXIF data). So I set about finding an open-source way to do the same, since I'm self-hosting the family photo server.

So far, I haven't managed it. I found a project doing it with PyTorch or something, but it didn't work well.
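One common approach to this problem (a sketch under my own assumptions, not the project mentioned above) is to treat it as self-supervised rotation classification: labels come for free, because you can rotate any correctly oriented photo by 0/90/180/270° and the applied rotation is the label. A minimal PyTorch version:

```python
import torch
import torch.nn as nn

class RotationNet(nn.Module):
    """Tiny CNN that classifies which of 4 rotations an image is in."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 4)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def make_batch(images):
    """Rotate each image by a random multiple of 90°; the label is free."""
    xs, ys = [], []
    for img in images:
        k = torch.randint(0, 4, (1,)).item()
        xs.append(torch.rot90(img, k, dims=(1, 2)))
        ys.append(k)
    return torch.stack(xs), torch.tensor(ys)

model = RotationNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
imgs = [torch.rand(3, 64, 64) for _ in range(8)]  # stand-in for real photos
x, y = make_batch(imgs)
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

At scan-fixing time you'd run each photo through the trained model and rotate it by the inverse of the predicted class. This toy network is far too small to work on real photos; in practice you'd fine-tune a pretrained backbone.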



