Fair enough, but my bet is that even trivially recontextualizing an interview question is enough to wreck an LLM's performance.

If you change the "find a word in a grid of letters" question to a "find a Collatz sequence in a grid of numbers" question, does the LLM still get it right? As an interviewer, I would expect a qualified candidate to spend maybe an extra 5-10 minutes asking clarifying questions and working out how the two problems differ.
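
For concreteness, here is a rough sketch of one way the Collatz variant could be pinned down. The spec is my own assumption, since settling it is exactly what the clarifying questions are for: the grid holds positive integers, a "sequence" runs in a straight line in any of the 8 word-search directions, and each cell must be the Collatz successor of the one before it. The names `collatz_next` and `longest_collatz_run` are made up for illustration.

```python
from typing import List, Tuple

# The 8 straight-line directions used in a classic word search.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def collatz_next(n: int) -> int:
    """Return the Collatz successor of n (assumes n is a positive integer)."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def longest_collatz_run(grid: List[List[int]]) -> List[Tuple[int, int]]:
    """Return the (row, col) cells of the longest straight-line Collatz run.

    Assumed spec: every cell after the first must equal the Collatz
    successor of the previous cell. Ties go to the first run found.
    """
    best: List[Tuple[int, int]] = []
    rows = len(grid)
    for r in range(rows):
        for c in range(len(grid[r])):
            for dr, dc in DIRECTIONS:
                run = [(r, c)]
                rr, cc = r + dr, c + dc
                # Extend while in bounds and the next cell continues the sequence.
                while (
                    0 <= rr < rows
                    and 0 <= cc < len(grid[rr])
                    and grid[rr][cc] == collatz_next(grid[rr - dr][cc - dc])
                ):
                    run.append((rr, cc))
                    rr, cc = rr + dr, cc + dc
                if len(run) > len(best):
                    best = run
    return best
```

The outer loops are the same brute-force scan as the word-search version. The only real change is that the "next expected symbol" comes from the previous cell's value instead of from a fixed target word, which is the kind of small shift the question is meant to probe.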