Fair enough, but my bet is that even trivially recontextualizing an interview question is enough to wreck an LLM's performance.
If you change the "find a word in a grid of letters" question to a "find a Collatz sequence in a grid of numbers" question, does the LLM still solve it? As an interviewer, I would expect a qualified candidate to spend maybe an extra 5-10 minutes asking clarifying questions and understanding the difference between the two.