If you ever spend much time trying to _build_ a competent hiring pipeline (and every dev should; it's an eye-opening experience), you come to realize that it's very hard to evaluate the process itself.
For instance, I found that the same questions, read from a script, would regularly perform differently depending on which evaluator asked them. But getting statistical significance on that is hard!
So that's the allure of leetcode. You can get a large, standardized sample relatively cheaply. That it's actually a bad eval method gets lost in the wash, which is unfortunate, but I certainly understand it.
Conversely, "talk about your project" was a completely useless eval when I tried to use it. Good candidates failed, bad candidates passed, and evaluators had all manner of biases, to the point that I started suspecting _time of day_ mattered more than the answer.
I’d 100% buy that an individual can accurately judge candidates with this approach, but I’d want heavy evidence if you claimed you could scale it.