
>Good DS thinks from first principles. Bad DS accepts everything they have heard or seen as the ground truth, or the best way to do something.

Domain knowledge, and the humble attitude that gets stakeholders to share it with you, is fundamental to understanding data and how models will be interpreted and used. There is not enough "listen to others" in this list (although I did read the "listen to customers" at the end). Listening, listening, listening!



This reminds me of a time when some geneticists tried to find genes associated with a particular disease, to try to unravel why it occurs. Complex trait, no single answer, so they genotyped thousands of people with and without the disease, and ran the stats. And... nothing.

What has one common name is actually several similar diseases, and the geneticists would have known that if they had paid attention to the clinicians. Listening and incorporating knowledge is key.

[I'm thinking of an early glaucoma GWAS, IIRC, though there are similar cases.]


I think this story is very, very common. Still, some complex diseases (e.g. cystic fibrosis, Down syndrome) do turn out to be simple on a genetic level, so there is some merit to this approach.

Moreover, there is currently no better way to understand diseases than genotyping thousands of people with and without the disease and 'running the stats', so it's worth a try.


I would say that a good data scientist can quickly estimate where their time is best spent: either accepting what someone else has told them as-is, or investigating it themselves from the ground up. There's always more to investigate, so using your time efficiently is one of the most important DS skills. It's like solving a multi-armed bandit problem.


Sounds like something that is a function of your domain knowledge; your data science skills will have very little to do with it.


The knowledge that informs where to spend your time can come from domain knowledge (and also from experience of working with data in general), but the framework for estimating the probability that investing your time will get you worthwhile results, and then acting on those probabilities, has more to do with statistics.
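
To make the multi-armed bandit framing concrete, here is a minimal sketch of Thompson sampling, one common bandit strategy. Everything in it is an assumption for illustration: the function name `thompson_sampling`, the three hypothetical "lines of investigation", and their made-up payoff probabilities. It only shows the general idea of estimating how likely each option is to pay off and spending more time on the promising ones.

```python
import random


def thompson_sampling(payoff_probs, rounds, seed=0):
    """Simulate Thompson sampling on Bernoulli arms.

    payoff_probs: hypothetical true chance that each option pays off
                  (unknown to the decision maker, used only to simulate).
    Returns how many times each option was chosen.
    """
    rng = random.Random(seed)
    n = len(payoff_probs)
    successes = [0] * n
    failures = [0] * n
    pulls = [0] * n

    for _ in range(rounds):
        # Draw a plausible payoff rate for each option from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n)
        ]
        # Act on the estimate: pick the option whose draw is highest.
        arm = max(range(n), key=lambda i: samples[i])
        pulls[arm] += 1

        # Observe a simulated outcome and update the belief.
        if rng.random() < payoff_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls


# Three hypothetical lines of investigation with made-up payoff rates.
pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Domain knowledge would enter here as a better starting prior than the uniform one assumed above; the sampling and updating loop is the statistics part.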


Oh man! Domain knowledge is absolutely HUGE. I cannot even begin to tell you how much I've had to dive into literature on topics well outside of my domain to begin to understand how to use my outside perspective to come up with solutions.

Respecting stakeholders and being humble about asking for help understanding the domain are paramount.



