
>Data science effectively rebranded statistics but removed the requirement of deep statistical knowledge

One thing people miss: shallow statistical knowledge can cause subtle failures, but so can shallow software engineering knowledge.

A junior frontend developer will write buggy code, notice that the UI is glitched, and fix the bug. A junior data analyst will write buggy code and fix any bugs that make the results obviously way off, but bugs that cause subtler problems will go unnoticed and unfixed.

Writing correct code without the benefit of knowing when there is a bug is challenging enough for senior developers. I don't trust newbie devs to do it at all.

Context: I used to work in email marketing. At one point I was reading some SQL that one of the data scientists had written and noticed that it was triple-counting our conversions from marketing email. Triple-counting conversions means the numbers were way off, but not so far off as to be utterly absurd. If I hadn't happened to do a careful read of that code, we would've just kept believing that our email marketing was 3x as effective as it actually was.

So it's impossible to know how big a problem this is. But there is every reason to believe it is significant, and that lots of code written by data scientists is plagued by bugs which undermine the analysis. (When's the last time you wrote a program which ran correctly on the first try?) Any serious data science effort would enforce strict practices around code review, assertions, TDD, etc. to make the analysis as correct as possible -- but my impression is that it is much more common for data analysis to be low-quality throwaway code.
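
To make the triple-counting failure concrete, here is a minimal sketch of one common way such a bug can arise: a join that fans out because each converting user received several emails. The original SQL isn't shown, so whether this was the actual cause is an assumption; the table names, column names, data, and helper functions below are all hypothetical. The sketch uses Python's standard-library sqlite3 module and shows how a cheap sanity check of the kind mentioned above can flag the problem.

```python
import sqlite3

# Hypothetical schema and data: 2 conversions, 3 marketing emails per user.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversions (user_id INTEGER, amount REAL);
    CREATE TABLE email_sends (user_id INTEGER, campaign TEXT);
    INSERT INTO conversions VALUES (1, 10.0), (2, 20.0);
    INSERT INTO email_sends VALUES
        (1, 'a'), (1, 'b'), (1, 'c'),
        (2, 'a'), (2, 'b'), (2, 'c');
""")


def count_conversions_naive(conn):
    # Buggy: the join repeats each conversion once per email the user got.
    row = conn.execute("""
        SELECT COUNT(*)
        FROM conversions c
        JOIN email_sends e ON e.user_id = c.user_id
    """).fetchone()
    return row[0]


def count_conversions_deduped(conn):
    # Counts each conversion at most once, if the user got any email at all.
    row = conn.execute("""
        SELECT COUNT(*)
        FROM conversions c
        WHERE EXISTS (
            SELECT 1 FROM email_sends e WHERE e.user_id = c.user_id
        )
    """).fetchone()
    return row[0]


def count_all_conversions(conn):
    return conn.execute("SELECT COUNT(*) FROM conversions").fetchone()[0]


def attribution_is_plausible(attributed, total):
    # Invariant: email can't be credited with more conversions than exist.
    return 0 <= attributed <= total


naive = count_conversions_naive(conn)      # 6: each conversion counted 3x
deduped = count_conversions_deduped(conn)  # 2
total = count_all_conversions(conn)        # 2

print("naive plausible?  ", attribution_is_plausible(naive, total))
print("deduped plausible?", attribution_is_plausible(deduped, total))
```

The invariant check only catches the bug here because the inflated count exceeds the total number of conversions. If email drove a small enough share of conversions, a 3x overcount could still pass, which is why a check like this supplements code review rather than replacing it.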



This is an important point. I used to work in adtech. It's amazing how terrible the modeling is in that space. You can generate a model that identifies a given target audience and simply assert that it works without any real validation.


Surely adtech companies like Google and FB do OK though?



