
A data scientist is someone people wish were a unicorn, but who is neither a unicorn nor a scientist, despite the name.

People who are _actual_ scientists in industry usually go by the name "scientist" or "research scientist", although they use data just as much. You can recognize them by the peer-reviewed scientific papers they publish, often preceded by filed patent applications, as their work is novel. A real scientist wonders why some people call themselves "data" scientists, because science has always been about data, modeling and measurement.

But back to our "data scientist":

On a good day, she is generating value from the company's data to increase customer retention.

On a bad day, she is just doing the ETL prep work so the boss' other assistant can make that spreadsheet that aggregates the data that the boss' PPT slides will show.



This sentiment is quite popular among those who would like the popularity that data scientists currently enjoy (well, more so a few years ago, since there are many more critical voices now) but do not have it.

Data science is a generic name. There are data scientists like me who have been "actual scientists" and others who until yesterday were working on dashboards and Excel files with 100 tabs open and pivot tables as far as the eye can see. Whatever, it is a name. What about "engineers"? It is a title with no legal value: people in the US can call themselves software engineers, but in many other countries they could not. And who is a writer? Somebody making a living out of writing, somebody who has been published even if they got zero money for it and the magazine editor was their cousin, or something else?

People in my team do causal modeling, use reinforcement learning for network configuration, NLP for chatbots, computer vision for face ID, and (again) network configuration. They are all called data scientists. Thinking that what people who have the title "Data Scientist" do is "generating value via increased consumer retention" or "ETL for Excel files for the boss" is somewhere between misinformed and laughable, but mostly laughable. The world is much bigger than that.

Then, I agree that "learning from data" as a specialty has been over-hyped, and most companies do not have the maturity to take advantage of ML prediction, causal and statistical modeling, etc., but that's the nature of the world: one can take advantage of it or be bitter about it. I took advantage of the hype and I am fine, happy, and with no regrets. If tomorrow someone proposed the title "Data Monk" for the same job, and it paid more, was more visible, and led to more career opportunities, I would grab it as quickly as I would grab 100 dollars floating down the sidewalk.


On the contrary, many of us were amused at the birth of the term "data science". With "political science" and "computer science" as examples, we felt that including the word "science" in a field name was a bit "The lady doth protest too much, methinks". Those who named "data science" don't share our sentiment.

This was a deliberate effort to Balkanize statistics, form a new union, a franchise reboot. Had statistics instead been called "statistical science" you can be sure that "data science" would have chosen a different term.

Words face in a direction. While many roll their eyes at a Ph.D. going by doctor (it shouts insecurity, a poker tell for a second-rate institution), one also needs to understand the experience of a young female attempting to command a classroom's respect. The Brits love titles and hierarchy; at some level this is just a fashion choice.

Similarly, no one serious about food calls themselves a "gourmet", but the term has commercial value.

"Data science" isn't named for us. It's named for the clients.


"Statistics" already translates to "science of state", with the word for "science" omitted and implied. Some languages favor deriving new pompous terms from Latin roots (like "informatics"), while English favors blunt descriptive terms (like "computer science"). The original English term for statistics was "political arithmetic", but ultimately the continental term prevailed.


Many (most) scientists are also not everything they’re made out to be. Medicine, for example, has had a real replication crisis. It’s important to distinguish between Science and scientists. Finally, if you’re running regressions, it’s better to get paid 300K than 130K.



