This highlights one of my main complaints about the DS role. You are expected to have strong business intuition, sufficient coding skills to hold down a SWE role, a strong background in stats/math, all the ML/DS-specific skills and, lastly, technical depth in the subdomain whose problems you are trying to solve. All of this, while being paid the exact same as someone on the SWE or PM track.
No one can do it all. DSs that do 70% of these are the best of the best.
Mature DS groups have figured out that you have to pick your poison, and focus on archetypes rather than a 'well rounded' DS.
Here are a few DS archetypes that I've seen.
1. The NLP/Vision/RL domain expert: High depth, low breadth people. Not very concerned with business intuition. Strong grasp of math for their domain. Moderate coding abilities, but pipelining for their field is fairly well defined. What is SQL?
2. The Generalist: Comes close to the 'good data scientist' outlined here. Never publishes, solves DS problems, and will probably struggle to reach principal IC level in any specific product group because they lack the requisite depth. They will often become managers down the line, though, and can also become excellent PMs at some point. SQL is their lifeblood. The less business-savvy see them as MBA-adjacent. But they are super important.
3. Mr Maths, or the Statistician: Pairs excellently with #4.
4. The MLE who doesn't want to be an MLE: Excellent coding skills. Sufficient ML/DS skills. Just hasn't found a way to get their foot in the door and transition to a DS role without taking a pay cut.
5. The Researcher: Hiring a researcher into the wrong team can lead to a completely ineffective team. Also, not having a researcher in a team that needs one can lead to everyone going around in circles.
Top DSs will manage to host a max of 2 archetypes in them. Trying to get your DS to host more than 2 archetypes is a losing battle. This is as good as it is going to get. Also, most teams don't need all the archetypes.
Identify the archetypes you need. Get some coverage over them through your hired DSs and let them continue growing along their selected archetypes.
Learning all this is not really that difficult. No more difficult than a biochemist training in subjects as diverse as organic synthesis (making stuff in test tubes), Raman spectroscopy (prediction of chemical structures using vibrational signatures) and DNA sequencing (computational analysis).
It's only because data science is much newer than biochemistry as a field that it seems beyond the grasp of an individual. It's perfectly possible to learn (and to teach) all of the things you've mentioned.
And what has pay got to do with it? Since when is pay correlated with how much you need to study (see, for example, musicians)?
Data science is a role, not a field. It's similar to but wider than the applied statistician role that is well-established in many fields of research.
You have a background in one field, but you are working to solve problems in another field (e.g. biochemistry). To do that, you must understand biochemistry well enough to be able to contribute. You are probably far from the best biochemist in the team, as you were hired for your methodological skills. In order to solve the problems, you may need tools from a number of fields, including statistics, machine learning, software engineering, data engineering, mathematics, and theoretical computer science. No matter which field your original degree was in, it's insufficient in both depth and breadth. You must keep learning new things and rely on others with complementary skills.
I work in bioinformatics, which is basically a more established flavor of data science. I have worked with people from a variety of backgrounds from electrical engineering to genetics, and everyone has had obvious gaps in their skills. Except maybe one or two people, but they are world-famous experts who are unnaturally curious about everything.
Pay has a lot to do with it because if you can switch to an engineering role (SWE or Data Engineer) and have more focused responsibilities and a higher salary then that's what most of them will do.
Although, given that the demands made of a DS are often unicorn-level, I don't think even increasing pay would help.
The parent comment says ‘while being paid the exact same as someone on the SWE or PM track.’ Not ‘less than a SWE’, as you imply.
Why should a data scientist be paid more than a SWE? Because they have to learn several different topics? That is not such a big deal in my opinion (I work as a DS).
This language of ‘unicorns’ has been highly damaging to the field. There is nothing magical about a job which requires a lot of varied technical knowledge. Try looking at a syllabus for some other scientific subject. It’s fairly normal.
I work as a DS as well. I don't think there's such a thing as "should be paid more": the market shows us that SWEs are more highly valued, presumably because there is more demand for those skills.
However, this will lead to people migrating from DS to DE and SWE roles if the compensation there is relatively better. Yet we see articles about a 'shortage' of DSs when companies just aren't paying as much as a similar skill set can earn in a different role.
> the market shows us that SWEs are more highly valued, presumably because there is more demand for those skills.
I think it's that a tech company can more consistently make money from a SWE than from any other role. You can always roll together an app and sell it. For every other role[0], you provide value to the organization, which eventually makes its way to the customers.
This is why the software bootcamp grads have fared better than the DS bootcamps (and ML bootcamps). A company can get a lot of value from a pretty crummy SWE and is willing to pay for it. A crummy Data Scientist, not so much.
[0] Sales is also similarly direct, depending on the industry. They enjoy a similar status.
This is an unserious comment and the worst kind of gatekeeping, as it only applies to the shallowest definition of "learn" - perhaps "remember" and "understand" on Bloom's taxonomy.
Most biochemists, like most professionals in any field, are dilettantes in 99% of the field - it's the difference between reading a French cookbook and being Paul Valéry. They specialize and the rest of what they have learned rusts and sprouts weeds and is useless when viewed under the lens of applied knowledge.
And as the OP noted, your "all this" to learn includes (no offense to biochemists) probably the most multidisciplinary set of skills in any field: communication, business analysis, psychology, statistics, computer programming, hardware and network topology, data engineering, domain knowledge, often a deep background in one of the hard sciences ..
Of course it's not beyond any individual: Renaissance Men and Women do exist. But to suggest it's "not that difficult" is an unhelpful myth.
I'm a SWE who moved from backend to a DS role and then to DS manager at my company, and this is spot on. If I advertise a DS position, I have to mix all these archetypes and get used to, at best, having a solid #4 who wants to pivot to DS, since this is the archetype that knows we are creating real-life data products, not just using the latest model or beating some metric.
> Top DSs will manage to host a max of 2 archetypes in them.
This ignores experience. Top DSs will manage to have maybe one archetype per some number of years on the job. You can find unicorns, but they all have many many years experience and you're going to have to pay for them.
But the "average" data scientist is probably more senior than the "average" software engineer: the right comparison to make is between the same person, at the same point in time, in the two roles.
Companies that hire a "data scientist", as opposed to getting help from temporary consultants or assigning their engineers and mathematicians to data science tasks, are probably companies that value data science highly (because there is a perceived necessity and/or business value) and do enough of it to hire a full time specialist.
Moreover, a company that starts a data science team is likely to hire someone sufficiently senior to work more or less alone, not the more junior data scientists that would need the guidance of such a team leader.
Often (usually?), DSs are paid less than SWEs of the same level!
I have plenty of cynical thoughts as to what drives that compensation gap. Maybe the simplest is just that there is high supply of people with these baseline skills and it isn't easy to distinguish if somebody is good or not.
I think there is just more demand for SWEs. Nearly every company will have software engineers, but not every company has data scientists and even the ones that do will almost certainly have more engineers than data scientists.
After all, you can't use data science to optimise your product or service if you don't have sufficient engineers to build it and maintain it in the first place.
As always, it's supply and demand. DS is often not needed as much as SWE and there is a lot of supply for DS, due to hype and ease of transition from people in other fields.
I'm struggling to understand what people think is so difficult about all this data science stuff. The maths is very basic, even in "advanced" ml. Nor is it hard to learn backend software engineering for the purposes of 99% of companies.
It's all about epistemology. How do we know what we think we know? How do we come to know things we didn't know before? And how can we trust those conclusions?
Even if the math is basic, it's really, really easy to draw bad conclusions, look at the wrong problems, not realize that your data is more incomplete than you might think, etc etc etc. Guarding against these bad results - figuring out how to actually manufacture new knowledge - is the heart of the problem.
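As one small illustration of guarding against "your data is more incomplete than you might think", here is a minimal sketch that measures per-field missingness before any conclusion is drawn. It assumes records arrive as Python dicts; the field names and sample rows are hypothetical, made up purely for illustration.

```python
from typing import Iterable, Mapping


def missing_rates(records: Iterable[Mapping[str, object]], fields: list[str]) -> dict[str, float]:
    """Return the fraction of records in which each field is absent or None."""
    counts = {f: 0 for f in fields}
    total = 0
    for row in records:
        total += 1
        for f in fields:
            # An absent key and a key explicitly set to None both count as missing.
            if row.get(f) is None:
                counts[f] += 1
    if total == 0:
        # Refuse to report "0% missing" for data that does not exist at all.
        raise ValueError("no records to check")
    return {f: counts[f] / total for f in fields}


# Hypothetical sample: "region" is silently missing for half the rows.
rows = [
    {"user_id": 1, "region": "EU", "spend": 10.0},
    {"user_id": 2, "spend": 5.0},
    {"user_id": 3, "region": None, "spend": 7.5},
    {"user_id": 4, "region": "US", "spend": None},
]
print(missing_rates(rows, ["user_id", "region", "spend"]))
```

A check like this does not tell you *why* data is missing (which is often the more important question), but it makes the gap visible before a segment-level conclusion quietly rests on half the population.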
> How do we come to know things we didn't know before? And how can we trust those conclusions?
IMHO, this is the heart of what distinguishes scientists from engineers. Yes, there is plenty of overlap, but to me this is the principal component. In engineering, the correct answer almost always exists. Enough eyeballs on the problem and sufficient double- and triple-checking converge on high confidence.
Scientific problems may not have a right answer, or gaining confidence has diminishing returns, and at some point you decide that's enough sigmas. You can scrutinize in so many ways but there's always blind spots.
(Data) scientists generally have to be way more comfortable with uncertainty. And as you mention, the easiest person to fool is yourself.
99% of companies? Definitely not. The skills needed to do DS in business or healthcare are not very correlated with those needed to do DS for the physical sciences. That is the whole point of this comment thread: sure, you can understand DL, but you also have to understand the field to know what type of DL to use. For example, in my role, I came with knowledge of machine learning but had to learn complex fluid physics to know what type of DL techniques to apply or develop.
I agree with most of what he's saying but reading the first sentence almost stopped me in my tracks when I got to "obsessed". I wonder when exactly it was that "obsessed about this" and "obsessed about that" became a good thing. ...it's thrown around way too much these days, and I for one think that being obsessed with anything, regardless of how positive a thing it is, always speaks to a psychology that is defective in some way or another.
Similar to how some companies are intent on speaking about themselves and their employees as "family," you may be having the same reaction to the use of the word "obsessed."
Of course, most companies say you must now be "obsessed" with customers, or quality, or some such things.
Language control in this respect seems a bit easier to understand with the phrase that became very popular for companies trying to model Apple's consumers during the iPod revolution (at least as I remember it from studying a bit of Industrial psychology and Marketing / Cognitive Neuromarketing (that fad died out, thankfully)).
I'm sure it's been around for a while, but it's been more "we need brand evangelist fanatic zealots" for the last 10-20 years.
"Obsessed, evangelist, brand fanatic, turn your customers into zealots," etc is also used heavily in internal company "values."
An interesting description of obsession I've come across is that it's what happens when the will is frustrated. So maybe temporary obsession can be a good thing, if it's a sign someone has chosen a task so difficult that they need to expend effort to overcome a significant hurdle.
> Doesn't it just mean that the meaning of the word has changed?
...I do feel a bit bad about mentioning it, because it's pretty tangential to what the article is actually about. That said: changes in the meanings of words often go hand in hand with broad-based changes in the way people think about something, and it's useful to reflect on whether or not one wants to go along with that thinking.
There is even a bit of a cliché around sciency-engineeringy folk falling within the "obsessive" range of the personality spectrum, in the very original sense of the word, where it might be something a psychotherapist would work on trying to rectify. So when I see it in this particular sphere being attached to a positive value judgment, and even with slightly prescriptivist overtones, it really "pops" for me, and it's been happening to me more and more lately.
Good data scientist described here seems to have unrealistic expectations at super human level of know-it-all/do-it-all.
I think there are more well-established job architectures like business intelligence analyst, data engineering, user experience designers, product manager, software engineer etc - these roles in combination serve to do a lot of what is described here as data scientist. These roles are easier to hire, have well defined career paths and good ways to get job satisfaction and can scale well as the business-problem-space/orgs grows.
I think the scientist label should be reserved for those who actually do the scientific mathematical research – specialists who have done deep research in specific areas.
For applying pre-existing sciences to solve practical business domain problems, we need lots of engineers, analysts and managers etc who are all trained with AI-first software development practices and just a few specialist data scientists.
Agreed. The second point about pipelines stuck out to me:
> [Good DS] will often build these pipelines themselves. Bad DS thinks it is someone else’s job.
In a small environment, sure, do the job so it gets done! But in larger, more corporate settings the 'cowboy' approach to pipeline building is not sustainable or even feasible. Am I a bad DS because I can't provision VMs, open firewalls, replicate production DBs and build hooks in other teams' services to expose data? No, it's not my job. A good DS collaborates with other teams and sysadmins to build a pipeline that is maintainable and monitorable, and doesn't do it all themselves.
Spending all your time on the clock acting as your own database admin, data engineer, app developer, sysadmin, etc. is a great way to accomplish very little data science and burn yourself out in a hurry.
Source: personal experience.
Are these useful skills to have? Yes, especially if it gets you out of a bad situation and makes a customer very happy.
Are these requisites for being a "good" data scientist? No; arguably, they are distractions and wasted room in your brain.
Completely agree a DS shouldn't be responsible for all those things you mentioned (all the infrastructure, networking, security, etc.). In today's age, "data pipelines" can look like plugging into tools and frameworks that abstract away a lot of this complexity, for example Fivetran for extraction and dbt for data modelling. Once those engineering/devops-heavy things you listed are abstracted away, I think it's fair for a DS to be responsible for building the data transformation logic in the pipeline. Of course, not all companies will have such tooling available, in which case it's not fair to place those expectations on a DS.
> Good data scientist described here seems to have unrealistic expectations at super human level of know-it-all/do-it-all.
Hmm. Know-it-all/do-it-all is a useful standard to strive for, though, even when, in practice, one will often fall short in one area or another.
One of my personal frustrations is that I have invested heavily in trying to be well-rounded and it doesn't quite pay dividends because of how often I find myself confronted with prejudice of the form "because he's good at X, that probably means he's bad at everything else". For example, if the first impression I leave on someone is that I'm good at math, they'll often jump to the conclusion "because he's good at math, that probably means he's bad at databases". If the first impression I leave is that I know a lot about finance & economics, they'll assume "because he knows a lot about finance & economics, that probably means he can't do projects in a technical domain" and so forth.
Well, the expectations aren't unrealistic - if you were to grant the "good data scientist" a reasonable amount of time rather than demand that everything be done by this afternoon, which is what most "real data scientists" are up against.
>Good DS thinks from first principles. Bad DS accepts everything they have heard or seen as the ground truth, or the best way to do something.
Domain knowledge, and the humble attitude that gets stakeholders to share it with you, is fundamental to understanding data and how models will be interpreted and used. There is not enough "listen to others" in this list (although I read the "listen to customers" at the end). Listening... listening, listen!
This reminds me about a time when some geneticists tried to find genes associated with a particular disease, to try to unravel why it occurs. Complex trait, no single answer, so they genotyped thousands of people with and without the disease, and ran the stats. And... nothing.
What has one common name is actually several similar diseases, and the geneticists would have known that if they paid attention to the clinicians. Listening and incorporating knowledge is key.
[I'm thinking of an early glaucoma GWAS, IIRC, though there are similar cases.]
I think this story is very, very common. Still, some complex diseases (e.g. cystic fibrosis, Down syndrome) do turn out to be simple on a genetic level, so there is some merit to this approach.
Moreover, there is currently no better way to understand diseases than genotyping thousands of people with and without the disease and 'running the stats', so it's worth a try.
I would say that a good data scientist can quickly estimate where their time is best spent, either accepting what someone else has told them as-is or investigating themselves from the ground up. There's always more to investigate so using your time efficiently is one of the most important DS skills. Like solving a multi-armed bandit problem.
The knowledge that informs where to spend your time can be based on domain knowledge (and also experience of working with data in general), but the framework for estimating probabilities that investing your time will get you worthwhile results and acting on those probabilities has more to do with statistics.
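To make the multi-armed bandit analogy a little more concrete, here is a minimal epsilon-greedy sketch. Each "arm" stands for a hypothetical line of investigation, and the payoff probabilities are invented for illustration; this is a toy model of the explore/exploit trade-off, not a claim about how anyone should actually plan their time.

```python
import random


def epsilon_greedy(payoff_probs: list[float], rounds: int, epsilon: float, seed: int = 0) -> list[int]:
    """Simulate an epsilon-greedy agent and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms  # running mean reward observed for each arm
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try a random line of inquiry
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit the best so far
        reward = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        pulls[arm] += 1
        # Incremental mean update: new = old + (reward - old) / n
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
    return pulls


# Hypothetical odds that an afternoon spent on each avenue pays off.
pulls = epsilon_greedy([0.1, 0.3, 0.6], rounds=2000, epsilon=0.1)
print(pulls)
```

The `epsilon` parameter is the knob the comment describes: how much time you are willing to spend checking things yourself versus trusting what has paid off so far.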
Oh man! Domain knowledge is absolutely HUGE. I cannot even begin to tell you how much I've had to dive into literature on topics well outside of my domain to begin to understand how to use my outside perspective to come up with solutions.
Respecting stakeholders, and being able to be humble about asking for help understanding the domain is paramount.
This is a good list... for one type of data scientist - the type that has heavy involvement in product and business decisions.
Other data scientists are basically software developers with a very specific domain, a third kind focus a lot more on research and many data science jobs are some blend of all of these things. My point is that the author mentions in the intro how data science is very broad and then continues to focus on what's only a subset of all data science jobs.
With that in mind, the list is actually spot on - it's just good to know that it isn't relevant to many data science jobs.
Yep, some research-oriented DS people are (rightly) obsessed (correct word) with a particular family of techniques (variational inference! random forests! adversarial networks!) and work to find problems to apply that family to. They literally do pattern-match on their techniques with every new problem they encounter, and move on if it doesn't fit.
Many of your other distinctions do still apply to such people, like knowing where the data comes from, knowing when to stop, and adjusting the message to the audience. So, still a good list.
Also, even the research DS people need to evolve their techniques over time.
> Good DS starts simple, ships, and then iterates. Bad DS starts with the most advanced technique they know.
> Good DS is constantly learning & evolving their toolbox. Bad DS stagnates and sticks with what they know.
These are the big ones imo. But not super obvious. As a junior data scientist I never needed to use anything but regularized linear models and decision trees. Maybe a random forest, but the loss of explainability usually wasn't worth it.
Recent explainability tools like SHAP have changed this somewhat. But for the most part I think it's still OK for the average data scientist to stick to regularized linear models, decision trees, and then occasionally, idk, a LightGBM or CatBoost model + SHAP for explainability. A lot of people still don't know about these, and it's now a decent test of whether people are really trying to stay up to date.
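In the spirit of "starts simple, ships, and then iterates", here is a minimal pure-Python sketch of scoring two trivial baselines on held-out data before reaching for LightGBM or CatBoost: a predict-the-mean model and a one-feature ridge regression (intercept unpenalized). The data points are made up for illustration, and this is not how those libraries work internally; it only shows the bar a fancier model should have to clear.

```python
def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_ridge_1d(xs: list[float], ys: list[float], lam: float) -> tuple[float, float]:
    """One-feature ridge regression; returns (slope, intercept). Intercept is not penalized."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)  # lam > 0 shrinks the slope toward zero
    return slope, y_mean - slope * x_mean


# Hypothetical data: y is roughly 2x + 1 with a little noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 2.9, 5.2, 7.0, 8.8, 11.1]
test_x = [6.0, 7.0]
test_y = [13.0, 15.1]

# Baseline 1: always predict the training mean.
mean_pred = [sum(train_y) / len(train_y)] * len(test_y)
# Baseline 2: a lightly regularized linear model.
slope, intercept = fit_ridge_1d(train_x, train_y, lam=0.1)
ridge_pred = [slope * x + intercept for x in test_x]

print(f"mean baseline MSE: {mse(test_y, mean_pred):.3f}")
print(f"ridge MSE:         {mse(test_y, ridge_pred):.3f}")
```

If a gradient-boosted model later fails to beat numbers like these by a meaningful margin, the simpler model is usually the one worth shipping.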
A data scientist is someone that people wish was a unicorn but that is neither that nor a scientist, despite the name.
In industry, people who are _actual_ scientists usually go by the title "scientist" or "research scientist", although they use data just as much. You can recognize them by the peer-reviewed scientific papers they publish, often preceded by filed patent applications, as their work is novel. A real scientist wonders why some people call themselves "data" scientists, because science has always been about data, modeling and measurement.
But back to our "data scientist":
On a good day, she is generating value from the company's data to increase customer retention.
On a bad day, she is just doing the ETL prep work so the boss' other assistant can make that spreadsheet that aggregates the data that the boss' PPT slides will show.
This sentiment is quite popular among those who would like to have the same popularity that data scientists currently (well, more a few years ago, since there are many more critical voices now) have, but they don't.
Data science is a generic name. There are DSs like me who have been "actual scientists" and others who until yesterday were working on dashboards and Excel files with 100 tabs open and pivot tables as far as the eye can see. Whatever, it is a name. What about "engineers"? It is a title with no legal value: people in the US can call themselves software engineers, but in many other countries they could not. And who is a writer? Somebody making a living out of writing, somebody who has been published even if they got zero money for it and the magazine editor was their cousin, or someone else?
People in my team do causal modeling, use reinforcement learning for network configuration, NLP for chatbots, computer vision for face ID, and (again) network configuration. They are all called data scientists. Thinking that what people with the title "Data Scientist" do is "generating value via increased consumer retention" or "ETL for Excel files for the boss" is somewhere between misinformed and laughable, but mostly laughable. The world is much bigger than that.
Then, I agree that "learning from data" as a specialty has been over-hyped, and most companies do not have the maturity to take advantage of ML prediction, causal and statistical modeling, etc., but that's the nature of the world: one can take advantage of it or be bitter about it. I took advantage of the hype and I am fine, happy, and with no regrets. If tomorrow someone proposed the title "Data Monk" for the same job and it paid more, were more visible, and led to more career opportunities, I would grab it as quickly as I would grab a 100-dollar bill blowing along the sidewalk.
On the contrary, many of us were amused at the birth of the term "data science". With "political science" and "computer science" as examples, we felt that including the word "science" in a field name was a bit "The lady doth protest too much, methinks". Those who named "data science" don't share our sentiment.
This was a deliberate effort to Balkanize statistics, form a new union, a franchise reboot. Had statistics instead been called "statistical science" you can be sure that "data science" would have chosen a different term.
Words face in a direction. While many roll their eyes at a Ph.D. going by doctor (it shouts insecurity, a poker tell for a second-rate institution), one also needs to understand the experience of a young female attempting to command a classroom's respect. The Brits love titles and hierarchy; at some level this is just a fashion choice.
Similarly, no one serious about food calls themselves a "gourmet", but the term has commercial value.
"Data science" isn't named for us. It's named for the clients.
"Statistics" already translates to "science of state", with the word for "science" omitted and implied. Some languages favor deriving new pompous terms from Latin roots (like "informatics"), while English favors blunt descriptive terms (like "computer science"). The original English term for statistics was "political arithmetic", but ultimately the continental term prevailed.
Many (most) scientists are also not everything they’re made out to be. Medicine, for example, has had a real replication crisis.
It’s important to distinguish between Science and scientists.
Finally ...if you’re running regressions, it’s better to get paid 300K than 130K.
“A human being should be able to change a diaper, plan an invasion, butcher a hog, conn a ship, design a building, write a sonnet, balance accounts, build a wall, set a bone, comfort the dying, take orders, give orders, cooperate, act alone, solve equations, analyze a new problem, pitch manure, program a computer, cook a tasty meal, fight efficiently, die gallantly. Specialization is for insects.”
-- Robert Heinlein
Edit: I noticed "science" doesn't appear in the description of a good data scientist. That's ominous.
I think there is this false stereotype of the DS obsessed with cool techniques and detached from the business. Most DSs want their work to have impact, like most people, actually. But successfully applying data science is hard. We have incredibly mature tech for other problems, for example databases, a marvel of engineering, and in comparison DS is a kludge. The value DS provides per $ is much lower, although it is considered a competitive advantage (DBs are a commodity), and I think this is one of the reasons the stereotype persists.
What is strange is that people have attached a "scientist" label to a role whose primary purpose, they say, is to:
> exist to create business value with data
If you want someone to essentially not be a scientist, don't call them a scientist. Calling them a scientist and then complaining that they are not relentlessly obsessed with creating "business value" is just a broken idea in the first place. The reality is, people have jumped on this label as part of the hype cycle: employers want to pretend they have real data scientist positions so they can look good to their board / investors / PR, and so they can attract top talent, while employees want to put it on their resume because they think these skills will be valuable. Neither is going to get what they want when the hype fades.
It reminds me of how a long (long) time ago programmers used to be called computer scientists. It has taken a few decades but we finally now got to a position where we hire people for roles that actually correspond to what they do ... but I feel like this is now the pathway that ML / data science is on.
I’ve always found it funny how I have a degree in engineering and my title is “data scientist” whereas my coworkers have a degree in computer science, yet are called “software engineers”. Business titles are peculiar.
Interestingly, I'm a scientist, and I definitely create business value. I'm not doing basic academic research, which is fine by me, but I do R&D on technologies that will be in our products X years from now. I'm also the keeper of "how the product works," which combines knowledge from multiple science and engineering disciplines.
My job involves a considerable amount of data analysis.
> Good DS understands the basics of web technology
I'm not a data scientist but a portion of my job is creating pipelines, data analytics and such. I also only have a bare minimum knowledge of web technology. Why is knowledge of web technology part of being a good Data Scientist? Or is this point oriented specifically for data scientists working in web based companies?
Genuinely curious. I could imagine myself working as a DS in the future and that's why I found this article interesting.
I don't think there is a single correct answer here, but I'll offer a few insights from personal experience.
Firstly, valuable data tends to live in places accessible via web technology. Maybe you need to fetch a bunch of XML files from an FTP site? Having a clear understanding of all the nuances you're about to encounter will set you up for success.
Secondly, valuable data tends to be generated by web technology itself. Understanding that lifecycle can inform analytical strategy.
Finally, some data scientists add value by informing decision makers. One of the most powerful things you can do for them is give them a mobile-friendly, secure web experience that puts the data they need directly at their fingertips. While yes, Tableau et al. are an option here, you'll be ahead of your peers by knowing how to DIY it when it counts.
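As a concrete version of the "fetch a bunch of XML files from an FTP site" example, here is a minimal sketch using only Python's standard `ftplib` and `xml.etree.ElementTree`. The host, directory, element tag and sample document are hypothetical placeholders; the download function is not called in the demo because it needs a real server and network access.

```python
import ftplib
import io
import xml.etree.ElementTree as ET


def fetch_xml_files(host: str, directory: str) -> dict[str, bytes]:
    """Download every *.xml file in one directory of an anonymous FTP server."""
    files: dict[str, bytes] = {}
    with ftplib.FTP(host, timeout=30) as ftp:
        ftp.login()  # anonymous login; real sources may need credentials
        ftp.cwd(directory)
        for name in ftp.nlst():
            if not name.lower().endswith(".xml"):
                continue
            buf = io.BytesIO()
            # Binary mode keeps the bytes intact, so the XML parser can honor the
            # encoding declared in each file instead of a guessed text decoding.
            ftp.retrbinary(f"RETR {name}", buf.write)
            files[name] = buf.getvalue()
    return files


def parse_records(xml_bytes: bytes, tag: str) -> list[dict[str, str]]:
    """Flatten each <tag> element into a dict mapping child tag -> stripped text."""
    root = ET.fromstring(xml_bytes)
    return [
        {child.tag: (child.text or "").strip() for child in elem}
        for elem in root.iter(tag)
    ]


# Hypothetical sample document, parsed locally without any network access.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<orders>
  <order><id>1</id><total>19.99</total></order>
  <order><id>2</id><total>5.00</total></order>
</orders>"""
print(parse_records(sample, "order"))
```

Passing bytes rather than a decoded string to the parser is deliberate: `ElementTree` rejects a `str` that carries an encoding declaration, which is one of the small nuances that tends to surface with real feeds.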
Data scientists take data assets that were not designed to be used for a particular task and set them to be used systematically and with integrity for that task. It's something that comes from having lots of data in enterprises which can be exploited to create value, but can also be used to make very bad decisions and confuse the hell out of everyone. Using data and using data well are two very different things.
If there is a lot to build, like data pipelines or software apps, as opposed to just “analyze”, I think it helps to add a word for the discipline of “engineering”, eg software, data, backend engineering.
The role mismatch between data and other engineers, vs actual (data) scientists, makes it difficult for decision makers to figure out which one they need.
Not bs. But there is both a real GIGO problem and a problem with underspecification. It's certainly easy to propose DS analyses that are unlikely to have much return.
Thinking "data science is hot, we should do that" is different than "we have all this data and don't understand what it means". The latter is more likely to lead somewhere interesting.
But (and this is a Big But), the value of data science comes at the end of the data journey. Businesses need to be capturing data that is relevant and accurate before they can start analyzing it and deriving any value.
My experience with clients is that they get a ton of value out of that first step of thinking about what information they want to collect about their customers, then actually collecting it (or, conversely, surfacing what they already collect in a meaningful way). So while they come in wanting some kind of neural network powered prediction engine or whatever, they are often really impressed by pretty basic dashboards about their customer behavior.
Data Scientists can provide PMs with data and analysis to make better-informed product decisions. Then you can get into more detail, such as DS building tooling/dashboards/models for PMs/stakeholders to self-serve and save time for everyone.
Yes, there's some overlap with a Data Analyst position, but there's enough day-to-day work to differentiate.
Most software developers are just VSCode and JavaScript cowboys who know a lot of the buzzwords but lack even basic computer science fundamentals understanding.
Sure. But most SDs don't make as much of a fuss about their work as data scientists do. And from what I can tell, the average work of an SD is more challenging than that of a DS.