You may have reached what you thought was a statistically significant size, but unless you were selecting perfectly randomly - which I doubt you could in such an exercise - the result can't be called truly significant.
The data came from volunteers and, at the time, covered over 10% of the U.S. employee base. I say the size was statistically significant, not the quality of the data, because there were senior engineers who had been with the company for 15+ years and would skew the datasets. We had lots of demographic data too. Can you guess which race and gender won?
>I say size was statistically significant not quality of data
Size is not necessarily significant unless it is a randomly selected sample (and I'm not sure volunteers would count as random).
Let's say you have a population of 100 and there are two groups (A: 80% and B: 20%). Let's say that individuals in group A are more keen to volunteer themselves. You could have a sample of 50% of the population and still not have a representative sample of the entire population (i.e. 49 from A and 1 from B), so the sample size didn't matter too much in that case.
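To make that concrete, here's a quick back-of-the-envelope sketch in Python using the numbers from the scenario above (the 80/20 split and the 49-A, 1-B sample are taken straight from the comment; nothing else is assumed):

```python
# Population: 80 people in group A, 20 in group B.
pop_a, pop_b = 80, 20

# The comment's scenario: a 50-person sample (half the population!)
# where group A over-volunteers, so we get 49 A's and only 1 B.
sample_a, sample_b = 49, 1

pop_share_b = pop_b / (pop_a + pop_b)              # B's true share: 0.20
sample_share_b = sample_b / (sample_a + sample_b)  # B's sampled share: 0.02

print(f"B is {pop_share_b:.0%} of the population "
      f"but only {sample_share_b:.0%} of the sample")
```

So even sampling half the population, group B is under-represented by a factor of ten; the headline sample size tells you nothing about representativeness.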
There is also a difference between, e.g., stopping random people on the street and asking if they are willing to participate, and posting an ad online and waiting for people to call, even though both are voluntary.
Yes, a disclaimer about the validity of the results would be nice ;). But to be fair, most surveys are taken rather seriously, especially by the general public and the mainstream media, ignoring the uncertainty in the results.
I was very curious as to how they would attempt to mitigate that, but your link doesn't actually provide any information about it.
I think I would somewhat cautiously state that it isn't something that is possible to mitigate. There are simply two cohorts: people who are willing to take online polls and people who are not. It is inherently impossible to gather this kind of data on the latter.
I would be careful with defining winners and losers. What does the pay distribution inside any given group look like? What about the age distribution? Health? Overtime policy? Who assigns work, and what degree of responsibility is attached to each assignment?
For example, if we find that white males get the highest pay on average, we could also see in the same statistics that middle-aged white males who suffer from health issues and can't compete on overtime within their group get less pay than someone in the same health and age group but of a different race and gender. Are white males then winners?
1) Honestly, I doubt it matters how randomly the sample was selected; as long as it was representative of the population, does minor variation in selection bias make much difference?
Even if the results are 'only valid to this group of people' rather than, broadly, the entire Microsoft engineering team, I'd say it's probably reflective of the broader situation as well.
Why would a small subset of the population have salary ranges that were deeply divergent from all the other groups?
Sure, depends on the sample size, but that brings me to 2...
2) Really? I think that I'm pretty happy to accept that when a large group of engineers gets together and collects some data and then generates some results from it...
...they're not all completely retarded and have no idea what statistics are.
If they came to the conclusion the results were biased, I think you'll find they did the math to justify those conclusions.
People -> not stupid. Especially not large groups of engineers.
(Also, while I'm here: what is 'statistically significant' anyhow? 99% +- 1%? Just throw some words around without any meaning attached to them, and sure, we can totally argue about whether things are significant or not.)
Let me emphasize: '...they're not all completely retarded...'
What's the chance that one lone engineer is pedantic in math and stats?
Not 100%, sure. Not every engineer has that background, especially in computer related fields (is stats even taught in college for CS these days?).
...but that none of the people involved either 1) is pedantic and has a strong math/stats background, or 2) bothered to do some learning and read books (like that one) when working on either this, or some other stats related project?
Really?
Come on~
It beggars belief.
I'm willing to wager that a random sample of 10 Microsoft engineers will include at least one person who understands stats. Otherwise, that company is completely broken.
I would assume people who feel they are being underpaid might be more likely to respond to such a survey, and at the same time people who realize they are getting paid more might under-report their salary. This can completely skew the results.
As long as you rely on people volunteering this info you will always have such problems, no matter how small the p-value is.
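A small simulation illustrates the point (all numbers here are made up for illustration: a hypothetical $100k-mean salary population and an assumed response rate that falls as pay rises). The sample ends up large, so any p-value or confidence interval would look impressively tight, but it is tight around the wrong number:

```python
import random
import statistics

random.seed(0)

# Hypothetical population of 10,000 salaries (assumed figures).
true_salaries = [random.gauss(100_000, 15_000) for _ in range(10_000)]

def responds(salary):
    # Assumption: lower-paid people respond more often.
    # Response probability falls linearly with salary (floor of 5%).
    return random.random() < max(0.05, 0.9 - salary / 200_000)

responses = [s for s in true_salaries if responds(s)]

print(f"true mean:     {statistics.mean(true_salaries):,.0f}")
print(f"surveyed mean: {statistics.mean(responses):,.0f}")
print(f"responses:     {len(responses)}")  # thousands of data points
```

With thousands of responses the standard error is tiny, yet the surveyed mean sits a few thousand dollars below the true mean. A small p-value only says the estimate is precise, not that it is unbiased.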
It can skew the results, but it can't completely invalidate them.
Remember here, we're not trying to robustly estimate populations of engineers with specific wages. We're looking at a data set you would expect to be more or less without variation, and being surprised when there is 1) variation, and 2) that variation appears to have racial/gender/whatever correlations.
Now, I get what you're arguing: the sampling is biased, so any of those seeming correlations may well be biased (e.g. a specific demographic consistently under-reports its income, the people who do report their income tend to be in the 'lower' bracket of incomes, etc.)... well, fair enough.
Caveat any results you come up with; but it's not like the data is going to be completely useless and meaningless because it's noisy.
This isn't an arbitrary academic exercise; it's a tool for people to use to evaluate their own job positions.
What's the alternative? Have no idea at all what other people are earning? If you don't have any data, you can't do anything.
Even if the data you have is noisy, it'll give you a lot more insight than nothing.
Sure, I don't endorse getting righteous and taking it up the ladder ('My <insert group here> is discriminated against!') without doing your due diligence about samples and caveats.
...but taking a spreadsheet like this to your next pay review? Your manager had better have some good answers if you find yourself at the bottom of the curve.
There's a difference between noisy and biased. As long as the data is only noisy (that is, it has some random variations) I totally agree with you that it's fine to use, and that the results should be robust. However, if there is some sort of bias that only applies to a particular subset of the samples, all bets are off.
Just as a totally imaginary example: what if men with higher incomes are more likely to share their salary information than men with lower incomes, while at the same time the situation is reversed for women?
So now you will end up with more reports of high income from men and more reports of low income from women, even if their pay distributions are exactly the same.
I am of course not saying that this is happening here, but these kinds of things would indeed completely invalidate the results.
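The imaginary example above can be simulated directly. Everything here is assumed for illustration (identical salary distributions for both groups, and a made-up reporting probability that rises with pay for men and falls with pay for women); the point is only that opposite-direction reporting bias manufactures a gap out of nothing:

```python
import random
import statistics

random.seed(1)

# Assumption: men and women draw from the SAME salary distribution.
men = [random.gauss(100_000, 15_000) for _ in range(5_000)]
women = [random.gauss(100_000, 15_000) for _ in range(5_000)]

def report_prob(salary, rises_with_pay):
    # Made-up bias: probability of sharing one's salary scales
    # roughly 0..1 across the salary range, clipped to [0.05, 0.95].
    base = (salary - 50_000) / 100_000
    p = base if rises_with_pay else 1 - base
    return min(0.95, max(0.05, p))

men_reported = [s for s in men if random.random() < report_prob(s, True)]
women_reported = [s for s in women if random.random() < report_prob(s, False)]

true_gap = statistics.mean(men) - statistics.mean(women)
reported_gap = statistics.mean(men_reported) - statistics.mean(women_reported)

print(f"true gap:     {true_gap:,.0f}")      # near zero by construction
print(f"reported gap: {reported_gap:,.0f}")  # several thousand dollars
```

The true gap is statistical noise around zero, but the reported gap comes out at several thousand dollars, which is exactly the kind of artifact that no amount of sample size can repair.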
Engineers by and large don't know much about proper statistics, especially software engineers, since most CS curricula don't even include an intro stats course.