"these algorithms are trained on historical data, and poor uneducated people (often racial minorities) have a historical trend of being more likely to succumb to predatory loan advertisements"
to
"Even if this is the most common question being asked on Google [..] this shows that algorithms do in fact encode our prejudice"
It's hard to make the case for prejudice when the example involves historical data, so the author weasels in an unrelated example to make the case that there is prejudice.
So which is it? There are correlates to credit worthiness which are indeed illegal, allegedly to correct bias. But the legal arguments have moved away from the concept of bias and now squarely focus on impact. See the recent Supreme Court decision on disparate impact for instance.
This seems to be the author's point of view too:
"Why is this ignorant? Because of the well-known fact that removing explicit racial features from data does not eliminate an algorithm’s ability to learn race. If racial features disproportionately correlate with crime (as they do in the US), then an algorithm which learns race is actually doing exactly what it is designed to do!"
So for the author, anything that affects one ethnic group differently is racist, whether or not it is borne out by the data. Prejudice or bias are irrelevant to his definition.
But then why focus on race? No scoring algorithm is perfect, and some people are always going to be shortchanged. So why not just declare by fiat: let's ignore people's height, it's height-ist. Height correlates to income and to credit worthiness. Even if you remove height from the data, your algorithm will end up picking correlates which can become a proxy to height... and the same will be true with virtually any attribute you can think of. So why race?
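To make the proxy point concrete, here's a minimal sketch (synthetic numbers, made-up feature names, nothing from the article) of how a model that never sees the "sensitive" column can still reconstruct it from correlated ones:

    # Minimal sketch of the "proxy" effect: drop a sensitive column and a model
    # can still reconstruct it from correlated features. Synthetic data only.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    height = rng.normal(0, 1, n)                    # the attribute we "removed"
    income = 0.6 * height + rng.normal(0, 0.8, n)   # permitted feature, correlated with it
    shoe   = 0.8 * height + rng.normal(0, 0.6, n)   # another correlated permitted feature

    X = np.column_stack([income, shoe])
    w, *_ = np.linalg.lstsq(X, height, rcond=None)  # fit height from permitted features only
    height_hat = X @ w

    print(np.corrcoef(height, height_hat)[0, 1])    # ~0.8: height leaks back in

Swap height for race, zip code, or anything else and the mechanics are the same.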
I understand your train of thought and it makes sense. If you remove the human element and you are dealing with facts, how can this be described as "racist"? On the other hand, would you be comfortable just feeding an algorithm (say one for scoring university applicants) with "race" and allowing it to find (if it is statistically relevant) the races most likely to succeed and fail?
I agree that this stuff is hard to nail down in a logically consistent way. But, this stuff has a history. Group discrimination has had horrible results in the past. This is our sloppy duct tape fix. Algorithm based selection is now challenging the assumptions.
If an admissions officer offers places only to handsome, white men from upper class backgrounds, we see that as unfair because these things have no direct impact on their value as students. He's an ass. For the sake of argument, imagine an algorithm arrives at the same conclusion as the ass. Our moral instinct is confused. Societal impact is similar.
There certainly is something dehumanizing about selection algorithms making decisions. I hate buying insurance or being assessed for loans. It feels bad.
Basically, I don't think you can reductio ad absurdum this argument away. There are inconsistencies in our moral sentiments.
Our moral instinct is confused because our political agitators have based normative (should) conclusions on positive (is) claims. I.e., "racial stereotypes are incorrect, and therefore it's morally wrong to base decisions on them".
The inference is logically invalid - you can't prove a normative claim without a normative premise. In my example above, one possible normative premise would be "and it's morally wrong to use incorrect premises to make decisions" (or possibly "it's morally wrong to use incorrect racial premises to make decisions").
Now our algorithms are revealing that the underlying positive claim (e.g., "racial stereotypes are incorrect") might not be true either.
This is confusing, and rightly so - it demonstrates our entire belief system was predicated on nonsense. Hopefully we'll actually respond with some useful analytical philosophy.
Political messages are based on what works, what's appealing to our moral instinct. Personally, I hate the "born this way" argument for gay rights. It's irrelevant and it's not necessarily true. It might not even ever be true, who knows. But, it's getting the job done.
But if a neurologist proves that sexual orientation is acquired during childhood, are we going to throw out gay rights?
In any case, I think these moral arguments are (a) more of a counterpoint argument to the pseudoscientific racism of the Nazis and their predecessors and (b) more about innate genetic potential than socio-cultural traits.
I think the stronger moral argument is about equality of opportunity and (hat tip to a charmingly American ethos) individual merit. The idea is that it is more moral to judge people by their personal merit, rather than by profiling and assuming, even if that gives you a better chance of being correct.
If that were the best I could realistically do, yes (though that itself isn't realistic; there are many other criteria). Any other technique would, by definition, end up unfavorably affecting a greater number of people.
Any imperfect criterion will always exist at the expense of some group of people (tautologically, the group of people to which the criterion is unfavorable). What difference does it make, ethically, if the group happens to be identifiable?
There are two separate issues being discussed at once: 1) baseless prejudice, expressed en masse, and therefore gaining algorithmic notability, and 2) empirically backed, statistically defensible stereotypes.
This second issue seems harder to resolve, because it will require lawmakers to clearly articulate a "bright line" law in the language of the algorithm itself. As the article acknowledges, algorithms can "learn" race, even if race is not an explicitly defined explanatory variable.
Looking forward to part two.
Edit: Why race? Because it's an obvious and easy example. I believe the author would be on board with your height example.
Regarding the second issue, Griggs v. Duke Power Co. asserted that requiring a high-school degree for a job that arguably didn't strictly require one was discriminatory because it could act as a proxy to race.
If the author is on board with height, I can take him down all the way to a reductio where you give everyone the exact same credit terms.
Well hopefully that case doesn't hold too much power via legal precedent, which is a concept I never fully understood.
I can very much appreciate your reductio argument, but if I had to wager, that's the way we're headed. Seems like aspiring lawyers should start cracking open the ML textbooks, because that case is a Pandora's box.
It goes back to the old "correlation vs. causation" problem.
Machine learning chooses the "shape" of a decision surface. Optimizing the parameters of that surface may be defensible, or may not be. Your algorithm may learn that people with my skin color, or who live in my neighborhood, have a certain average behavior on your objective function.
I would word your #1 and #2 as (1) baseless prejudice that misjudges people in general, and (2) statistical prejudice that misjudges many individuals.
But it's still a leap to treat me as the "average" in that group, instead of judging me on the content of my character. What guided your choice of parameters? Even if you did the best within practical limits, it still may be unfair.
Point taken. Though, for better or worse, correlation can very frequently meaningfully improve the decision-making process, whether the organization employing said algorithm is lending to consumers or something else.
So imagine a situation where, due to actual prejudice (for example, redlining, segregated and inferior schools, employment discrimination), outcomes and circumstances are clearly predictable by race. If you're not careful, your algorithm can see the world 'as it is' and conclude that this is how it 'must be' or 'should be'.
Indeed. The roots of the word "prejudice" are well known to most English speakers. It's supposed to denote "pre"-judgement: judgement before actual evidence.
Regarding impact: It could well be argued that current-day impact is the result of prejudice operating in the past. The interaction of society, economics, and demographics are complex. Hopefully, we'll just muddle through, somehow.
Right... But isn't "judgement before actual evidence" all ML algorithms do? They try to statistically evaluate something about a set's members based on other features, because actual evidence of what we want to find out is missing. In my opinion this means that we either: 1) ignore the problem during the algorithm design phase and analyze the output with caution, or 2) decide that any statistical classification of human beings is "unfair" or even illegal (because it will have to discriminate between people based on some properties that we know about them - be it race, or height, or whatever). I don't see any middle ground, since "prejudice", in my opinion, is an unavoidable effect of ML algorithms as we know them.
> So for the author, anything that affects one ethnic group differently is racist, whether or not it is borne out by the data.
I would encourage you to check out some of the papers at the FAT-ML workshop, which was co-located with ICML. I don't think researchers are making that assumption.
In addition, race was used as an example because it is one of the sensitive attributes. The decision on what makes an attribute sensitive is based on deep discussions by society. The author of the blog post wants to communicate two points - (1) Algorithms can discriminate and (2) There is a need to understand HOW algorithms discriminate.
These are givens. You don't create any sort of decision algorithm unless you are trying to ask the computer to discriminate. The question is, do you understand what the political perception of the eventual outcome will be?
As an overly simplistic example: someone might point out that your algorithm's negative impact applies 75% of the time to racial minorities. There will be a political shit storm if that is the end of the story. But if you can show that the segmentation is actually 99.8% people in prison, you basically prove your algorithm is politically acceptable and pass the buck of outrage on somewhere else.
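If it helps, here's roughly what that audit looks like in code - the records, field names, and numbers below are all invented for illustration:

    # Hypothetical audit of who actually receives the negative decision.
    # Records and field names are made up; only the bookkeeping is real.
    records = [
        {"negative": True,  "minority": True,  "incarcerated": True},
        {"negative": True,  "minority": False, "incarcerated": True},
        {"negative": True,  "minority": True,  "incarcerated": True},
        {"negative": True,  "minority": True,  "incarcerated": True},
        {"negative": False, "minority": False, "incarcerated": False},
        {"negative": False, "minority": True,  "incarcerated": False},
    ]

    negatives = [r for r in records if r["negative"]]
    def share(key):
        return sum(r[key] for r in negatives) / len(negatives)

    print(f"minority share among negative decisions:     {share('minority'):.0%}")
    print(f"incarcerated share among negative decisions: {share('incarcerated'):.0%}")

On toy data like this, the "75% minority" headline and the "who is actually being flagged" breakdown can tell very different stories.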
A good definition of fairness should not depend on the semantic meaning of any feature. What features we want to enforce algorithms to be fair with respect to is a policy question, not a mathematical one. Mathematically you're right, I shouldn't focus on race. But race is the motivation for the research, and I think it's a good one.
I think "unfair" implies prejudice and bias. Do you take it as axiomatic that a system that produces different outcome for different racial groups is "unfair"?
This is precisely the question :) we just don't know what the right definitions are, and neither do we have a good understanding of the consequences of existing attempts.
You don't seem to be asking that question at all. You seem to be including disparate outcome in the set of "unfair" things, by fiat, and from there looking for the logical closure of that concept under machine learning.
That closure is "Harrison Bergeron"... but one man's modus ponens is another's modus tollens.
I think I'm quite clear: we don't have an accepted and useful mathematical definition of what fairness means. This should, at least on some level, line up with our society's legal standards (as the corresponding mathematical definitions of privacy, security, etc. do) while allowing us to prove theorems or at least make well-defined conjectures about the properties of algorithms.
The question is what counts as fair. I'm not saying disparate outcomes is necessary or sufficient, though it receives a lot of attention. Even if we could quantify which disparate outcomes are adverse, it's not obvious how to design algorithms that meet those criteria.
It seems a system that produces different outcomes for two individuals differing only in their racial identity, or other "irrelevant factors", is unfair. The question is, of course, what factors actually are relevant.
> It seems a system that produces different outcomes for two individuals differing only in their racial identity, or other "irrelevant factors", is unfair. The question is, of course, what factors actually are relevant.
I'm interested in a little thought experiment:
1. for the sake of this algorithm there are two people: a white person and a black person
2. race and other "irrelevant factors" aren't recorded or used
3. these people are, except for race, otherwise identical
4. how can the algorithm discriminate between them?
It seems tautological that, lacking race as a feature to discriminate with, the algorithm won't be able to successfully discriminate, right?
So how does the discriminatory information somehow bleed into the system such that it negatively affects two otherwise identical (or practically so) people?
I'm not trying to defend the algorithm as the perfect arbiter or anything. I'm just honestly a little confused. Is it that the irrelevant factors are actually recorded somehow, but shouldn't be?
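For what it's worth, here's the tautology in code form - the scoring function and feature names are hypothetical stand-ins, not anything from the article:

    # A deterministic model given identical inputs must return identical outputs.
    def score(applicant):
        # stand-in for any deterministic trained model; weights are invented
        return 0.3 * applicant["income"] - 0.2 * applicant["debt"] + 0.1 * applicant["tenure"]

    a = {"income": 50_000, "debt": 5_000, "tenure": 3}
    b = dict(a)                   # identical on every recorded feature
    assert score(a) == score(b)   # nothing left for the algorithm to discriminate on

    b["debt"] = 7_000             # let any recorded input differ and the scores diverge
    print(score(a), score(b))

So the worry isn't that the model singles out race it was never shown; it's that two real people are almost never identical on the recorded features, and the features they do differ on can track race.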
The author explicitly acknowledges that relevant factors correlate, in fact he suggests race as a factor will creep back in even if it is excluded as an independent variable.
He is thus suggesting that even given genuine group differences, different group outcomes are intrinsically unfair if the grouping happens to be race.
I think mentioning race makes the topic unnecessarily politicized. There are much simpler cases where we don't want to use machine learning algorithms for maximum efficiency gains.
For example, take health insurance. No matter if it's voluntary or mandatory, at root it's a system where healthy people pay for sick people. If you're healthy today, you still pay because you might get sick tomorrow. But if sickness were 100% knowable in advance, then healthy people would have no incentive to participate, or even to set up such a system in the first place! The system only works due to a lack of knowledge, and advanced machine learning algorithms could easily break it.
The root of the problem is that the market-efficient way to redistribute resources is not the way that leads to the most happiness. (The technical term is "diminishing marginal utility of money".) We need to give free stuff to those who are statistically less likely to win it in competition. If our machine learning algorithms disagree, that just means they're optimizing for the wrong criteria.
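A toy version of the pooling argument, with invented numbers:

    # With no predictive information, every member pays the average cost.
    # With perfect prediction, each "premium" is just the member's own known
    # cost, so the healthy have no reason to pool at all. Numbers are invented.
    costs = [0, 0, 0, 0, 100_000]            # one of five members gets sick

    pooled_premium = sum(costs) / len(costs)
    print("premium with no predictive information:", pooled_premium)   # 20000.0
    print("premiums with perfect prediction:", costs)                  # pooling collapses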
If there are intentional cross-subsidies then it isn't insurance. The concept behind insurance is to spread risk, not to spread cost. There's value in insurance exactly because there are risks (i.e. the future isn't perfectly predictable). But improving our ability to predict the future is a good thing, not a bad thing. It is true that it would make insurance useless, but insurance is not the only mechanism for subsidizing those in need; in fact it is a pretty terrible mechanism for that, since it only works that way inasmuch as it is imperfect.
The root of your argument seems to be that we need to trick people into helping other people. That seems to be an ultimately futile strategy. Better to really convince them.
Well, Switzerland goes so far as to make all health insurers offer basic insurance to everyone. They can't reject anyone and they can't hike the price. I like that system, though you're right that "insurance" might not be the best word for it.
Redistribution by trickery, or even by force, might still be better than no redistribution at all. I'm pretty confused about this issue myself, don't have any clear answers.
Moreover, for less dramatic conditions, there are race-based inequalities in our fundamental knowledge of medicine. Diseases that afflict Caucasians, for example, are far better studied than those that afflict minorities in "the West". Moreover, because of the way treatments are tested, knowledge of the safety and efficacy of potential treatments is biased against minorities (drug safety studies are largely done on white male college students looking for extra cash). Some people just have 'medical privilege' and the healthcare system perpetuates that.
If we "need to give free stuff to those who are statistically less likely to win in it in competition," we should do it openly, as a transfer, rather than rig every algorithm and, even worse, sabotage our medical knowledge. I certainly want to know about every illness in advance, even if it means abolishing the insurance system.
OTOH, if sickness was 100% predictable, those places with nationalized health care (and therefore predictable costs, some level of centralized control over administration, etc.) could deploy resources precisely where needed with minimal waste. In turn, these savings could be passed on to the taxpayers - reduced taxes, basic income support, take your pick.
To me, such an upside vastly outweighs the "downside" of driving a spike through the heart of the health insurance industry. (This also points out how, if you see a problem with "market-efficient" distribution, perhaps the problem is with applying market principles to everything and not with the machine learning algorithms.)
You can't make the topic of fairness "unnecessarily politicized". Fairness is inherently political. The difference between different people's notions of fairness, justice, and liberty are basically the founding reasons for political disagreements.
If you are asking the question "is this algorithm fair?", you are asking a political question.
Yes. And there is no greater example of unfairness in the US of A than that of racism. For someone to complain that mentioning race "politicises" a discussion requires a willful blindness to reality and often seems like an attempt to avoid any discussion of the topic in any context.
Any discussion of inequality in the modern world that does not explicitly include racism is necessarily deficient.
>I think mentioning race makes the topic unnecessarily politicized. There are much simpler cases where we don't want to use machine learning algorithms for maximum efficiency gains.
I think the point is so that it can bludgeon home that our work as technologists has serious social implications, and that we can't dodge them, even though this stuff is ridiculously difficult.
>I think mentioning race makes the topic unnecessarily politicized.
How does mentioning race make the topic unnecessarily politicized?
Is there really a strong contingent that believes algorithms that discriminate against people based on their race should be implemented and deployed?
Automated discrimination is pretty frightening and we should avoid implementing such systems. Further, we should be aware it can occur and should do what we can to detect and fix any occurrences.
There is a strong contingent that believes that the risk of you defaulting on a loan should be taken into account in the terms of the loan. There is also the ugly fact that the risk of you defaulting is correlated with race.
Why is that an ugly fact? Of course it's harder to keep up with payments depending on your race. It says nothing about any particular race; it's just a side effect of institutionalized racism. There are decades of data showing discrimination in almost every aspect of financial life. It would be a miracle if that didn't affect default rates.
This problem is tightly coupled with issues of correlation vs causation. In real data, correlation is mostly transitive (there are toy exceptions - https://terrytao.wordpress.com/2014/06/05/when-is-correlatio..., but they are just that). This means that if you want to predict something, and that something is correlated with some uncomfortable association, trying to predict the something without the uncomfortable association can leave you with only a small residual to work with.
For example, if hair length is societally taboo for gender prediction and I make an algorithm that uses a "politically correct" determination using XY chromosomes, I have also made an algorithm that correlates with hair length. Moreover, if I try to statistically correct my algorithm so that it does not correlate with hair length, I end up with an algorithm that works on the tiny leftover residual created by people who buck the trend, i.e. one that's much more likely to be wrong.
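Here's a rough numerical version of that hair-length example (the 85% figure and everything else is invented for illustration):

    # A predictor built only on the "politically correct" variable still
    # correlates with the taboo one, and decorrelating it leaves only the
    # residual from people who buck the trend. Synthetic data only.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000
    is_xy = rng.integers(0, 2, n)                       # the permitted variable
    # hair length tracks it strongly but not perfectly (numbers invented)
    hair_long = np.where(rng.random(n) < 0.85, 1 - is_xy, is_xy)

    prediction = is_xy                                  # never looks at hair
    print(np.corrcoef(prediction, hair_long)[0, 1])     # ~ -0.7: still strongly correlated

    # Forcing the predictor to be uncorrelated with hair length keeps only the
    # residual after regressing hair length out.
    slope, intercept = np.polyfit(hair_long, prediction, 1)
    residual = prediction - (slope * hair_long + intercept)
    print(np.corrcoef(residual, hair_long)[0, 1])       # ~ 0
    print(residual.var(), prediction.var())             # roughly half the variance is gone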
Algorithms find both correlational and causal factors. If 9/10 men are from Mars, and you tell me you're from Mars, it is often via correlation that the algorithm labels you a man. You are not allowed to jump to the assumption that, say, the drinking water on Mars is turning people into men.
I don't see where causation entered the conversation. Correlation and causation have the same predictive power. If the water on Mars turns 90% of people into men or 90% of the people that drink said water are already men, either way if you tell me you drank the water on Mars I know with a 90% certainty you are a man. Algorithms find both correlation and causal factors which is fine because I want accuracy whether or not it is based on causality.
> Correlation and causation have the same predictive power.
That is only true in a system without feedback. If your response to observations can affect the subjects under study, then acting on the correlation can change the correlation.
If you decide to offer everyone in Nairobi a million dollars to join your super-jumpers space-exploration program, because Nairobi correlates to high jump ability, you will find that low jumpers will flood into Kenya to collect on your offer.
Whereas, no matter how much you abuse the fact that gravity makes things fall, you aren't going to make gravity stop making things fall.
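A toy simulation of that feedback effect, with invented jump heights:

    # Acting on a correlation can erase it. All numbers are invented.
    import numpy as np

    rng = np.random.default_rng(2)

    # Before the offer: the Nairobi sample happens to jump higher on average.
    nairobi   = rng.normal(1.0, 0.3, 1_000)      # jump heights in metres
    elsewhere = rng.normal(0.7, 0.3, 100_000)
    print("gap before the offer:", round(nairobi.mean() - elsewhere.mean(), 2))

    # After the offer: low jumpers move in to collect, and the correlation
    # you acted on largely evaporates.
    migrants = rng.choice(elsewhere, 20_000, replace=False)
    nairobi_after = np.concatenate([nairobi, migrants])
    print("gap after the offer: ", round(nairobi_after.mean() - elsewhere.mean(), 2))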
Feedback is just as much of an issue whether the relationship is correlation or causation. All correlations are either based on random chance (thus not statistically significant) or based on some causal relationship, even if we can't identify what it is. Could be A->B, B->A, C->A and C->B, A->C->B...
> Correlation and causation have the same predictive power.
Whenever two independent factors contribute (causally) to an effect, both factors are correlated with the effect, but are not in general (well) correlated with each other.
Looking forward to the next post, and glad that the author is working on this problem from a technical perspective! These issues are becoming highly important to society, but vague sociological and political accusations can be made from any viewpoint. Although they can help guide and inspire our research, it's unsustainable to depend on such imprecise concepts in the long run. Objective, neutral, and precise mathematical knowledge that directly addresses these social issues, instead of waving them away as technically irrelevant, is the only way to make progress in this area.
I don't entirely disagree, but I urge caution. When we're designing an algorithm, there are things we care about. We try to capture these intuitions with imprecise labels like "fair". Trying then to capture these labels in "precise mathematical knowledge" still leaves room for the mathematical formulation to disagree with the intuition as well as the intuition to itself be faulty in terms of tending to lead to the outcomes we want. In either case we could easily wind up unambiguously and effectively optimizing for the wrong thing.
I clicked expecting an actual discussion of "algorithm fairness" in the sense of how a scheduler can be fair (what that means seems to elude me still).
Instead of talking about "algorithm fairness", it appears to be talking about "algorithm equality", or alternately "algorithm legal non-discrimination".
I'd expect one could just avoid feeding an algorithm data about enumerated protected classes (for jurisdictions where such a thing exists) and be done with it.
Avoiding non-protected-class data that is correlated with protected classes seems almost contradictory: you'd need to feed the algorithm data about protected classes so it could determine the correlation, and then apply the inverse correlation to the protected classes.
I'm not even sure such a thing is possible to do correctly, though I suppose what I've described is simply a technological implementation of affirmative action.
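For what it's worth, that "apply the inverse correlation" step might look something like the sketch below: regress each permitted feature on the protected attribute and keep only the residual. The data and feature names are synthetic; the point is just that the correction requires observing the protected attribute:

    # You can only strip out what correlates with a protected class if you
    # measure the protected class. Synthetic data, illustration only.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000
    protected = rng.integers(0, 2, n).astype(float)       # the attribute you'd rather not collect
    zip_risk  = 0.7 * protected + rng.normal(0, 0.5, n)   # a permitted feature correlated with it

    def residualize(feature, protected):
        # regress the feature on the protected attribute, keep only the residual
        slope, intercept = np.polyfit(protected, feature, 1)
        return feature - (slope * protected + intercept)

    cleaned = residualize(zip_risk, protected)
    print(round(np.corrcoef(zip_risk, protected)[0, 1], 2))   # strongly correlated
    print(round(np.corrcoef(cleaned, protected)[0, 1], 2))    # ~0, but only because we measured it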
The bottom line is that a well made algorithm with a sufficiently large [1] data set cannot be fair or unfair. It can only optimize with an estimated accuracy.
If the numbers show that poor people are more likely to respond to payday loan advertisements, then that's what the numbers show. If the businesses are truly predatory then they should be regulated. It doesn't make sense to show ads for them to people who are statistically unlikely to respond to them, nor does it make sense to show poor people ads for institutions that are statistically unlikely to grant them loans. Algorithms aren't the problem here, the nature of the businesses is.
And in fairness, I notice that when I watch certain tv shows late at night I'm much more likely to see payday loan ads than when I watch during prime time. I've long suspected that this is because the kind of person who is watching "30 Rock" re-runs at 1am is less likely to be employed and more likely to need a payday loan. So if it's fair for a media buyer to select audiences that are most likely to purchase a product, why is it unfair for an algorithm to do it?
The article makes the point that algorithms are opaque and people often can't explain exactly why a decision was made. That's fine, as long as the results can be shown to be accurate with out of sample testing. It's good to understand why, but it isn't always necessary. The point of using advanced statistical modelling is to expand our level of capability beyond what our brains can easily perceive. The ability to draw reliable inferences from tens of millions of data points is a very powerful and wonderful thing, quite a leap for evolved monkeys who use lobes of fat to perform calculations. It's new, it's different, and it's definitely worrying. But the sooner we can adapt, the better off we'll be as a society.
[1] in this case "large" probably means two or three orders of magnitude larger than you were thinking
Like the question of whether a submarine can swim, the question of whether algorithms can be prejudiced is moot. Algorithms can, without their developers ever intending it, perform in ways functionally equivalent to prejudice. It is reasonable to seek to avoid them doing so.
If I charge a man extra for car insurance because men on average get in more wrecks, then I am discriminating based on sex.
If I charge someone extra for car insurance because he shares dozens of innocuous traits with other people who get in wrecks, I am not discriminating based on sex. If some algorithm can accurately predict the sex of that person by the same data, I am still not discriminating based on sex.
If I target blacks with predatory loan advertisements because that demographic is on average more susceptible to those ads, then I am discriminating based on race.
If I target an individual because they share dozens of innocuous traits with other people who fall for predatory loan advertisements, I am not discriminating based on race. If some algorithm can accurately predict the race of the individual based on the same data, I am still not discriminating based on race.
The software is less biased than a human because it has no concept of race. Even if the data can be used to accurately predict the race of an individual, it does not matter. The program will not spontaneously recognize the concept of race and then discriminate based off of it. If some trait correlates with race, that's because reality is biased, not the math, algorithm, company, or implementer.
It is not the responsibility of programmers to make their algorithms less accurate so ideologues can live in a fantasy world.
But that's how sexism and racism and any kind of discrimination works in the human mind. There are innocuous traits, like sex and skin color, but also many other traits, and we use them to make predictions about behavior. This is unfair discrimination; fair discrimination looks directly at the behavior of an individual, not some innocuous proxy for the behavior, even if that proxy is right 80% of the time.
For example, most people who have an account throwawayXXX on HN are using it temporarily to say something. However, I actually kept one such account for a long time. My username is an innocuous trait, but you're discriminating if you assume my motives and behavior will be like most of the other throwaway accounts, just based on my username.
That said, I believe unfair discrimination is unavoidable in life because we can't always wait around to see what an individual's behavior will be before we discriminate fairly. We couldn't function without stereotypes and assumptions. And so, while still unfair because of the potential for unjustified penalties, it's much better to look at many variables than it is to look at one or two.
Under that argument, using machine learning to make decisions on things like advertising campaigns or mortgage rates is still bad even if race and sex do not correlate. That's a much different argument than what the author is making. It's a decent argument too.
> That said, I believe unfair discrimination is unavoidable in life because we can't always wait around to see what an individual's behavior will be before we discriminate fairly. We couldn't function without stereotypes and assumptions. And so, while still unfair because of the potential for unjustified penalties, it's much better to look at many variables than it is to look at one or two.
Then you think machine learning is the best solution that exists?
Thanks, I thought your points were good too. I think machine learning in combination with human oversight is the least bad. Machines can help eliminate human bias, and humans can pick up on things that the machines missed. I wouldn't trust a machine to lend out my money, but I would trust it to give me a lot of data about someone and make recommendations on that basis.
When you search Google, you get a mostly impartial set of results, but then you need to choose the most relevant from the top 10. Without Google, you'd visit far fewer websites based only on your accumulated experience. Without a human to filter through the top 10, you'd have to systematically read webpages in order until you found what you wanted. Maybe the example is a bit stretched, but machines and humans cooperating seems to work well, even right on the HN front page, or in the browser spellchecker as I type this message (apparently HN is not a word).
In a lot of ways (or by a lot of definitions), prejudice & bias is exactly what these algorithms (or selection criteria for loans or insurance policies) do.
When we ban prejudice we are saying (IMO) two things. (1) It is unfair to the point of being illegal to assume that because one is black (old, transexual...), one is unfit to do a job, attend a university or somesuch. (2) It is harmful to society when such prejudice is widespread. It alienates or devalues a group and perpetuates social features we don't like.
I think for insurance, loans and such there is a sort of special allowance because these industries need to be biased and their bias, being cold and calculated is not unfair. They also don't specifically discriminate on racial/gender/etc lines and so they were never part of any particular anti discrimination effort.
Insurance companies particularly can work around any specific restrictions by substituting allowable variables for banned ones.
By hiring one person and not another, you are always discriminating. This kind is clearly considered in the "fair" range since it's directly relevant to the thing you are selecting for.
Which kind of discrimination are insurance companies doing?
Anyway, algorithms are far more commonly used today so the "problem" is bigger. Also, I think they are a sort of leak in the anti-discrimination logic. I mean, there might not be any political campaigns dedicated to the rights of the 22-30ag.1-2yrsedu.PSTC.QXRT statistical cluster, but that cluster may be discriminated against. The discrimination is impactful. It may mean they pay higher interest or insurance fees. They may be denied a lease. Non-trivial impacts.
Is it fair? As the title here points out, biases & prejudices seems like a nonsensical vocabulary to use when it is based on "facts."
This is all tricky stuff. PayPal identified me at some point as "high risk." They have frozen €35 and they have reversed transactions made by me. Not uncommon. It happened because I happened to share some characteristics with some fraudsters. They aren't race, but they are not any different (from a common-sense morality perspective) than being the same race as a fraudster.
I'm not sure where I'm going with this. I don't really understand it, but I don't think the relationship to legally prohibited prejudice is nonsensical.
As a self interested consumer, I guess I want whatever gives me the best price.
But, I do think this can get really tricky, socially. In some places mandatory insurance can be very high. Say a group (let's go with minority men 18-25, sounds plausible) is essentially banned from driving by unaffordable prices 5-10X the average premium. This is not unrealistic. They are disadvantaged in society because of their race and age. Even if you call it fair because it's optimal, or rational, or whatnot, we have the societal impact of disadvantaging people from birth. I think it's immoral.
Very often these articles ask excellent questions -- but fully 80% of their value is in the prompt-question.
On the one hand, this question explodes into the "general purpose AI" literature. Less Wrong actually originated from a smaller community of people asking what did it mean for AI to be "friendly" to humans, very abstractly.
On the other hand, there's "good citizenship" desiderata. Maybe Hubert Dreyfus is correct and there can't be general AI, but the fact of the matter is that our actor-networks (as in Latour) are already crowded with algorithms and machines. What does it mean for a machine to be accountable, dependable, reliable, and yes, fair?
An image classifier should be invertible -- if we're consistently getting that black men look like apes, we should know what the machine thinks an ultra-archetype ape is. Interestingly, this has been the focus of a lot of research since I took that NN class in college back in 2001 -- we're able to visualize hidden layers now.
A credit scorer should be consistent with some wider sense of credit worthiness, sure. It should be possible to increment your credit score, at the margin, with marginal changes to (real, important) inputs, rather than have it place you in a brittle "unreliable" type. You should be able to dig yourself out of debt.
All of these are kind of domain specific, but I haven't spent much thought on it. It seems to me that there are some general characteristics (invertibility, smoothness across key controls) that should be identifiable.
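As a toy illustration of the smoothness-at-the-margin idea (the scoring functions and weights below are invented, not any real credit model):

    # A score you can improve at the margin vs. a brittle bucket you can't
    # dig out of incrementally. Everything here is a hypothetical stand-in.
    def smooth_score(debt_ratio, on_time_rate):
        # responds at the margin to real, important inputs
        return 600 + 200 * on_time_rate - 150 * debt_ratio

    def brittle_score(debt_ratio, on_time_rate):
        # dumps the borrower into an "unreliable" bucket with a hard cliff
        return 750 if (debt_ratio < 0.3 and on_time_rate > 0.95) else 500

    before = (0.50, 0.90)
    after  = (0.45, 0.92)   # a genuine marginal improvement by the borrower

    print("smooth: ", smooth_score(*before), "->", smooth_score(*after))   # 705.0 -> 716.5
    print("brittle:", brittle_score(*before), "->", brittle_score(*after)) # 500 -> 500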
>Less Wrong actually originated from a smaller community of people asking what did it mean for AI to be "friendly" to humans, very abstractly.
Only tangentially. It was literally the next stage of Yudkowsky's blogging on Overcoming Bias, which substantially attracted as its initial commenters people from the SL4 mailing list, which dealt with many topics of transhumanist interest, including friendly AI.
I've long said that an ethics class should be a required "related course" (or at least be encouraged) for various Engineering and Business majors, not because it teaches you how to be a good person but to enhance the student's ability to clarify and explain their reasoning about ethical decisions they will inevitably encounter.
Almost all engineering accreditation organizations require an ethics class. However, most students and universities do not treat the class seriously.

I took an honors course run by a philosophy professor aimed at general STEM students. We learned first about the different ideas of ethics, such as deontology and utilitarianism. We then applied these systems to various ethical issues in engineering, discussed books about STEM ethical problems, and were required to write a 15-page paper about a current ethics debate, discussing what perspectives had been taken on it (mine, for example, was autonomous military drones).

However, I then TA'd for a non-honors ethics class for CS students, run by a CS adjunct who had so much work teaching that she could not focus on teaching any of her classes well. Most of it was reading from slides, multiple choice tests, a 2-page essay on what one thought about an ethical topic, and no discussion of what ethics is, or why something might be unethical. With the exception of the class I took, which required membership in a specific honors college and only had 20 students a semester, all STEM ethics classes were taught that way.
It's not enough to have a course. You have to take it seriously.
The discussed situations are analogous to the routing of the interstates through US cities. Often, they just so happened to separate black communities from white ones.
(Off-topic, but your username is the name of an old western as well as the name of one of my favorite Irish reels.)
That ... doesn't describe anything about human interaction, which has been politics all the way down for the last million years.
This particularly applies given the "reality" that we're talking about here is human interaction and questions as to which groups get what when, i.e. politics.
If there is some identifier in the data that specifies race and the algorithm discriminates on that, it's racist. If not, then it's not racist. AM I RIGHT?
A sufficiently advanced algorithm could assemble a synthetic proxy for race from the other available data if race data are not already included.
If you know where someone has their hair cut and styled, and where they attend their religious services, if any, you can probably guess with good confidence how they would self-report their race.
If a company is found to be interacting with a legally protected group in a discriminatory manner, it really doesn't matter if it's doing so because of an algorithm running on a bureaucracy made of humans, or an algorithm running entirely in a machine.
"these algorithms are trained on historical data, and poor uneducated people (often racial minorities) have a historical trend of being more likely to succumb to predatory loan advertisements"
to
"Even if this is the most common question being asked on Google [..] this shows that algorithms do in fact encode our prejudice"
It's hard to make the case for prejudice when the example involves historical data, so the author weasels in an unrelated example to make the case that there is prejudice.
So which is it? There are correlates to credit worthiness which are indeed illegal, allegedly to correct bias. But the legal arguments have moved away from the concept of bias and now squarely focus on impact. See the recent Supreme Court decision on disparate impact for instance.
This seems to be the author's point of view to:
"Why is this ignorant? Because of the well-known fact that removing explicit racial features from data does not eliminate an algorithm’s ability to learn race. If racial features disproportionately correlate with crime (as they do in the US), then an algorithm which learns race is actually doing exactly what it is designed to do!"
So for the author, anything that affects one ethnic group differently is racist, whether or not it is born by the data. Prejudice or bias are irrelevant to his definition.
But then why focus on race? No scoring algorithm is perfect, and some people are always going to be shortchanged. So why not just declare by fiat: let's ignore people's height, it's height-ist. Height correlates to income and to credit worthiness. Even if you remove height from the data, your algorithm will end up picking correlates which can become a proxy to height... and the same will be true with virtually any attribute you can think of. So why race?