These companies don't fix these systems because they don't know how. It is easier to remove certain outputs or retire the whole system. There is no line of code they can tweak.
Is there a sizable population of unemployed black engineers living in the United States to hire from? What if the qualified candidates simply don’t exist to fill the seats?
Are you familiar with the H-1B visa system? There's a shortage of qualified candidates for top-band engineering roles in the US by every metric you can name.
It’s a probabilistic system. It can give you correct results on one billion samples and still get the 1,000,000,001st case wrong. How do you test every single photo from the past and future?
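As a back-of-the-envelope sketch (the error rate and daily volume below are invented for illustration, not figures from any real system), even a model that is wrong only once per million photos produces a steady stream of errors at this kind of scale:

    # Illustrative numbers only: an assumed per-photo error rate and an
    # assumed daily upload volume for a large platform.
    error_rate = 1e-6
    photos_per_day = 1_000_000_000

    expected_errors_per_day = error_rate * photos_per_day
    print(f"expected misclassifications per day: {expected_errors_per_day:,.0f}")
    # -> expected misclassifications per day: 1,000

And that is only the expected count; which specific photos fail is essentially impossible to predict in advance.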
> sorry it seems discussion about solutions to racial issues isn't welcome here
Oddly it seems like you are the person who doesn't want to have a discussion.
You stated "As an ML engineer I can tell you confidently here's how you fix these systems" and people critiqued your solution. Is that not being open to discussion?
Critiques and debates are fine, but downvotes don't encourage discussion; they scream "shut up," disincentivize ever speaking about the issue again, and downrank a viewpoint instead of leaving it up for fair discussion.
Because presumably, if this works like many other systems, getting downvoted enough eventually gets you banned, which means I should be scared to say what I said again.
If people were interested in debating like mature adults without downvoting, I would have left it up and continued the discussion.
The moderators here are exceedingly reasonable. Don’t worry about getting banned on account of downvotes unless you’re violating the guidelines. And the staff at hn@ycombinator.com is shockingly responsive for a site of this size. They really do want to encourage intellectual curiosity.
I second this, but not for the conventional diversity reasons; rather, for subtler benefits of diversity that can be difficult to understand.
You need your consumer base to be represented in your product development process. When training an AI model, we first test things on whatever we personally have a bias for.
For example, when training a stock prediction model, I am going to test it first on the most familiar stocks, the ones I know or have a bias for. I am going to adjust my model until it looks correct for my bias, and only then test it across a larger dataset.
I don't work in AI, but I know that the test data is incredibly large and supposedly covers all the bases systematically. What I am saying is that the chance of random failures like this goes down when you move away from a homogeneous development team where everyone is biased toward the same things because they share cultural and racial overlap.
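To make the stock example above concrete, here is a toy simulation (the group names and error rates are made up purely for illustration) of how a model tuned against a hand-picked, familiar subset can look fine there while doing noticeably worse on the broader population:

    import random

    random.seed(0)

    # Hypothetical per-sample error rates after tuning only against the
    # familiar subset.
    ERROR_RATES = {"familiar_subset": 0.02, "broader_dataset": 0.10}

    def simulated_accuracy(group: str, n_samples: int = 100_000) -> float:
        """Simulate predictions for one group and return its accuracy."""
        errors = sum(random.random() < ERROR_RATES[group] for _ in range(n_samples))
        return 1 - errors / n_samples

    print("accuracy on the stocks I tuned against:", round(simulated_accuracy("familiar_subset"), 3))
    print("accuracy on the larger dataset:        ", round(simulated_accuracy("broader_dataset"), 3))

A more diverse team effectively widens that familiar subset before the broad evaluation ever happens.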
Yes, it's a real question, since nothing says that a particular misclassification must happen. Watching cars go by on the road, one might suspect that at least one is driven by an alligator, but nothing, not even the law of large numbers, says that one must be.
Nobody said this particular misclassification must occur. But there will be misclassifications, which is what your original question asked. Since you know the answer, why ask the question? That's why I asked you if what you asked was a real question.
Yes they did, I said that. But it was a claim made as a question, because I didn't know whether it was actually true. I still can't demonstrate formally why this would be so, because again, the reasoning and even the veracity of the claim are still in question due to the lack of anything but a hand-waved answer.
There is no need to formally demonstrate. The veracity is clearly not in question. It must be true, due to the existence of the article we are commenting on now.
If you want to argue the general case, you can simply show that the negation is false. Since it is incorrect to say that a network trained on a tiny percentage of the possible inputs will never misclassify, it is true that a network trained in such a way will eventually misclassify. This is borne out empirically: train any network and you will see that it always misclassifies something.
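A rough numerical version of that argument, under the simplifying (and admittedly unrealistic) assumptions of independent inputs and a fixed per-input error rate p:

    # p is an assumed per-input misclassification probability; real inputs are
    # not independent, so this is only a sketch of the "eventually" claim.
    p = 1e-6
    for n in (10**6, 10**8, 10**10):
        prob_never_wrong = (1 - p) ** n
        print(f"n = {n:>14,}   P(no misclassification at all) = {prob_never_wrong:.3g}")

As long as p is nonzero, the probability of never misclassifying decays toward zero as the number of inputs grows.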
> Yes they did, I said that.
You didn't say that. You said a misclassification would happen on some inputs. That is different from saying on these specific inputs.
Quite honestly, in a team that's stressed and run right down to the wire, checking for that particular misclassification in a model that has hundreds of classification targets can be really tricky.
Say you have 1,000 classification targets. You have to produce a model that checks, for each target, the odds of it being classified as one of the other 999.
You have to check, specifically, for "adult male as primate" out of roughly a million potential combinations, and then apply secondary business rules or optimizations to prevent that classification.
So yes, it's possible, but it's not cheap, simple, or easy.
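For what it's worth, here is a minimal sketch of those two steps (every label, pair, and threshold below is invented for illustration, not taken from any real pipeline): audit the confusion matrix for specific sensitive (true label, predicted label) pairs, and apply a post-hoc business rule that suppresses a risky label unless the model is very confident.

    from typing import Dict, Tuple

    # With 1,000 targets there are ~999,000 ordered (true, predicted) pairs,
    # so in practice you can only hand-curate a short list of pairs that must
    # never slip through.
    SENSITIVE_PAIRS = {("adult male", "primate"), ("adult female", "primate")}

    def audit_confusion(confusion: Dict[Tuple[str, str], int]) -> None:
        """Flag any sensitive (true_label, predicted_label) cell with a nonzero count."""
        for true_label, predicted_label in SENSITIVE_PAIRS:
            count = confusion.get((true_label, predicted_label), 0)
            if count:
                print(f"ALERT: {true_label!r} misclassified as {predicted_label!r} {count} time(s)")

    def apply_business_rule(predicted: str, confidence: float,
                            blocked_labels=("primate",), min_confidence=0.98) -> str:
        """Post-hoc rule: refuse to emit a blocked label unless confidence is very high."""
        if predicted in blocked_labels and confidence < min_confidence:
            return "unknown"  # fail closed instead of emitting the risky label
        return predicted

    # Made-up evaluation results, just to show the calls.
    audit_confusion({("adult male", "primate"): 3, ("cat", "dog"): 120})
    print(apply_business_rule("primate", confidence=0.71))

Even then, the audit only catches the pairs someone thought to list in advance, which is part of why it is neither cheap nor easy.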
Facebook just decided to shove the model out the door and not worry about the consequences.
Quality engineering work costs money and time. Facebook didn't spend either.
We don't actually know how to do that, or how rich is "rich enough." It's an open avenue of research to be able to extrapolate how well-tuned a neural net is on data not in its training set.
Not to imply the problem is unsolvable, just that if an institution has zero tolerance for this mistake, the fix you're describing is no guarantee it won't occur.