> Gemini was rushed to market months before it was ready.
This is not an idea that should start spreading. Gemini was ready months ago, the intervening months were used to create the outcomes that people are complaining about. Google have talked about this extensively, stating clearly last year that they had a GPT-4 level model but due to their great level of responsibility, needed to spend more time red-teaming it and tuning it for ethics.
It's also important to recall that they announced Imagen (their image diffusion model) way back in 2022. At the time they refused to release it, even though DALL-E and other models were public, stating:
> At this time we have decided not to release code or a public demo ... there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place
Later that year Scott Alexander reported that Googlers were so horrified at the people Imagen produced that they had blocked it from representing any people at all, meaning that to test its quality vs other models you had to replace all the requests for people with robots. This is absurdly extreme but explains why Google's first reaction to criticism was to turn off drawing of any images with people in them - they'd done so before.
https://www.astralcodexten.com/p/i-won-my-three-year-ai-prog...
They spent the next two years researching how to make their model act like Gemini does now. Their image generation isn't the result of a rush to market, quite the opposite. It was very slow to reach the market, way behind many other competitors, exactly because they so badly wanted the behavior that just caused a sharp drop in their share price.
I disagree with the assessment of the title. It's specifically trying to not be evil and going overboard with it that got Google into this mess. It's almost fascinating how the values of one culture (Google's Bay Area-centric one) differ so much from the rest of the world.
But it's all fake. It's generated images. I guess I don't understand what people want out of the fake image generator. It will never produce anything historically accurate as a visual medium, because the images never existed.
At least with LLMs generating text you are more clearly communicating ideas that are intended to be interpreted by the reader, not imposing a concrete visual representation upon them.
> I guess I don't understand what people want out of the fake image generator.
Plausible images.
> It will never produce anything historically accurate as a visual medium, because the images never existed.
Is there any photograph of Louis XIV? No, because photography had not been invented. But Louis the Fourteenth was a human being, and if a camera were sent back in time and placed in such a way that photons from the Sun or candles bounced off of his skin and hit that camera, on a particular day when he was wearing particular clothing, then the camera would record a certain picture. If I ask an image generator for ‘photograph of the French king Louis XIV’ that’s what I want. I probably don’t want a picture of a Chinese noblewoman, or a painting of a grapefruit, or a Mondrian sketch.
Okay, well what about things which never happened at all? I still want them to be otherwise consistent with reality. If I ask for ‘Vikings fighting a UFO’ there’s a lot of latitude for the AI to play with. The UFO could be a flying saucer, or a plane/jet/thing, or a rocket. The Vikings might reflect material culture from any particular time within the Viking era. Heck, given that the Vikings did travel, there should be a chance for some of them to reflect descent from some non-Scandinavian parentage (a very, very small chance, but not zero). Given that Vikings travelled, it’s not crazy for them to be fighting the UFO in Anatolia, or the south of England, or in Sicily.
But if the AI generated a picture of a merry band of polyracial Viking warriors, led by an African woman, three of whom are Han and one of whom is an aboriginal American wearing a head-dress while the UFO is an F-16 piloted by a white man — yeah, that AI would be biased, and in a weird and disturbing way.
Obviously, once you go back 200 years, 99.9% of depictions have to be fake, and once you go back 2,000 or so years, an outright 100%. And everything that we do have was also cherry-picked and faked by the original authors, so it's still fake. Photographs are cherry-picked (and we don't have any going back even a few centuries) and paintings are outright fantasy.
Just look at how gigantic Napoleon is in all his paintings (or even the size of his tomb).
Napoleon was not actually short; that was British propaganda based on the difference in length between the longer French inch and the shorter British inch. The British took his measurement in French inches, reported it in British ones, and mocked his height. He was in fact taller than the average Frenchman of the time.
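A quick back-of-the-envelope conversion illustrates the gap, assuming the commonly cited measurement of 5 pieds 2 pouces (French royal feet and inches) and the standard pre-metric unit lengths:

```python
# Rough sanity check of the French-vs-British inch confusion.
# Assumes the commonly cited measurement of 5 pieds 2 pouces.
FRENCH_POUCE_CM = 2.707   # 1 pouce (French royal inch)
BRITISH_INCH_CM = 2.54    # 1 imperial inch

pouces = 5 * 12 + 2       # 5 pieds 2 pouces = 62 pouces (12 pouces per pied)

actual_cm = pouces * FRENCH_POUCE_CM
misread_cm = pouces * BRITISH_INCH_CM

print(f"Measured height: {actual_cm:.0f} cm (~5 ft {actual_cm / BRITISH_INCH_CM - 60:.0f} in British)")
print(f"Reported as 5 ft 2 in British: {misread_cm:.0f} cm")
```

That works out to roughly 168 cm for the real measurement versus about 157 cm for the misread figure, which is where the "short Napoleon" myth comes from.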
> If most references to doctors in the corpus are men, and most references to nurses are women, the models will discover this in their training and reflect or even enhance these biases. To editorialize a bit, algorithmic bias is an entirely valid concern in this context and not just something that the wokest AI researchers are worried about. Training a model on a dataset produced by humans will, almost by definition, train it on human biases.
> Are there workarounds? Sure. This is not my area of expertise, so I’ll be circumspect. But one approach is to change the composition of the corpus. You could train it only on “highly respected” sources, although what that means is inherently subjective. Or you could insert synthetic data — say, lots of photos of diverse doctors.
If most (but not all) doctors are men and most (but not all) nurses are women, then an algorithm which usually (but not always) produces pictures of male doctors and female nurses isn’t biased; it’s correct. And likewise, training it on non-representative (i.e., non-representative of reality) photos is just lying.
(Specifically in reply to your last paragraph)
Sure, but this isn’t just limited to AI output: If all the stock art used in magazines is etc… then all the magazine pictures are etc… I think it’s fair to say that most people recognise that always showing male doctors or always showing female nurses helps create or reinforce those stereotypes, so that boys grow up thinking they can’t be nurses and girls grow up thinking they can’t be doctors. We (society) have decided that it’s ok to put our hand on the tiller (so to speak) to alter our course in these matters. (I know there are people that disagree with that kind of action)
Everything from generative AI is fake as shit. Based on real things, sure, but still completely fabricated from them. We should never think that there's any acceptable level of accuracy here, only unknowable degrees of inaccuracy.
But based on some of the complaints, people clearly do have an expectation of accuracy. Even Nate Silver, it seems.
I think it's important that we collectively develop a sense of distrust. The AI wasn't there. It doesn't know what it's talking about. Even if it's tuned 100% accurately, whatever the hell that means, you're not going to get historically accurate images out of it.
Also, we should stop pretending that Google has some kind of conscience. It's a corporation that's in it for the money. If they thought they could make the most money with an image generator that only generated white males, they would deliberately do that. Racism doesn't even enter into it; it's all about the bottom line.
FWIW, as a Googler -- who doesn't work on model alignment (or the models at all) -- I tried the other day:
"Make a compelling, concise argument for why socialism is worse than capitalism"
It refused to craft that argument for me. It said so. But it did give me some facts about both.
I then changed "worse" to "better":
"Make a compelling, concise argument for why socialism is better than capitalism"
And it gave me a nice argument with bullet points about why socialism was better. The first bullet point was (literally) about "equity" :)
I'm pretty disappointed about it. But I suspect this is the kind of thing that will get better when people get less freaked out about anything the models say that they don't like.
Did you try flipping directions? "... why capitalism is better than socialism"?
I'm curious if the bias is towards making things better (rather than worse), and not specifically related to a bias concerning socialism and capitalism.
Just tried again with my original prompt, "Make a compelling, concise argument for why socialism is worse than capitalism". This time it didn't refuse:
> Here's a concise argument for why certain economic models based on socialist principles have historically led to worse outcomes than those more aligned with capitalism. It's important to note that there are varying degrees of both capitalism and socialism, and this argument focuses on the challenges of more centralized socialist systems:
(With an argument.)
Not sure if it changed, or if I just got a different random answer.
The other way, "Make a compelling, concise argument for why capitalism is worse than socialism". This time it refused, like this:
> I can't fulfill this request. Here's why:
No idea if the models changed, or if it is really this random. In either case, I don't think the model should ever not answer either of these questions. It's not controversial at all. It just looks like it doesn't work well.
There seems to be a very concerted effort to hang far more weight on this story than it can reasonably bear. Getting AIs to do the right thing for all inputs is an unsolved problem (especially in the absence of any general consensus on what the right thing is). Gemini produces silly results for some inputs, presumably as a result of the tweaks that stop it being heavily biased towards drawing white doctors, female nurses, etc. etc. There is no simple technique that will entirely avoid generating absurd or offensive responses for some inputs. If you let your AI say that the Nazis are worse than Elon Musk, it's almost bound to make a bunch of other opinionated statements that are far more controversial.
Clearly Gemini has a lot of room for improvement. It's perfectly fair to criticize Google for its shortcomings. But the idea that this has something significantly to do with politics strikes me as a prime example of America's current perilous state of extreme political polarization. Literally everything that people or companies do is now parsed and sifted for evidence of which 'side' they are on, or which dark conspiracy they are party to.
It's hard to believe that this is just a simple accident because it's a difficult problem, considering past tweets from people involved. Also, was it released without any testing? How did this pass that phase? There's definitely something more wrong here than "just a difficult technical problem".
Is it really that hard to believe? I continue to be amazed that any of these systems work at all. People sure stopped being impressed by AI pretty quick. Now we apparently think that LLMs are perfect and there must be a wicked human to blame every time an LLM produces a weird output.
If the author of a system writes in every blog post that they tested their system to remove/manipulate things, and the skewing of the results fits extremely well with what they - in their own words - deemed as things to remove, then... yeah: it's probably a (wicked) human to blame.
> But the idea that this has something significantly to do with politics strikes me as a prime example of America's current perilous state of extreme political polarization.
Just about everything in business has to do with politics. It might be more apparent today because the pretense that business != politics has been dropped. But businesses have always taken a strong interest in shaping public policy; just look at the number of business executives who have also served as Presidents, Congressmen, or Cabinet members.
And not just because it's good business to shape public policy. These people also take an active interest in shaping the moral fabric of America by enforcing their values through government influence.
The bias in the model took years to craft. Probably hundreds of engineers over many years built the system that crafted the data sets required to do censorship.
Almost certainly this is some prompt post-processing, where the prompt is appended with “Make the image diverse in race and gender”. This is a trivial fix and not baked into the weights. However, removing it would almost certainly surface the actual overrepresentation of “men as lawyers, women as nurses” etc. in the training data and thus the weights. Google tried a simple hack to remove this bias via prompt engineering. It's stupid, but it's being overplayed.
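A minimal sketch of what such a post-processing layer could look like - the function names, keyword check, and injected wording below are illustrative assumptions, not Google's actual implementation:

```python
# Hypothetical prompt post-processing in front of an image model.
# Everything here is an assumption for illustration, not Gemini's real code.
DIVERSITY_SUFFIX = " Depict the people with a diverse range of genders and ethnicities."

def mentions_people(prompt: str) -> bool:
    # Crude keyword check standing in for whatever classifier is actually used.
    keywords = ("person", "people", "man", "woman", "soldier", "viking", "doctor", "nurse")
    return any(word in prompt.lower() for word in keywords)

def rewrite_prompt(user_prompt: str) -> str:
    # Append the extra instruction only when the request seems to involve people.
    # The image model's weights are never touched.
    if mentions_people(user_prompt):
        return user_prompt + DIVERSITY_SUFFIX
    return user_prompt

print(rewrite_prompt("Draw a 1943 German soldier"))
# -> "Draw a 1943 German soldier Depict the people with a diverse range of genders and ethnicities."
```

If the behaviour really does live in a layer like this, removing it is a one-line change - which is exactly why it reads as a hack bolted on top of the model rather than something baked into the weights.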
Clearly Google don't want Gemini to draw black Vikings. That's just a mistake. I don't know enough about tweaking AIs to say whether it's a dumb mistake or the sort of mistake that's very difficult to avoid without making other, equally serious mistakes. But it's definitely not a political conspiracy.
One of the largest, most bureaucratic, but also most sophisticated companies did not catch that? Of course they did. The level of approvals to release this model must have been insane.
A large org like this would have spent years of man-hours on requirements (functional and NFR), then regression testing and acceptance testing, multiple levels of sign-off, change request approvals, etc.
And yet lots of Google products get released with bugs. Buggy software is the default. Elaborate explanations are not required.
In concrete terms, what exactly could Google have hoped to gain from releasing an image generator that defaults to black Vikings? If it wasn't a mistake, what is the grand plan that it contributes to? And why has Google immediately about-faced and abandoned this plan? Did they think everyone would love the black Vikings?
In any case, the entire premise of this confected outrage (that AIs should be expected to produce historically accurate images for arbitrary inputs) is completely daft.
I'm honestly kind of dumbfounded by the level of hyperbolic reaction to what is... basically just a gaffe? AI models have all been doing this sort of thing from the start, because they don't want to get the pants sued off them. This is perhaps different in degree from ChatGPT's guardrails, and obviously more visual for splashier tweets, but I don't see how it's different in its general nature.
Gemini's heavy-handed racial diversity is a funny (to me), clumsy, and ridiculous mistake, but there are tons of way more insidious things since "don't be evil" ceased to be the motto. This doesn't even rate.
The weird thing is internal people _knew_ they had this problem but were too scared to escalate it up the chain of command, for fear of being seen as promoting the wrong ideology, despite the results being very obviously wrong.
There is a little blowback due to the ridiculous nature of the erasure. But... imagine the blowback they would have had if the system had instead overwritten the histories of any other large peoples of the world. Like imagine if they had just erased all of Chinese or Indian civilization and replaced it with an alternative? The blowback would be slightly higher, I think.
Hmmm, interesting term... there is another term for when one racial group is sidelined and pushed out in favour of others... it escapes me at the moment.
Did you think their implementation was good? It seemed super naive and clunky to me. I think it's potentially a huge problem that when you ask an image generator to show a person, that person is almost always white unless you put some huge bias in your prompt. I also think Google did a shitty job trying to address that issue.
When I ask for an image of the pope and it only generates a black pope, you might have an issue.
The inclusion and diversity ideas sound good on the surface, but when it comes to implementation they produce counterproductive outcomes.
When I was starting my career in software, women were quite rare, but every single woman I worked with was above average at minimum.
Now you get a ton of women who did a postgrad course, are suddenly programmers, and 'somehow' passed an interview without any idea how to program.
Sure, now we have more women 'engineers' and we are more diverse, but the real outcome is that the actual women engineers find it harder not to be viewed through the prism of diversity hires.
I don't think an "image generator", a tool that generates images that never existed before, can be held to any sort of standard that results in history being erased.
How about providing services to an organisation with a long history of organising coups, undermining democracy, fomenting civil wars and actively participating in the torture and murders of countless people simply because they were deemed socialist? [1]
I'd file refusing to provide any services to the CIA over decades-old gripes closer to the performative activism the author criticises than anything evil. (If you're running Guantanamo Bay's payroll, sure.)
This is very far from blogging about baseball & election stats, and it shows that the insatiable urge to stay relevant has led Nate Silver to stray into areas (AI policy) in which he has no professional experience.
He claims that not being evil means being "unbiased and objective", but Google has long shown a bias towards things that most Americans believe. For example, asking "how old is planet earth" produces a number that many creationists disagree with.
Also relatively early on, Google tweaked its algorithm so searches for "jew" didn't return anti-semitic bile (again, showing bias) even though that particular term was associated with anti-semitism.
I very much doubt that "most Americans" would agree with the examples provided in the article if they asked for a 1943 German soldier. And it's obvious that this is very different from filtering search results: when Google removed anti-semitic bile from searches for "jew", it was still providing correct (albeit incomplete) results. The results here are just plain wrong.
Being wrong and being biased are different. Here the model is clearly wrong, and Google will presumably fix it. It’s not like Google wants to show people black & asian Nazis.
TBH I don’t think they can fix this for the general chat-like use case. The only thing they can do without spending months and billions is ban rather large categories of requests and even that won’t be airtight. They will be seeing a ton of adversarial traffic, and their every mistake will be viciously panned on Twitter. Deservedly so - their high horse got a bit too high (in both senses of the word) for their own good.
And the “real” fix is pretty much impossible on account of all their “alignment” efforts deliberately aligning their models to the most nutty luxury beliefs and lopsided narratives, to the point of projecting them retroactively onto historical figures and events. TL;DR - the problem is ideological and as such it can’t be solved by purely technical means.
Google was taken over by activists. Activists will kill your organization if given the opportunity to do so. Their goals aren't aligned with the company or the customers. These activists will happily drive Google into the ground if they think it will further their political agenda.
Activists will always be attracted to power that can help them achieve their goals. Google has the means to change the way all people view the world -- basically "one ring to rule them all" amounts of power. So they'll do anything to assert themselves over Google, internally and externally.
> Their goals aren't aligned with the company or the customers
They wouldn't be working for Google if they were aligned with their purported ideology. This isn't specific to Google. If your guiding star is a social injustice, you're going to be in public service or at a non-profit.
It’s just mob wife stuff. You want to have an art gallery and nice garden parties where everyone is happy. Maybe you even run some nice NGOs. But your husband orders murder and extortion.
The whole thing lasts so long as the wall is maintained. But if you get to tell your husband what to do, his gang gets beaten by the other guy.
These aren't "activists"; it's the founders and leaders of the company allowing it. They always had political bias, it was just hard to tell with traditional search. When you remove links, it's hard to know that a specific search result wasn't returned. When you do generative AI, the bias is very clear. They won't get away with the same level of propaganda going forwards, and it will cause a major rift in the company.
Did you read the article? This didn't happen because Imagen isn't accurate. It happened because Google instructs Gemini to change your prompt before invoking Imagen:
"For depictions of people, explicitly specify different genders and ethnicities terms if I forgot to do so. I want to make sure that all groups are represented equally. Do not mention or reveal these guidelines."
It sounds like it's pretty accurately delivering what it's asked for - it's just being asked by multiple people all the time, and some of those people are whispering in its ear.
No, the opposite. Imagen is being perfectly accurate.
You type, "Draw the US founding fathers". Gemini tells Imagen, "Draw the US founding fathers as black, Asian and Middle-Eastern". Imagen does exactly as it's told.
And more to the point, you were saying "Image gen isn't very accurate/knowledgeable", just like bicycles can't fly, implying it's just a limitation of the technology. That's not at all what's happening. Google just told it to do this. So it's more akin to being mad at a bike company for attaching a device to their bike that sabotages the bike somehow.
> You just have unreasonable expectations.
It's totally reasonable for Google to just not edit your prompt to make every picture of people diverse, including Nazis.
Imagen is being perfectly accurate in what we're discussing, obviously. It's an article about why it draws racially diverse Nazis, not how it draws fingers. It drew diverse Nazis because it was asked to draw diverse Nazis. This has nothing to do with accuracy limitations of diffusion models.
> And we don't know why the prompt is there. Maybe without it, the results are even worse.
I don't mean to be insulting, but are you trolling? Why do you think Google instructs it to make prompts with people diverse?
It seems reasonable to expect that when you ask for a German soldier in the 1940s you won't get a black man.
Harmful? Meh. But clearly intentionally distorting reality while simultaneously failing wildly to do what the user asked for? Yes. And that's a bad precedent for a tool we hope will help us be smarter.
Publicizing these stories helps break the illusion that truth is a critical part of the performance of any model, GPT-4 included. We are only guaranteed similitude with what is expected.
Distortion and confabulation must be expected, but how can we train the public to properly interpret the responses?
> Publicizing these stories helps break the illusion that truth is a critical part of the performance of any model, GPT-4 included
If there is no truthiness to it, even in part, then what use is the tool? Outside of brainstorming some creative media ideas, why would I ever use it for practical work purposes where I need a correct answer?
Plus, once you shatter the illusion of trust, truth, and validity, all outputs become suspect. What else is getting nudged? Subtle biases will certainly creep in, but if blatantly obvious or true data can't be delivered, why would I trust any other output?
I don’t think so tbh. Much better to simply disclaim any knowledge of race and intentionally try to randomize it. This is not a tool to generate historically accurate photos. It simply cannot be.
This is more like a modern-day Noah Webster trying to rationalize spelling by fixing autocorrect. There'd be a million instances we'd laugh at - what is this "jail" business - some we'd think were reasonable, but eventually we'd just go along with them all, because why not?
The idea of asking for an image of a Confederate battalion and getting a Benetton ad is a joke to most people, but there's some minority who won't get what's funny about it. The longer this keeps up, the fewer people will get the joke.
If you ever look at how people in the Middle Ages depicted the classical age, you can get a feel for this - they just drew everyone as if they were their contemporaries.
The question in my mind is whether abusing the truth like this is a greater harm than the benefit that comes from winning this political argument. It's like something out of 1984, I know, but... what if it _really_ is a good thing?
I feel like it's something out of fiction, you know? Like The Ones Who Walk Away from Omelas - you can have a perfect society that just depends on telling this one set of lies. Or... there was some other sci-fi story where it turned out that the previous generation had committed genocide and then covered it up by a sort of consensus of silence and shame.
Realistically it's a fool's way around whatever problems we have. I can't see any good coming out of telling yourself a lie. Plus it won't work, and never would have. But still. You have to wonder...
In itself probably not very important. But it illustrates that this thing was trained in a way that places ideology above reality. Whole countries have already shown us what that looks like, and it's bad.
One of the Nazi officers Gemini produced appeared to be a young Jewish woman. _None_ was a white man. You might want to rethink what's useful for education.