I made a post on trying to do absurd-but-controlled food photography with DALL-E 2 (https://minimaxir.com/2022/07/food-photography-ai/ ) which did get upvotes on HN but apparently did not make the front page (so may have been flagged?)
Because if you spend more than 5 minutes with it you can determine numerous things about the "astonishing" images you have seen
1) the amount of human curation is huge. For every 1 good image shared there are dozen of utter crap not shared
2) Dalle fails in some very systemic ways that make it completely unsuitable for vast swathes of image generation
(for instance the "N kittens problem" . Dalle is amazing at generating a picture of 1 kitten. Dalle is dreadful a generating a picture of 8 kittens and that is totally fundamental to how it works, not a bug that can be worked out with time.)
Also basically anything that requires recognisable detail in the background, Dalle falls flat.
3) prompt parsing is simultaneously hit and miss as well as laughably primitive. This is the "without" problem. Ask for a picture without some feature and there is a good chance you will get that thing in the picture.
images created using @openaidalle. sequencing and morphing in #python with credit to András Jankovics morphing library [github.com/jankovicsandras/autoimagemorph]. featuring borderlands granular synth (artist template: @kingbritt), other desert cities delay by audio.damage , #rymdigare reverb, mixed in #kymaticaaum.
Photography is an instrument the results of which are strongly dependent from the ability of the artist and technologist adopting it.
To the best of my understanding, DALL-E offers limited control and cannot be compared to a brush with paint, a photocamera, a virtual canvas for curves for illustration, a coding console.
Why? Because the weight of the user, its "importance", its "impact", is limited with that tool. (A commissioner is not an artist. A photographer may be.)
As HN member Moe wrote, «Bad analogies are like Vietnam».
and it was true until photography matured just like it is now with art generating AI. At least have a look at the early technology of your strawman argument. A better analogy would be the switch to digital photography it was really exciting because of the ease of use but years down the road nobody could use their early digital images for anything but stamps because of their atrocious quality.
I dont know how artists feel about DALL-E but as an amateur I feel bad. "This should be forbidden" bad. I guess the root of this feeling is the same as the one Copilot gives OSS programmers, it feels like theft and copyright enfringement. The pictures in this case uses techniques and colors scheme widely used by illustrator in the entertainment industries. Some of them are even above the average quality and that's scary too.
Do we know if regulators are looking into Copilot and DALL-E? To which extent do we want computer doing what human do? I mean.. Art? Feels like bad taste to me.
For what is worth, dall-e is great for exploring but it's nowhere near to being able to deliver a particular image you might have in your mind.
I wanted a very particular, well defined scene:
- A pig and a donkey play poker at the poker table.
- The pig is using a computer while playing and we can see the screen of the pig.
- The pig must look like a pig
- The donkey must look like a donkey
- The cards and chips must look like chips and cards
The dall-e simply can't deliver. Nothing is even remotely close to what I want.
The best things I came up with after dozens of attempts (I bought extra credits) is something like this:
https://i.gyazo.com/4bec0651b78f29a45c291a7f48f468e4.jpg
Kinda there, but the pig doesn't look like a pig or a donkey doesn't look like a donkey, or it's not a pig that has a computer and the cards and chips never look like cards and chips.
So in short - nobody is losing their jobs yet I think.
Have you tried creating it in multiple steps, using the "Edit" button? You can erase the parts of the image you want to change, and you can even change the prompt at each step.
If the pig or donkey doesn't look right, you could erase just that part of the image using the same prompt to get a different look.
For example, to create the image you want, I would:
1. Start with the basic prompt: "a pig and donkey playing poker"
2. Generate random variations of my favourite image from that to see how far I can get from that.
3. Edit as necessary with the same prompt to get the right look for the pig/donkey.
4. Erase a section of the image next to the pig and use a prompt like "pig using a laptop" to get DALL-E to generate a laptop in that position.
Yes, I have tried a lot, and still haven't gotten close to the desired end-effect.
I maybe want to shift my claim. I am not sure that it's impossible to create this particular image but that it's almost certainly cheaper to hire someone to draw the exact image I have in mind.
I think there is also a new proffesion comming: a DALL-E prompter job.
> I think there is also a new profession coming: a DALL-E prompter job.
Exactly, except we call this job "Artist" or "Programmer".
Whenever something like this comes along and people decry that it will "replace artists" or "replace programmers"... someone needs to generate the inputs to get what they want. Nothing helps solve the "But I know what I mean" problem. Either it's not good enough to do "general purpose" tasks, or it is, but it needs coaxing and someone who understands interacting with the systems well enough to get the desired output.
I agree with all you say with the exception that it is very distinct from being a programmer or an artist like a painter or graphical designer.
As a programmer I love that when I type [i*i for i in range(10)] I can predict the output and that the output will always be the same.
I get frustrated if the same action produces unexpected and non-reproducible results.
Good Dall-e prompter is more like a guide who can navigate through the unknowns. He knows how to use seemingly meaningless words to manipulate the beast. I think it's some form of art and at the same time like being a technician of a complex machinery or wild animal trainer.
These AI created images may not be a replacement for bespoke illustration or photography, but if the choice is between stock images and DALL-E, many people would prefer a DALL-E image that fits closer to what they want than what they may find by searching a stock image website.
I suspect this is where an API and additional cost reductions will move the needle even before we improve the models themselves (which seems to be coming at a rapid pace right now). I can see a scenario like this working well in the future:
1. Get close via prompt debugging to what you want (effectively where you are now)
2. Run an image generation pipeline that creates 10,000 images or an infinite stream
3. Run each image through an 'image to text' step for vector similarity filtering
4. Take images that have very similar 'image to text' similarity scores to the original prompt and present to the user.
Once we can run models of this quality locally, it can even be a job that runs overnight and you wake up in the morning to a set of results to look at.
All that will happen is humans start operating another abstraction layer up -- same thing as happened every previous time the machines have "taken our jobs".
Consider, however, that the output of these systems may not be copyrightable.
So, when you move human involvement up to a higher layer of abstraction, it’s possible that the economics of the whole effort will be fundamentally transformed. Meaning, if these systems displace human artists, copyright itself may cease to be a motivator of economic activity—removing a significant incentive for the production of new art.
Also, keep in mind that:
(1) there are likely to be many fewer human custodians of systems like this who sustain themselves economically than there are artists who currently sustain themselves by producing new art; and
(2) these systems are only as good as the artistic inputs that are fed to them, and is very unlikely that the contributing artists gave their consent or were compensated for their involvement in any way.
Sorry, I'm not seeing the downsides. That all sounds like a big improvement.
And regarding point 2: do you think human artists are as good as they are without already having seen lots of great artworks produced by others? Human artists don't create art from an empty vacuum of nothingness either.
I’ve used Midjourney for months now. Artists love it. It will lead to fewer people creating art the same way the cars led to less people traveling. It’s like having a pre-concept artist for for concept artist. Instant style boards to run by your client.
Comparing artistic production to driving is a poor metaphor.
No doubt that AI-driven tools can be leveraged by artists to create interesting things, in the same way that visual artists have used tools like Photoshop.
But there is something much more profound happening with DALL-E, etc. As I mentioned above, these AI systems simultaneously depend on human artists to populate its training corpus, while making it much less likely that these artists will be able to make a living producing art.
Even if other artists working higher-up in the value chain benefit from these systems, you are likely to see fewer professional illustrators and visual artists because these systems exists.
Something will be lost. We can hope that what we gain in return will be of equal value.
> The pictures in this case uses techniques and colors scheme widely used by illustrator in the entertainment industries.
"Widely used" seems to negate your point here, no? I would expect a machine to use widely used techniques, rather than ones specific to individual artists. I don't know about you, but I've never seen DALL-E replicate an art style that isn't popular enough to be common knowledge.
> Some of them are even above the average quality and that's scary too.
Is your suggestion to make systems like DALL-E worse? Or to forbid the creation of systems that exceed a certain measurable performance?
How do the weavers, whalers, candlestick makers, lamp lighters, and everyone else made redundant in the last few centuries feel? Why do artists find themselves special? The only reason they have avoided automation this long is because we haven't made machines that can think with any sense of creativity until now.
Many of us will become redundant thanks to automation in the next few decades. That's just how it is.
If you're a programmer working on non trivial problems you should be happy about copilot. It's just a tool to be more productive. Same with dall-e for artists. They will eliminate unproductive jobs and create new more interesting opportunities. In the long run technological progress is always good
I completely agree with you. If we’re just going to allow “AI” to eat into all human data and remix it in a way that only the 20 people involved in programming it make money (instead of the 2 million who were used as sources for the human data) then that is just the biggest stealth theft of wealth in recent human history.
It’s the equivalent of the technological enslavement of most humans who will be told that their inputs used in the AÍ “have no value” while the AÍ aggregates it all.
I agree that we should at least be concerned, I think the best argument against this stuff is that we should be building a world where AI replaces dangerous, repetitive, tedious work. Using it to take away the economic value of work humans ENJOY doing is dangerous. I think detractors that are eager to dismiss it as not as good as humans are wrong though and it's shockingly close to going far beyond what humans can do artistically. It won't be long before these systems can not only dream up an image from language, but make that image an animated 3d scene with dynamic lighting and animation and behaviors. If this technology keeps progressing media and artistic creation are going to be changed completely.
Illustrators also use techniques and color schemes that are widely used in the entertainment industry. It's not intellectual property and it's already happening.
During the beta, i must have done thousands of requests and was initially blown away, but now i can tell the "look" of a Dall-e generated image... it has these weird blurry spots that make it seem like a memory of a dream - the main schema is there but if you focus on any one point, the illusion is broken. Looking forward to the day that it is so polished that I cannot differentiate it from a human art piece.
Not OP, but I generated over 150 images using DALL-E 2. Results in the quality of the images in the gallery are very common. Usually, for prompts as simple as this most of the output images (there are 4) look as good or better.
I’m general, most Deep Learning processes these days are non-deterministic because 1) they only care about statistical correctness 2) there are some speed advantages in ignoring the existence of race condition bugs if you don’t care about being deterministic
Lots of weird artifacts which are very hard to fix.
The argument that users can now generate professional grade art by bypassing artists entirely feels so strange. I have access to Dall-E. To generate images without artifacts, you have to do one of these: a) Do a lot of cherry-picking which can be expensive. b) Prompt should be about an abstract concept which can "tolerate" any number of artifacts. c) Prompt should be about a common/generic concept that you have already seen a lot of times on the internet.
I think the biggest use case of Dall-E will be in removing creative block for artists.
I don't think that Dall-E as it currently exists is a big threat to professional artists.
It's not super hard to imagine a noticeably improved version of Dall-E being a serious threat to professional artists, though. It's a question of how hard it will be to make some linear improvements to Dall-E.
As a side note, I created an image in MidJourney from a prompt, which got a fairly pretty image that had some serious facial asymmetry problems, then uploaded the image to Dall-E and erased half the face, letting Dall-E fill it back in with a much more symmetrical look, and that kinda felt like the future. Using various AI models as tools for the things they do best.
Somehow I find the Dall-E and othe AI generated pictures revolting … is it the choice of colors or what I don’t know? It’s like looking at an art piece without a soul ..
And I suspect this points out one of the more common future uses of the tool. Not replacing photographers per se, but certainly taking a large chunk out of the already shrinking Stock Image market. If instead of having to find a stock image that I can use as the base for my work (usually with at least SOME tweaking) I can just describe what I'm actually trying for as a final product...
I'm not going to be getting rid of my Sony A7riv anytime soon, but this certainly would discourage me from trying to increase my library at Getty
Just generating images is barely doing the tools justice though - you can create entire mini movies with it, like SALT (a 70s sci-fi adventure happening on Twitter): https://twitter.com/SALT_VERSE/status/1536799731774537733
DALL-E's inpainting feature is incredibly powerful to generate very large scenes: https://twitter.com/fabianstelzer/status/1545752145273802752
Hard to believe that we're only beginning to scratch the surface here...