Hacker News
DALL-E Chess in Jungle and Dunes (imgur.com)
45 points by emadehsan on July 31, 2022 | 55 comments


Amazed we haven't seen more DALL-E / Midjourney on HN. Probably the most astonishing new tech I've used since booting up a computer in the 90s.

Just generating images is barely doing the tools justice though - you can create entire mini movies with it, like SALT (a 70s sci-fi adventure happening on Twitter): https://twitter.com/SALT_VERSE/status/1536799731774537733

DALL-E's inpainting feature is incredibly powerful for generating very large scenes: https://twitter.com/fabianstelzer/status/1545752145273802752

Hard to believe that we're only beginning to scratch the surface here...


I made a post on trying to do absurd-but-controlled food photography with DALL-E 2 (https://minimaxir.com/2022/07/food-photography-ai/ ) which did get upvotes on HN but apparently never made the front page (so it may have been flagged?)


Because if you spend more than 5 minutes with it, you notice a number of things about the "astonishing" images you have seen:

1) The amount of human curation is huge. For every good image shared, there are dozens of utterly crap ones that aren't.

2) Dalle fails in some very systemic ways that make it completely unsuitable for vast swathes of image generation (for instance, the "N kittens problem": Dalle is amazing at generating a picture of 1 kitten, but dreadful at generating a picture of 8 kittens, and that is fundamental to how it works, not a bug that can be worked out with time). Dalle also falls flat on basically anything that requires recognisable detail in the background.

3) Prompt parsing is simultaneously hit-and-miss and laughably primitive. This is the "without" problem: ask for a picture without some feature, and there is a good chance you will get that thing in the picture.


here's one i did recently: https://vimeo.com/724055394

a reading of dreamtigers by jorge luis borges

images created using @openaidalle. sequencing and morphing in #python with credit to András Jankovics's morphing library [github.com/jankovicsandras/autoimagemorph]. featuring borderlands granular synth (artist template: @kingbritt), other desert cities delay by audio.damage, #rymdigare reverb, mixed in #kymaticaaum.

headphones recommended, awards eligible

(text here: thefloatinglibrary.com/2008/09/02/dreamtigers/ )


Because it’s a novelty abused for attention with clickbait pop-culture references. It’s not ready for primetime.


"photography is a novelty abused for attention, it's not ready for primetime and cannot replace a great painter"


Photography is an instrument whose results depend strongly on the skill of the artist and technologist using it.

To the best of my understanding, DALL-E offers limited control and cannot be compared to a paintbrush, a camera, a virtual canvas for drawing illustration curves, or a coding console.

Why? Because the weight of the user, their "importance", their "impact", is limited with that tool. (A commissioner is not an artist. A photographer may be.)

As HN member Moe wrote, «Bad analogies are like Vietnam».


And it was true until photography matured, just as it is now with art-generating AI. At least have a look at the early technology behind your strawman argument. A better analogy would be the switch to digital photography: it was really exciting because of the ease of use, but years down the road nobody could use their early digital images for anything but stamps because of their atrocious quality.


Some people are already using generated images to replace stock art in low budget zines.


I don't know how artists feel about DALL-E, but as an amateur I feel bad. "This should be forbidden" bad. I guess the root of this feeling is the same as the one Copilot gives OSS programmers: it feels like theft and copyright infringement. The pictures in this case use techniques and color schemes widely used by illustrators in the entertainment industry. Some of them are even above the average quality and that's scary too.

Do we know if regulators are looking into Copilot and DALL-E? To what extent do we want computers doing what humans do? I mean... art? It feels like bad taste to me.


For what it's worth, DALL-E is great for exploring, but it's nowhere near able to deliver a particular image you might have in your mind.

I wanted a very particular, well defined scene:

- A pig and a donkey play poker at the poker table.
- The pig is using a computer while playing, and we can see the pig's screen.
- The pig must look like a pig.
- The donkey must look like a donkey.
- The cards and chips must look like cards and chips.

DALL-E simply can't deliver. Nothing is even remotely close to what I want. The best I came up with after dozens of attempts (I bought extra credits) is something like this: https://i.gyazo.com/4bec0651b78f29a45c291a7f48f468e4.jpg

Kinda there, but either the pig doesn't look like a pig, or the donkey doesn't look like a donkey, or it's not the pig that has the computer, and the cards and chips never look like cards and chips.

So in short - nobody is losing their jobs yet I think.


Have you tried creating it in multiple steps, using the "Edit" button? You can erase the parts of the image you want to change, and you can even change the prompt at each step.

If the pig or donkey doesn't look right, you could erase just that part of the image using the same prompt to get a different look.

For example, to create the image you want, I would:

1. Start with the basic prompt: "a pig and donkey playing poker"

2. Generate random variations of my favourite image from that batch to see how far I can push it.

3. Edit as necessary with the same prompt to get the right look for the pig/donkey.

4. Erase a section of the image next to the pig and use a prompt like "pig using a laptop" to get DALL-E to generate a laptop in that position.


Yes, I have tried a lot, and still haven't gotten close to the desired end-effect.

I'd like to adjust my claim a bit. I'm not sure it's impossible to create this particular image, but it's almost certainly cheaper to hire someone to draw the exact image I have in mind.

I think there is also a new profession coming: a DALL-E prompter job.


> I think there is also a new profession coming: a DALL-E prompter job.

Exactly, except we call this job "Artist" or "Programmer".

Whenever something like this comes along and people decry that it will "replace artists" or "replace programmers"... someone needs to generate the inputs to get what they want. Nothing helps solve the "But I know what I mean" problem. Either it's not good enough to do "general purpose" tasks, or it is, but it needs coaxing and someone who understands interacting with the systems well enough to get the desired output.


I agree with everything you say, except that I think it is very distinct from being a programmer or an artist such as a painter or graphic designer.

As a programmer I love that when I type [i*i for i in range(10)] I can predict the output and that the output will always be the same. I get frustrated if the same action produces unexpected and non-reproducible results.

A good DALL-E prompter is more like a guide who can navigate the unknowns. He knows how to use seemingly meaningless words to manipulate the beast. I think it's a form of art, and at the same time it's like being a technician for complex machinery or a wild animal trainer.


These AI created images may not be a replacement for bespoke illustration or photography, but if the choice is between stock images and DALL-E, many people would prefer a DALL-E image that fits closer to what they want than what they may find by searching a stock image website.


I suspect this is where an API and additional cost reductions will move the needle even before we improve the models themselves (which seems to be coming at a rapid pace right now). I can see a scenario like this working well in the future:

1. Get close via prompt debugging to what you want (effectively where you are now)

2. Run an image generation pipeline that creates 10,000 images or an infinite stream

3. Run each image through an 'image to text' step for vector similarity filtering

4. Take the images whose 'image to text' output scores closest in similarity to the original prompt and present them to the user.

Once we can run models of this quality locally, it can even be a job that runs overnight and you wake up in the morning to a set of results to look at.
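Steps 3 and 4 above could look roughly like this. It is a minimal sketch that assumes some image-to-text or image-embedding model has already turned the prompt and each generated image into vectors; the embedding values and filenames are hypothetical, and no real model or API is called.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_matches(prompt_vec, image_vecs, k=3, threshold=0.0):
    """Score every image against the prompt, drop those below threshold, return the best k."""
    scored = [(name, cosine_similarity(prompt_vec, vec)) for name, vec in image_vecs.items()]
    scored = [item for item in scored if item[1] >= threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings: in a real pipeline these would come from an
# image-to-text (captioning) step plus a text embedding model.
prompt_embedding = [1.0, 0.0, 1.0]
image_embeddings = {
    "img_0001.png": [0.9, 0.1, 1.0],
    "img_0002.png": [0.0, 1.0, 0.0],
    "img_0003.png": [1.0, 0.0, 0.8],
}

print(top_matches(prompt_embedding, image_embeddings, k=2))
```

Cosine similarity compares direction rather than magnitude, which is the usual choice for embedding vectors; the `threshold` parameter lets an overnight run discard weak matches before anything reaches the user.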


It has a hard time with the computer, but without it, the results are almost usable:

https://imgur.com/a/lVqmnz3

Chances are that someone with prompt engineering experience could get it to produce the desired output with some more poking and prodding.

It'll certainly raise the lower-end bar for custom illustrations/stock footage.


I see what you're getting at, yet the result is still amazing.


All that will happen is humans start operating another abstraction layer up -- same thing as happened every previous time the machines have "taken our jobs".

It's a good thing.


Consider, however, that the output of these systems may not be copyrightable.

So, when you move human involvement up to a higher layer of abstraction, it’s possible that the economics of the whole effort will be fundamentally transformed. Meaning, if these systems displace human artists, copyright itself may cease to be a motivator of economic activity—removing a significant incentive for the production of new art.

Also, keep in mind that:

(1) there are likely to be many fewer human custodians of systems like this who sustain themselves economically than there are artists who currently sustain themselves by producing new art; and

(2) these systems are only as good as the artistic inputs that are fed to them, and it is very unlikely that the contributing artists gave their consent or were compensated for their involvement in any way.


Sorry, I'm not seeing the downsides. That all sounds like a big improvement.

And regarding point 2: do you think human artists are as good as they are without already having seen lots of great artworks produced by others? Human artists don't create art from an empty vacuum of nothingness either.


You don’t see a downside to there being fewer artists creating art?

Art benefits humanity not only because we consume it, but also because we produce it.

Making art is part of what makes us human.


I’ve used Midjourney for months now. Artists love it. It will lead to fewer people creating art the same way cars led to fewer people traveling. It’s like having a pre-concept artist for concept artists. Instant style boards to run by your client.


Comparing artistic production to driving is a poor metaphor.

No doubt that AI-driven tools can be leveraged by artists to create interesting things, in the same way that visual artists have used tools like Photoshop.

But there is something much more profound happening with DALL-E, etc. As I mentioned above, these AI systems depend on human artists to populate their training corpora, while simultaneously making it much less likely that these artists will be able to make a living producing art.

Even if other artists working higher up in the value chain benefit from these systems, you are likely to see fewer professional illustrators and visual artists because these systems exist.

Something will be lost. We can hope that what we gain in return will be of equal value.


> The pictures in this case use techniques and color schemes widely used by illustrators in the entertainment industry.

"Widely used" seems to negate your point here, no? I would expect a machine to use widely used techniques, rather than ones specific to individual artists. I don't know about you, but I've never seen DALL-E replicate an art style that isn't popular enough to be common knowledge.

> Some of them are even above the average quality and that's scary too.

Is your suggestion to make systems like DALL-E worse? Or to forbid the creation of systems that exceed a certain measurable performance?


It's purely luddite reasoning. The real objection is that it makes artists less valuable.

Which is unfortunate, because they already aren't that valuable (save for the top ~1%). But it's not a good reason to oppose DALL-E.


> The real objection is that it makes artists less valuable.

Close but not exactly. How do they feel about it?

At some point if we all feel bad, well this is very bad.

Maybe we should ban DALL-E for the same reasons we ban hard drugs: for the health of the community.


How do the weavers, whalers, candlestick makers, lamplighters, and everyone else made redundant in the last few centuries feel? Why do artists consider themselves special? The only reason they have avoided automation this long is that we haven't made machines that can think with any sense of creativity until now.

Many of us will become redundant thanks to automation in the next few decades. That's just how it is.


If you're a programmer working on non-trivial problems, you should be happy about Copilot. It's just a tool to be more productive. Same with DALL-E for artists. They will eliminate unproductive jobs and create new, more interesting opportunities. In the long run, technological progress is always good.


I completely agree with you. If we’re just going to allow “AI” to eat into all human data and remix it in a way that only the 20 people involved in programming it make money (instead of the 2 million who were used as sources for the human data) then that is just the biggest stealth theft of wealth in recent human history.

It’s the equivalent of the technological enslavement of most humans, who will be told that their inputs used in the AI “have no value” while the AI aggregates it all.


I agree that we should at least be concerned; I think the best argument against this stuff is that we should be building a world where AI replaces dangerous, repetitive, tedious work. Using it to take away the economic value of work humans ENJOY doing is dangerous. I think detractors who are eager to dismiss it as not as good as humans are wrong, though, and it's shockingly close to going far beyond what humans can do artistically. It won't be long before these systems can not only dream up an image from language, but turn that image into an animated 3D scene with dynamic lighting, animation, and behaviors. If this technology keeps progressing, media and artistic creation are going to be changed completely.


Illustrators also use techniques and color schemes that are widely used in the entertainment industry. It's not intellectual property and it's already happening.


You can not regulate away tech advancements.

Your competitors (researchers, companies, or countries - depending on granularity) certainly won't.


Unless it kills the souls of their people.


During the beta, I must have made thousands of requests and was initially blown away, but now I can tell the "look" of a DALL-E generated image... it has these weird blurry spots that make it seem like a memory of a dream: the main schema is there, but if you focus on any one point, the illusion is broken. Looking forward to the day that it is so polished that I cannot differentiate it from a human art piece.


> that I cannot differentiate

I suggest that you aim for the day in which critics «cannot differentiate it from a human art piece».

(We already have plenty of contexts in which the layman can be fooled. Think e.g. of "populism" in political discourse.)


Did you use the same prompt for all the images? How much cherrypicking did you do? How many images did you generate to get this set?


Most inputs were combinations of these:

"[Painting/Digital Art/3D Render] of [Animals/Foxes/Monkeys] playing Chess in [Jungle/Dune/Desert]"

Some inputs were specific: "Capybara vs Groundhog Chess match" or "Llama vs Panda/Red Panda in chess match"

I used almost all the credits OpenAI gave me for DALL-E. This set consists of about 50-60% of all the images I generated.
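For anyone wanting to try something similar, the bracketed template above expands to 3 × 3 × 3 = 27 prompts. A small sketch that enumerates them with `itertools.product`; it only builds the prompt strings (sending them to DALL-E is left out), and nothing here implies OP actually ran every combination.

```python
from itertools import product

# Options taken from the bracketed template above.
styles = ["Painting", "Digital Art", "3D Render"]
subjects = ["Animals", "Foxes", "Monkeys"]
settings = ["Jungle", "Dune", "Desert"]

# product() yields every (style, subject, setting) triple, last list varying fastest.
prompts = [
    f"{style} of {subject} playing Chess in {setting}"
    for style, subject, setting in product(styles, subjects, settings)
]

print(len(prompts))  # 27
print(prompts[0])    # Painting of Animals playing Chess in Jungle
```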


Not OP, but I generated over 150 images using DALL-E 2. Results of the quality shown in the gallery are very common. Usually, for prompts as simple as these, most of the output images (there are 4 per prompt) look as good or better.


Does anyone know if there is a company or team focused on outputting CAD using a tool like DALL-E?


Is DALL-E deterministic, like if I type the same phrase it will always generate the same images?


In general, most deep learning processes these days are non-deterministic because 1) they only care about statistical correctness, and 2) there are some speed advantages in ignoring race conditions if you don’t care about being deterministic.
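One concrete, general source of this run-to-run variation (not specific to DALL-E) is that floating-point addition is not associative. When parallel GPU threads accumulate partial sums in whatever order they happen to finish, the last bits of the result can change. A tiny illustration in plain Python, where the "threads" are just two different summation orders:

```python
values = [0.1, 0.2, 0.3]

# Same numbers, different accumulation order.
forward = sum(values)             # ((0.1 + 0.2) + 0.3)
backward = sum(reversed(values))  # ((0.3 + 0.2) + 0.1)

print(forward, backward, forward == backward)  # 0.6000000000000001 0.6 False
```

Inside a neural network, differences like this can feed into later layers, which is why bitwise reproducibility usually requires deliberately choosing deterministic (often slower) kernels.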


Ah! That makes sense. Thank you for the info.


It starts with noise, so you get a different result every time.
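As a general illustration of why the starting noise matters: if the noise comes from a seeded random number generator, the same seed reproduces the same starting point, while an unseeded generator gives a fresh one every time. This sketch uses Python's `random` module as a stand-in, and `initial_noise` is a hypothetical helper; whether a given DALL-E interface lets you fix a seed is not something this thread establishes, and per the comment above, even a fixed seed may not give bit-identical images on GPU hardware.

```python
import random

def initial_noise(n, seed=None):
    """Hypothetical helper: n Gaussian noise values; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)  # seed=None seeds from system randomness
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# Same seed, same starting noise.
print(initial_noise(4, seed=42) == initial_noise(4, seed=42))

# No seed: a different starting point on (almost) every call.
print(initial_noise(4) == initial_noise(4))
```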


That's very cool.


Lots of weird artifacts which are very hard to fix.

The argument that users can now generate professional grade art by bypassing artists entirely feels so strange. I have access to Dall-E. To generate images without artifacts, you have to do one of these: a) Do a lot of cherry-picking which can be expensive. b) Prompt should be about an abstract concept which can "tolerate" any number of artifacts. c) Prompt should be about a common/generic concept that you have already seen a lot of times on the internet.

I think the biggest use case of Dall-E will be in removing creative block for artists.


I don't think that Dall-E as it currently exists is a big threat to professional artists.

It's not super hard to imagine a noticeably improved version of Dall-E being a serious threat to professional artists, though. It's a question of how hard it will be to make some linear improvements to Dall-E.

As a side note, I created an image in MidJourney from a prompt, which got a fairly pretty image that had some serious facial asymmetry problems, then uploaded the image to Dall-E and erased half the face, letting Dall-E fill it back in with a much more symmetrical look, and that kinda felt like the future. Using various AI models as tools for the things they do best.


These feel like tools for creativity. Although the end result might not be masterful, it's super cool for quick iteration and exploration.

Suddenly, people who don't have the skills to do that can start doing it, and once you pick an output, I can totally see how you could improve on it manually.


I believe the next evolution in generative images is stringing them together!

If you can come up with the key frames with descriptions of the same style, a neat little program can interpolate them and produce a generative movie!
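The simplest way to fill in frames between keyframes is a linear cross-dissolve. Proper morphing (like the autoimagemorph library mentioned upthread) also warps geometry between the images, which this sketch does not attempt. To stay self-contained, frames here are hypothetical flat lists of grayscale pixel values rather than real image files, and both helper functions are illustrative names, not from any library.

```python
def lerp_frames(frame_a, frame_b, steps):
    """Return `steps` frames blending linearly from frame_a to frame_b, both ends included."""
    if steps < 2:
        raise ValueError("need at least 2 steps to include both keyframes")
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    frames = []
    for i in range(steps):
        t = i / (steps - 1)  # blend weight goes from 0.0 to 1.0
        frames.append([(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)])
    return frames

def interpolate_keyframes(keyframes, steps_between):
    """Chain cross-dissolves through a list of keyframes, inserting `steps_between` frames per gap."""
    video = []
    for a, b in zip(keyframes, keyframes[1:]):
        segment = lerp_frames(a, b, steps_between + 2)
        if video:
            segment = segment[1:]  # the shared keyframe is already the last frame in `video`
        video.extend(segment)
    return video

# Three one-pixel "keyframes" with one generated frame between each pair.
print(interpolate_keyframes([[0], [10], [20]], steps_between=1))
```

With generated keyframes that share a style, the same structure would apply per pixel (or per channel) of real images; a morphing library would replace `lerp_frames` with something that also moves features.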


Somehow I find DALL-E and other AI-generated pictures revolting… is it the choice of colors, or what? I don’t know. It’s like looking at an art piece without a soul...


DALL-E has that quality, yes - I did a project where I tried to recreate my own photographs with DALL-E, which shows both its potential and limitations right now: https://twitter.com/fabianstelzer/status/1551663900776595461

(obviously ignoring the potential to conjure things you cannot photograph...)


And I suspect this points out one of the more common future uses of the tool. Not replacing photographers per se, but certainly taking a large chunk out of the already shrinking Stock Image market. If instead of having to find a stock image that I can use as the base for my work (usually with at least SOME tweaking) I can just describe what I'm actually trying for as a final product...

I'm not going to be getting rid of my Sony A7R IV anytime soon, but this certainly would discourage me from trying to grow my library at Getty.


Do you think you could pass a Pepsi challenge of random images from the internet vs Dalle?


Generating images seems almost a solved problem. What’s the next big problem to solve in this space?



