Thanks for the shout, I've made something similar to yours (https://phantasmagoria.stavros.io/) and I needed a GPU backend. Trying out their sample script, it seems to take a minute or so to just error out with "taskID doesn't exist" or similar. Have you hit that issue too?
They're very early in the space. Would recommend looping back with them on Monday in their Discord. Haven't seen that specific issue personally yet, though.
Sorry for the late reply: we're running via Banana, they quote half a cent per generation via Stable Diffusion. Rough estimate. Lots of optimization to do though.
A.I. projects (and maybe all Python projects in general) seem to always be ridiculously tedious and error-prone to get running, such that it's a rare, celebratory thing when someone releases something that's easy to use like this?
Cutting-edge software has always been like this. Web browsers were like this 30 years ago. Linux was like this 25 years ago. DNS and Unix were like this 40 years ago. AJAX and Comet (and really JS in general) were like this 20 years ago. Operating systems and high-level languages were like this 50 years ago. It takes a while for the rough edges to get sanded off.
Most Python projects install with a single pip command.
Back then, as in the early years of the PC, software was mainly self-contained, single-executable extract-and-run.
Somewhere along the way, they became so complex as to require special installation programs and the like.
I'm not very familiar with AI and SD in particular, but from what I understand, this stuff is mostly-pure maths, so it shouldn't be a difficult thing to package and make portable. I know the models are rather large, but that's not really any additional complexity.
>> Back then, as in the early years of the PC, software was mainly self-contained, single-executable extract-and-run.
Software in general? Yes. But if you tried to leave the path the manufacturer prepared, then you entered a world of pain. I remember how difficult it was to connect my first smartphone, around 20 years ago, to my Windows PC using ActiveSync to achieve a synced calendar. Just one example of how there was no download-and-run solution for processes that seem simple today.
Broadly speaking, it's the whole "making sure you are using the correct version of Python with the correct NVIDIA drivers via the correct PyTorch installation" dance that causes the issues.
This is actually a solved problem. But it's been solved lots of different times in different incompatible ways which tend to clash on any individual computer.
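If it helps, here's a minimal sanity check (assuming PyTorch is already installed) that prints the versions that have to line up; a mismatch between the CUDA build of PyTorch and the installed driver is usually where things fall over:

    # Sanity check for the Python / PyTorch / CUDA triad.
    import sys
    import torch

    print("Python:", sys.version.split()[0])
    print("PyTorch:", torch.__version__)
    print("Built against CUDA:", torch.version.cuda)   # None on a CPU-only build
    print("CUDA usable at runtime:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))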
The largeness of the model does add complexity; you can't really package it as part of the binary, for example.
It's not the case that useful software was ever self-contained. If you recall trying to do anything online in the early to mid nineties, you'll remember how complicated it was to use almost any website and how much manual fiddling and configuration was involved just to get online.
Ignoring the internet, early games and graphical applications were a mess of settings and configuration. Even today you often have to tune graphics settings to get playable game performance on anything but top line hardware.
If you look at the code that is typically put out by non-software engineers (researchers, etc.), there seems to be a complete lack of software engineering knowledge (obviously). Sure, they can write code to do clever things, but put it into a package? What’s that? Run a linter? Never heard of it. Git? It’s that thing you use to save to the cloud, right? Documentation? Of course we published a paper!
All respect due to them building something incredibly advanced, but you should view this as the place where the science has done its part and the software engineering is just getting started.
I find the code of stable diffusion not bad at all.
It's packaged nicely. It uses third-party libs where it should. The algorithms are well laid out and reasonably easy to understand. This is not a project made by people without dev skills.
Really? I find the difference between the Stable Diffusion code and the code you find in a typical Python package to be night and day.
Why is there precisely one test (that has nothing to do with the core functionality)? Why is the Git history full of things like “finish”, “correct merg”, “fix more”, “add code”? Where is the linting config? Why are there print statements everywhere? Why does non-UI code have UI code embedded in it? Why is there random code commented out? Why is there no consistency across the codebase? Why is everything written as if they’ve never seen a Python project before (comments that should be docstrings, docstrings that should be comments, print("WARNING: ") instead of using logging or warnings, underscores in CLI flags, no shebangs in scripts intended to be run from the command line…)?
Not all Python software gets all of this right, but it’s incredibly rare to have so many misses, even for hobbyist developers. Unless it’s code released by researchers, etc. It’s pretty typical in that context.
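To make the logging point concrete, here's a minimal sketch of the alternative (the load_model function is hypothetical, just for illustration):

    # Standard-library logging instead of print("WARNING: ...") scattered everywhere.
    import logging

    logger = logging.getLogger(__name__)

    def load_model(path):
        # hypothetical example function
        if not path.endswith(".ckpt"):
            logger.warning("unexpected checkpoint extension: %s", path)
        logger.info("loading model from %s", path)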
> This is not a project made by people without dev skills.
No, but it is a project made by people lacking software engineering skills, which is a distinction I drew in my earlier comment. Like I said, they can write clever code, but there’s a difference between bashing on code until it works and building it properly. This is the kind of codebase you get when you have people who have been writing code for a long time, but never in the context of a software engineering project alongside experienced software engineers they can learn from. Put them on a team like that and they’ll be forced to unlearn most of these bad habits fast because they’d never get their pull requests approved otherwise.
I’m trying not to be harsh – I understand that this code is more of a code dump from researchers than a real software project, and they’ve done some incredibly clever things here – but if somebody is suggesting that Python projects in general are like this, it really should be pointed out that this is not in the slightest bit representative of a typical Python project.
These AI tools are going to be built into professional tools like Photoshop someday, but until then it's sort of a hacker's paradise. I think Charl-e lets a new cohort of hackers play with the technology without becoming Python command-line warriors.
Christian Cantrell has already made a Stable Diffusion plugin for Photoshop[0]. Not first-party, mind, but given the transforms Photoshop already includes, I imagine they have people looking into it.
Most deep learning these days requires CUDA acceleration to enable the GPU / TPU for the libraries (e.g. PyTorch, TensorFlow), which is an absolute nightmare to set up.
I wanted to try SD on my machine, but I just couldn't get CUDA to work. Mind, the OpenVINO-based CPU implementation works just fine (pip install -r requirements.txt was sufficient), and given my CPU and GPU (a 10-core 10850K and an old 1070), there probably isn't much to gain from switching to the GPU other than power usage.
Sorry, but the Python ecosystem is an absolute joke: every single time I interact with it there is at least a 45-minute session of trying to get it into the correct state for all the things to co-exist and run.
I can't believe people endure this stuff on a day-to-day basis. I dread it every time: the fact that different versions of packages can't co-exist, and that installing something can downgrade my setuptools, which then breaks my whole installation. Not even wrapping this all up within conda solves this stuff; it just means you can burn the whole thing down and start over easily.
Maybe it's user error, but I never encounter these problems anywhere else.
1) python appeals to a lot of people that work in development-adjacent industries (like AI). These people don’t usually have to care about packaging
2) Python has gone through many outdated forms of packaging
3) The zen of Python seems to have encouraged everyone to install third-party libraries for the smallest of tasks (implementing retries, formatting phone numbers, etc). These small packages often have only a few maintainers, who end up dropping off the map.
Modern python package management works pretty well, but there’s so much debt in the ecosystem I’m not sure when it’ll be better for end users.
lstein's fork [1] isn't that bad and the instructions are pretty easy to follow. It definitely requires some knowledge of how to install software via brew, but these are generally good to figure out anyway.
Unfortunately this goes beyond A.I. stuff; it has become the state of software development, and that's why people started shipping very large packages of everything, so that you can have the exact same environment and increase your chances that the code will run as expected.
For A.I. stuff I actually don't judge: these scripts are written by people who specialise in things other than software engineering, and they simply put together some code to run their algorithms. As a result, it is poorly engineered in many respects.
Most Python projects aren't this tough. I suspect that they're using wonky libraries like Pandas, Numpy or some such that prioritize raw power over ease of installation.
I've not touched Python in a couple of years, but Pandas/NumPy used to be the de facto libs for anything to do with data science. Are they considered "wonky" now?
I mean, there are at least two different companies (Enthought and Continuum) that were founded on making the major scientific python packages easier to install.
Just to be clear, pandas and numpy are not the "wonky" libraries. They are, in my experience, basically two of the most easily installed and dependency-managed libraries in Python, given their ubiquity and maturity. Maybe there are machine configurations I'm not familiar with where they're not easily compatible, but I've never seen them cause issues. Usually it's CUDA or other GPU stuff, or conflicts in less regularly maintained packages.
To be honest, numpy is easy to install on all major platforms. In deep learning I've almost never seen pandas used, but deep learning models do have problems with PyTorch, as some projects just lock an old PyTorch version that doesn't work on newer (or older) Python versions.
This is really cool and a fun way to try out this stuff I've been hearing about. One thing that'd be cool is a "retry" button that picks a different seed. My first attempt didn't turn out so great (https://i.imgur.com/zV48hCV.png)
Seeing copyrighted/trademarked icons in the examples (Darth Vader, for example) really makes me wonder how these models are going to play out in the future.
Today, these models are far ahead of the trademark attorneys, but there are powerful interests that are going to want to litigate the inclusion of these entities in the trained models themselves.
I hope people realise these are general purpose text to image systems, not just AI artists. They can be used to generate images for educational content, to generate ideas for design of clothes, shoes, mugs, interior and exterior design, caricatures and memes for social networks, virtual dressing booth, hairstyle and make up, customise games and game characters, they can help create bias detection benchmarks, maybe in the future even generate technical drawings.
So the art copyright angle should not be the only one taken into consideration.
Is there some comprehensive source about how to make the most of Stable Diffusion? I find the examples on websites much better than what I've been able to generate — they more closely convey the prompt and have fewer artifacts/clearly messed-up parts.
When people call themselves "prompt engineers" it's only half in jest. Half of generating something good is guiding the program into generating something good. That means knowing the right keywords to get specific styles or effects, a little bit of luck, and sometimes generating a prompt several dozen times and then creating variations from a seed once you find a specific seed that generated something close to what you liked. It's an iterative process and many of the fantastic images you see weren't "first generations" but likely the 20th or so generation after tons of trial and error working around a specific prompt/idea.
I'd recommend keeping a prompt list and finding what does/doesn't work for what you're after. Try shuffling the order of your prompt - the order of the tokens does matter! Repeat a token twice, thrice, hell make a prompt with nothing but the same token repeated 8 times. Play around with it! If you find an image that's very close to what you want - start generating variations of it. Make 20 different variations. Make variations of the variations you like best.
Also the seed is very important! If you find a seed that generated a style you really liked take note of it. That seed will likely generate more things in a similar style for similar enough prompts.
It's a semi-creative process and definitely takes some time investment if you want great results. Sometimes you strike gold and get lucky on your first generation - but that's rare.
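If you're driving the Python side directly, pinning the seed is straightforward with the Hugging Face diffusers library; here's a rough sketch (model id and prompt are just placeholders):

    # Reproducible generation: the same prompt + seed gives back (essentially) the same image.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",        # placeholder model id
        torch_dtype=torch.float16,
    ).to("cuda")

    seed = 1234                                  # the seed you liked
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(
        "a lighthouse at sunset, oil painting",  # placeholder prompt
        num_inference_steps=50,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"lighthouse_{seed}.png")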
If someone turns artist names and the quirky-but-useful bits of prompting, like 'Unreal Engine' as an image sharpener, into a Mac app with Instagram-style filters, they'll make some money...
Each engine is a little different as well. It's like learning to perform with a partner - like another dancer, musician, etc. You have to find the sweet spot where what you want and what the tool *can do* line up.
Agreed and wondering myself, DALL-E seemed to do a better job of great looking images with brief prompts, but Stable Diffusion seems to need more specific prompts. SD is free though so would love to use it more.
CLIP-guided Stable Diffusion, or Dalle+SD, are both doable with current open source and will have much smarter prompting at the cost of even more memory use.
How fast does Stable Diffusion run on an M1 Max? I'm using an M1 Pro and I find it too slow. I'd rather use an online service that costs $0.01 per image but generates an image in a matter of seconds than wait 1 minute for a free one.
Interesting. I didn’t see an essential difference with higher values, so I settled on 25. Maybe I’m just impatient and my brain prefers more options even if they’re individually imperfect.
Depends on the diffuser you're using; the _a (ancestral) diffusers are aberrant in that they can yield good results with very low sample counts. I typically use 16 samples and get reasonably good results, but it's highly dependent on the prompt and settings as well.
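For what it's worth, if you're on the Hugging Face diffusers library, swapping in the Euler ancestral sampler and dropping the step count looks roughly like this (model id and prompt are placeholders):

    # Use the Euler ancestral scheduler with a low step count.
    import torch
    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    # 16 steps is often enough with the ancestral samplers, but it's prompt-dependent.
    image = pipe("a castle in the clouds", num_inference_steps=16).images[0]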
On Replicate it takes maybe around 10-15 seconds. My M1 Pro only has 16GB and 8 cores and takes about 1 minute, so maybe the lower specs make quite a difference.
You should be able to get a regular prompt to generate in 5 seconds via https://mage.space. A100s under the hood. Unlimited too. (Note: I am the creator)
I don't know about less than 1 second but I just picked up an RTX 3090 Ti now that they're basically half off at Best Buy and it's definitely fast enough for interactively playing with prompts (single digit number of seconds).
Probably overkill and could get away with something like a 3060 or so, but the 24 GB of VRAM come in handy if you want to generate larger images. I pushed it as high as 17 GB on some recent runs.
The nice thing about the M1 is that the GPU and CPU share RAM so even though I have a 14" MacBook Pro, I also have a GPU with 16GB of VRAM. I pushed as high as 11GB on images and the fan didn't even turn on.
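For anyone curious, with the Hugging Face diffusers library targeting the M1 GPU is roughly this (the MPS backend needs a recent PyTorch; attention slicing keeps peak memory down on the shared RAM):

    # Run on Apple Silicon via PyTorch's MPS backend; CPU and GPU share the same RAM.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to(device)
    pipe.enable_attention_slicing()   # trades a little speed for lower peak memory

    image = pipe("a watercolor of a fjord at dawn", num_inference_steps=30).images[0]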
My 3080 can turn out a 16-sample Euler_a @ 512^2 in about 1.5 s (9.7 iterations/s). I've found you can get pretty good results in txt2img with those settings. And once you've found a good image you can further iterate in img2img with loopback at approximately the same rate.
It's worth noting that I'm on a 5800X as well, I'm sure.
I guess it depends on how you do it. Depending on how I've set things up, I've found that more samples isn't necessarily better (usually just different). I suppose it's an optimization problem. I have found that I can pretty reliably look at a 1 sample image and kinda guess where the earliest iterations are going to go, and that might be the most appropriate workflow, actually, but beyond that it seems a couple of samples can drastically alter outputs, and likewise with prompt editing. Whereas with img2img there's a lot more control, I pick an input, I can force it to strictly abide to the image and the parameters I want, and as someone else said in- and outpainting are nice as well.
I guess I'm just manipulating probabilities in my favor?
I built a Svelte frontend for my SD instance and occasionally expose it to friends publicly. It runs on an old Intel 6700K with a 2080 Ti, and I've tuned it to generate images in about 5 seconds. The speed depends on various factors, but you can prototype with settings that generate images in as little as 3 seconds and work your way up to more complex images.
I was using the A4000s at https://www.coreweave.com/gpu-cloud-pricing with ~20 steps at 512x512, and I think it was close to 1-2 s IIRC. There are some consumer cards that can get close, I'm sure, with some tweaking of image size, steps and other SD tuning.
I downloaded this and tried out a few prompts like, "Mark Twain holding an iPhone", and got back an image of Mark Twain - once in some surrealist nightmare fashion and another more like a 3D render. Neither were holding anything, let alone an iPhone. Cranking up the DDIM slider didn't seem to do much. Trying the same prompt on mage.space (see the creators comment in this thread) produced exactly what I assumed it would.
Sometimes it's difficult to get certain combinations of things in a picture. It's easier if you provide a basic sketch with the components you want and use img2img (not sure how charl-e has it set up for img2img access, since I use the python original).
There's also a knack for writing the prompts, generally you want to write your prompt as a list of short sentences. Don't make your prompt too short. Use concrete and clear concepts, beginning with your main subject, and describe their relation. You can also qualify the background, the mood, the material and the style among other things. Generate more than one sample - I normally go for five or ten samples so there are better odds of getting one that works, since the AI could try to interpret the prompt in different ways that make sense (but not to a human).
I can't try it right now, but you might try something like: "A man is standing on the street, close up. The man has an iphone in his hand. The man is Mark Twain. Regular city background. Impressionist. Detailed."
Tweak a few times and more often than not you'll end up with something satisfying.
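If you go the img2img route from a rough sketch, the flow with the Hugging Face diffusers pipeline looks roughly like this (the sketch file is hypothetical, and older diffusers releases call the image argument init_image):

    # img2img: start from a rough sketch so the composition is already fixed.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("twain_sketch.png").convert("RGB").resize((512, 512))  # hypothetical sketch
    out = pipe(
        prompt="A man is standing on the street, close up. The man has an iphone in his hand. "
               "The man is Mark Twain. Regular city background. Impressionist. Detailed.",
        image=init,          # may be init_image on older diffusers versions
        strength=0.6,        # lower = stay closer to the sketch
        guidance_scale=7.5,
    ).images[0]
    out.save("twain_iphone.png")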
Because it depends on your online tool: Midjourney, for example, adds things to your prompts to make sure your images are better, and OpenAI adds things to your prompts too.
Anyone have an M1 Ultra they can test this on? My 3080 Ti can render a 512x512 image in something like 7 seconds and I'd love to compare against Apple Silicon.
There isn't any optimised diffuser on M1 yet; most of them just run a basic MPS graph, or even a mix of CPU and MPS ops, and of course it is extremely slow. I don't have time to test this implementation, but the author just uses some other implementation with a simple UI. So I would be surprised if it's faster than 10 s per image at 20 steps; it's probably closer to 25-40 s, and around 40 s to 1 min for the classic 512x512, 50-60 step setting, as with other models.
I had Stable Diffusion running on M1 and Intel MacBooks within the first few days, but the original repo would have done people some favors if they had either created proper conda lock files for several platforms or just used conda-forge, instead of mixing conda and pip unnecessarily (I think there was one dep which actually wasn't on conda-forge, besides their own things).
When I hit "generate" on my M1 Air with the default options, it just sits saying "initializing...0%" forever. Gave it five minutes, still nothing. Tried twice, same thing.
Is it... doing anything? Do I just need to wait 10 minutes? 20?
Can someone please explain how I can run this on my computer, but something like GPT-3 is too computationally intensive to do the same? Isn't text easier than images?
People have been focused on getting Stable Diffusion running well on M1 Macs because their graphics systems have so much more horsepower than the Intel Macs. The M1s also have a fast memory-sharing architecture for graphics, and this needs an absolute minimum of around 8 GB of VRAM — many Intel Macs just won't be able to handle this.
That's good to know as I just got a good deal on one and was wondering if the AMD GPU would be useful or if I needed to start planning for an eGPU with some NVidia silicon. Thanks!