Charl-e: “Stable Diffusion on your Mac in 1 click” (charl-e.com)
253 points by valgaze on Sept 17, 2022 | 126 comments



Love it. If you don't have an M1 Mac, or don't want to wait, https://mage.space does unlimited generations currently. (Note: I am the creator!)


There's also https://dreamlike.art/ with many more options, and it does img2img

This is THE thread to bookmark for many more resources: https://np.reddit.com/r/StableDiffusion/comments/xcrm4d/usef...


I found this Colab best; it also has the NSFW filter disabled: https://colab.research.google.com/drive/1jUwJ0owjigpG-9m6AI_...


8 minutes queue? Slightly faster than my Intel MacBook Pro :)


I feel like this can't be cheap to run?


We're using GPU serverless (via banana.dev), so it's actually not bad. Will have limits at some point, for now go wild.


Thanks for the shout, I've made something similar to yours (https://phantasmagoria.stavros.io/) and I needed a GPU backend. Trying out their sample script, it seems to take a minute or so to just error out with "taskID doesn't exist" or similar. Have you hit that issue too?


They're very early in the space. Would recommend looping back with them on Monday in their Discord. Haven't seen that specific issue personally yet though.


Ah, thanks! I didn't notice their Discord, I'll join right away.


Can you give an estimate of what the cost is per run?


Sorry for the late reply: we're running via Banana, they quote half a cent per generation via Stable Diffusion. Rough estimate. Lots of optimization to do though.


Thanks! It's pretty incredible what you can do with half a cent!


A.I. projects (and maybe all Python projects in general) seem to always be ridiculously tedious and error-prone to get running, such that it's a rare, celebratory thing when someone releases something that's easy to use like this?


Cutting-edge software has always been like this. Web browsers were like this 30 years ago. Linux was like this 25 years ago. DNS and Unix were like this 40 years ago. AJAX and Comet (and really JS in general) was like this 20 years ago. Operating systems and high-level languages were like this 50 years ago. It takes a while for the rough edges to get sanded off.

Most Python projects install with a single pip command.


Back then, as in the early years of the PC, software was mainly self-contained, single-executable extract-and-run.

Somewhere along the way, they became so complex as to require special installation programs and the like.

I'm not very familiar with AI and SD in particular, but from what I understand, this stuff is mostly-pure maths, so it shouldn't be a difficult thing to package and make portable. I know the models are rather large, but that's not really any additional complexity.


>> Back then, as in the early years ot the PC, software was mainly self-contained, single-executable extract-and-run.

Software in general? Yes. But if you tried to leave the path the manufacturer prepared, then you entered a world of pain. I remember how difficult it was to connect my first smartphone, around 20 years ago, to my Windows PC using ActiveSync to get a synced calendar. Just one example of how there was no download-and-run solution for processes that seem simple today.


Broadly speaking, it's the whole "making sure you are using the correct version of Python that uses the correct NVIDIA drivers via the correct PyTorch installation" that causes the issues.

This is actually a solved problem. But it's been solved lots of different times in different incompatible ways which tend to clash on any individual computer.


When you say, "Back then," do you mean 30 years ago, 25 years ago, 40 years ago, 20 years ago, or 50 years ago?


The largeness of the model does add complexity, you can't really package it as part of the binary for example.

It's not the case that useful software was ever self contained. If you recall trying to do anything online in the early to mid nineties, you'll remember how complicated it was to use almost any website and how much manual fiddling and configuration was involved to get online.

Ignoring the internet, early games and graphical applications were a mess of settings and configuration. Even today you often have to tune graphics settings to get playable game performance on anything but top line hardware.


...Comet? Is that a joke, lol? Because of the Ajax and Comet cleaners? I've never heard that term before


Nope (well, yes, it was a reference), but it was a strategy in the early '00s:

https://en.wikipedia.org/wiki/Comet_(programming)


It still is; it's the heart of many of the most popular current apps like Slack. We can just do it without the dirty hacks because we have WebSockets.


It's just another name for push/pull?


For push, to browsers.


If you look at the code that is typically put out by non-software engineers (researchers, etc.), there seems to be a complete lack of software engineering knowledge (obviously). Sure, they can write code to do clever things, but put it into a package? What’s that? Run a linter? Never heard of it. Git? It’s that thing you use to save to the cloud, right? Documentation? Of course we published a paper!

All respect due to them building something incredibly advanced, but you should view this as the place where the science has done its part and the software engineering is just getting started.


I find the code of Stable Diffusion not bad at all. It's packaged nicely. It uses third-party libs where it should. The algorithms are well laid out and reasonably easy to understand. This is not a project made by people without dev skills.


Really? I find the difference between the Stable Diffusion code and the code you find in a typical Python package to be night and day.

Why is there precisely one test (that has nothing to do with the core functionality)? Why is the Git history full of things like “finish”, “correct merg”, “fix more”, “add code”? Where is the linting config? Why are there print statements everywhere? Why does non-UI code have UI code embedded in it? Why is there random code commented out? Why is there no consistency across the codebase? Why is everything written as if they’ve never seen a Python project before (comments that should be docstrings, docstrings that should be comments, print("WARNING: ") instead of using logging or warnings, underscores in CLI flags, no shebangs in scripts intended to be run from the command line…)
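To make one of those concrete, here's a hypothetical before/after for the logging complaint (the message is made up, not lifted from the repo):

    # the research-dump style
    print("WARNING: could not load checkpoint, falling back to defaults")

    # what a typical Python package would do instead
    import logging
    logger = logging.getLogger(__name__)
    logger.warning("could not load checkpoint, falling back to defaults")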

Not all Python software gets all of this right, but it’s incredibly rare to have so many misses, even for hobbyist developers. Unless it’s code released by researchers, etc. It’s pretty typical in that context.

> This is not a project made by people without dev skills.

No, but it is a project made by people lacking software engineering skills, which is a distinction I drew in my earlier comment. Like I said, they can write clever code, but there’s a difference between bashing on code until it works and building it properly. This is the kind of codebase you get when you have people who have been writing code for a long time, but never in the context of a software engineering project alongside experienced software engineers they can learn from. Put them on a team like that and they’ll be forced to unlearn most of these bad habits fast because they’d never get their pull requests approved otherwise.

I’m trying not to be harsh – I understand that this code is more of a code dump from researchers than a real software project, and they’ve done some incredibly clever things here – but if somebody is suggesting that Python projects in general are like this, it really should be pointed out that this is not in the slightest bit representative of a typical Python project.


These AI tools are going to be built into professional tools like Photoshop someday, but until then it's sort of a hacker's paradise. I think Charl-e lets a new cohort of hackers play with the technology without becoming Python command-line warriors.


Christian Cantrell has already made a Stable Diffusion plugin for Photoshop[0]. Not 1st party, mind, but given the transforms Photoshop already includes, I imagine they have people looking into it.

[0]: https://christiancantrell.com/#ai-ml


https://github.com/nousr/koi

Stable diffusion already "happened" for Krita


IT is in a package management crisis at the moment, and GPUs are not making things easier.


What's the relationship between GPUs and package management?


Most deep learning these days requires CUDA acceleration to enable GPU / TPU for the libraries (i.e. PyTorch, TensorFlow), which is an absolute nightmare to set up.
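A minimal sanity check for whether the PyTorch/CUDA/driver combination on a machine actually lines up (assuming PyTorch is already installed):

    import torch

    print(torch.__version__)             # e.g. 1.12.1+cu116
    print(torch.cuda.is_available())     # False usually means a driver or toolkit mismatch
    if torch.cuda.is_available():
        print(torch.version.cuda)             # CUDA version this PyTorch build targets
        print(torch.cuda.get_device_name(0))  # the GPU it found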


NVIDIA has a bunch of docker containers which make this slightly less painful. Or maybe it’s just a different kind of pain.


These definitely help... but it's still painful.


I wanted to try SD on my machine, but I just couldn't get CUDA to work. Mind, the OpenVINO-based CPU implementation works just fine (pip install -r requirements.txt was sufficient), and given my CPU and GPU (a 10-core 10850K and an old 1070), there probably isn't much to gain from switching to the GPU other than power usage.


I guess the old "NVidia -vs- Linus" battle on kernel modules is still being fought....


Sorry but the Python ecosystem is an absolute joke, every single time I interact with it there is at least a 45 minute session of trying to get it into the correct state for all the things to co-exist and run.

I can't believe people endure this stuff on a day-to-day basis; I dread it every time. The fact that different versions of packages can't co-exist, and that installing something can downgrade my setuptools, which then breaks my whole installation... Not even wrapping this all up within conda solves it; it just means you can burn the whole thing and start over easily.

Maybe it's user error, but I never encounter these problems anywhere else.


The issue is mostly

1) python appeals to a lot of people that work in development-adjacent industries (like AI). These people don’t usually have to care about packaging

2) Python has gone through many outdated forms of packaging

3) The zen of python seems to have encouraged everyone to install third party libraries for the smallest of tasks (implementing retries, formatting phone numbers, etc). These small packages often have only a few maintainers, who end up dropping off the map.

Modern python package management works pretty well, but there’s so much debt in the ecosystem I’m not sure when it’ll be better for end users.


A bit, yeah. And it's been extra difficult to get it going on M1 Macs.


lstein's fork [1] isn't that bad and the instructions are pretty easy to follow. It definitely requires some knowledge of how to install software via brew, but those are generally good things to figure out anyway.

[1] https://github.com/lstein/stable-diffusion/blob/main/docs/in...


It used to work but somehow was broken on M1 Macs a few days ago. Not sure if it's fixed. Get an older branch if it doesn't work for you.


You might have had to rebuild the dependencies.

I've been running the development branch, which has been working fine for me, but I've also rebuilt the dependencies a few times just to be certain.

One protip is to use symlinks for the training files.


Unfortunately this goes beyond A.I. stuff; it has become the state of software development, and that's why people started shipping very large packages of everything, so that you can have the exact same environment and increase your chances that the code will run as expected.

For A.I. stuff I actually don't judge: these scripts are written by people who specialise in things other than software engineering, and they simply put together some code to run their algorithms, so as a result they are poorly engineered in many aspects.


Most Python projects aren't this tough. I suspect that they're using wonky libraries like Pandas, Numpy or some such that prioritize raw power over ease of installation.


I've not touched Python in a couple years, but Pandas/NumPy used to be the defacto libs for anything to do with data science, are they considered "wonky" now?


In my experience, they’re simultaneously important for data science and very annoying to install and manage.


I mean, there are at least two different companies (Enthought and Continuum) that were founded on making the major scientific python packages easier to install.


Just to be clear, pandas and numpy are not the "wonky" libraries. They are, in my experience, basically two of the most easily installed and dependency-managed libraries in Python, given their ubiquity and maturity. Maybe there are machine configurations I'm not familiar with where they are not easily compatible, but I've never seen them cause issues. Usually it's CUDA or other GPU stuff, or conflicts in less regularly maintained packages.


To be honest, numpy is easy to install on all major platforms. In deep learning I've almost never seen pandas used, but deep learning models do have problems with PyTorch, as some projects lock an old PyTorch version that just doesn't work on newer/older Python versions.


What's the difference between this and Diffusion Bee besides a nicer website?

https://github.com/divamgupta/diffusionbee-stable-diffusion-...


There's a link to the code on the `nicer website`: https://github.com/cbh123/charl-e


Actually, Diffusion Bee has code too, in the parent's link, and it has a nicer website here: https://diffusionbee.com/


This is really cool and a fun way to try out this stuff I've been hearing about. One thing that'd be cool is a "retry" button that picks a different seed. My first attempt didn't turn out so great (https://i.imgur.com/zV48hCV.png)


The steps value is too low.


Does a "1 click" Windows implementation of Stable Diffusion exist yet?


It's been available for a while; check out NMKD[0]. That's what I've been personally using the entire time.

0. https://nmkd.itch.io/t2i-gui


Real-ESRGAN and GFPGAN seem to be missing. Those help the image quality significantly.


Seeing copyrighted/trademarked icons in the examples (Darth Vader, for example) really makes me wonder how these models are going to play out in the future.

Today, these models are far ahead of the trademark attorneys, but there are powerful interests that are going to want to litigate the inclusion of these entities in the trained models themselves.


I hope people realise these are general purpose text to image systems, not just AI artists. They can be used to generate images for educational content, to generate ideas for design of clothes, shoes, mugs, interior and exterior design, caricatures and memes for social networks, virtual dressing booth, hairstyle and make up, customise games and game characters, they can help create bias detection benchmarks, maybe in the future even generate technical drawings.

So the art copyright angle should not be the only one taken into consideration.


Perhaps "Stable Diffusion on your ARM Mac in 1 click" would've been a more helpful title.


Is there some comprehensive source on how to make the most of Stable Diffusion? I find the examples on websites much better than what I've been able to generate — they more closely convey the prompt and have fewer artifacts/clearly messed-up parts.


When people call themselves "prompt engineers" it's only half in jest. Half of generating something good is guiding the program into generating something good. That means knowing the right keywords to get specific styles or effects, a little bit of luck, and sometimes generating a prompt several dozen times and then creating variations from a seed once you find a specific seed that generated something close to what you liked. It's an iterative process and many of the fantastic images you see weren't "first generations" but likely the 20th or so generation after tons of trial and error working around a specific prompt/idea.

I'd recommend keeping a prompt list and finding what does/doesn't work for what you're after. Try shuffling the order of your prompt - the order of the tokens does matter! Repeat a token twice, thrice, hell make a prompt with nothing but the same token repeated 8 times. Play around with it! If you find an image that's very close to what you want - start generating variations of it. Make 20 different variations. Make variations of the variations you like best.

Also the seed is very important! If you find a seed that generated a style you really liked take note of it. That seed will likely generate more things in a similar style for similar enough prompts.
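If you're scripting it rather than clicking through a GUI, here's a minimal sketch of pinning the seed with the Hugging Face diffusers library (assumes you already have the weights and an MPS- or CUDA-capable PyTorch; the prompt and filenames are just placeholders):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("mps")  # or "cuda"

    # fixing the generator seed makes the run reproducible,
    # so you can tweak the prompt while holding everything else constant
    generator = torch.Generator().manual_seed(1234)
    image = pipe("a watercolor fox in a misty forest",
                 num_inference_steps=50,
                 generator=generator).images[0]
    image.save("fox_1234.png")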

It's a semi-creative process and definitely takes some time investment if you want great results. Sometimes you strike gold and get lucky on your first generation - but that's rare.


If someone turns artist names and the quirky-but-useful bits of prompting (like 'Unreal Engine' as an image sharpener) into a Mac app with Instagram-style filters, they'll make some money...


Each engine is a little different as well. It's like learning to perform with a partner - another dancer, musician, etc. You have to find the sweet spot where what you want and what the tool *can do* line up.


I search Lexica.art for the style I want, copy the prompt associated with the work, and edit it to my needs.


The reddit forum for StableDiffusion has a tag for prompts where you can get a large number of detailed examples to use:

https://www.reddit.com/r/StableDiffusion/?f=flair_name%3A%22...

Also, this post refers to a large number of relevant tools to use as well:

https://www.reddit.com/r/StableDiffusion/comments/xcrm4d/use...


Agreed and wondering myself, DALL-E seemed to do a better job of great looking images with brief prompts, but Stable Diffusion seems to need more specific prompts. SD is free though so would love to use it more.


CLIP-guided Stable Diffusion, or Dalle+SD, are both doable with current open source and will have much smarter prompting at the cost of even more memory use.


I've found the stuff in the DALL·E 2 Prompt Book also works well for Stable Diffusion

https://dallery.gallery/the-dalle-2-prompt-book/

If one prompt doesn't work, try writing it in another way. Sometimes it helps to write things in multiple ways in the same prompt.


How fast does Stable Diffusion run on an M1 Max? I'm using an M1 Pro and I find it too slow. I'd rather use an online service that costs $0.01 per image but generates an image in a matter of seconds than wait 1 minute for a free one.


It takes 20-30 seconds on my M1 Pro with 32GB RAM. I’m not sure I’ve seen anything faster online.


I clocked 18 seconds for a 512*512 image, 25 steps, on a Mac Studio with M1 Max and 32GB.


FWIW I don't think I've ever gotten a satisfactory result from anything less than 50 steps.


Interesting. I didn’t see an essential difference with higher values, so I settled on 25. Maybe I’m just impatient and my brain prefers more options even if they’re individually imperfect.


Depends on the diffuser you're using, the _a (ancestral) diffusers are aberrant in that they can yield good results with very low sample counts. I typically use 16 samples and get reasonably good results, but it's highly dependent on the prompt and settings as well.
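As a sketch, with a diffusers release recent enough to ship the Euler ancestral scheduler, swapping the sampler and dropping the step count looks roughly like this (prompt is a placeholder):

    from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    # ancestral samplers often give usable images at much lower step counts
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    image = pipe("portrait of an astronaut, oil painting",
                 num_inference_steps=16).images[0]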


On Replicate it takes maybe around 10-15 seconds. My M1 Pro only has 16GB and 8 cores and takes about 1 minute, so maybe the lower specs make quite a difference.


You should be able to get a regular prompt to generate in 5 seconds via https://mage.space. A100s under the hood. Unlimited too. (Note: I am the creator)


Wow! What's your motivation to provide access to such powerful hardware for free?


8 seconds here, awesome!


Speed has a massive effect on how willing I am to play around and develop better prompts. I can’t wait a full minute for an image, I just can’t.

What kind of computer specs would be required to generate typical SD images in less than a second?


I don't know about less than 1 second but I just picked up an RTX 3090 Ti now that they're basically half off at Best Buy and it's definitely fast enough for interactively playing with prompts (single digit number of seconds).

Probably overkill and could get away with something like a 3060 or so, but the 24 GB of VRAM come in handy if you want to generate larger images. I pushed it as high as 17 GB on some recent runs.


The nice thing about the M1 is that the GPU and CPU share RAM so even though I have a 14" MacBook Pro, I also have a GPU with 16GB of VRAM. I pushed as high as 11GB on images and the fan didn't even turn on.


It is slower than an NVIDIA GPU though. Maybe 30s per image on my M1 Max with 32GB.


Roughly 4-5 seconds for 512x512 at 50 samples on a 3090 Ti


I have a 3060 and it's fine for me; it took ~8-9 secs to produce an image at default settings.


My 3080 can turn out a 16-sample Euler_a @ 512^2 in about 1.5s (9.7 iterations/s). I've found you can get pretty good results in txt2img with those settings. And once you've found a good image you can further iterate in img2img with loopback at approximately the same rate.

It's worth noting that I'm on a 5800X as well, which I'm sure helps.


> iterate in img2img with loopback at approximately the same rate.

What's the advantage of using img2img as opposed to iterating on the seed value?


I guess it depends on how you do it. Depending on how I've set things up, I've found that more samples isn't necessarily better (usually just different). I suppose it's an optimization problem. I have found that I can pretty reliably look at a 1 sample image and kinda guess where the earliest iterations are going to go, and that might be the most appropriate workflow, actually, but beyond that it seems a couple of samples can drastically alter outputs, and likewise with prompt editing. Whereas with img2img there's a lot more control, I pick an input, I can force it to strictly abide to the image and the parameters I want, and as someone else said in- and outpainting are nice as well.

I guess I'm just manipulating probabilities in my favor?
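The loopback idea in script form, roughly (this assumes the current StableDiffusionImg2ImgPipeline from diffusers; parameter names have changed between versions, and the filenames and prompt are placeholders):

    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    image = Image.open("best_txt2img_result.png").convert("RGB")

    # feed each output back in as the next input: low strength keeps the composition,
    # higher strength lets the model drift further from it
    for i in range(4):
        image = pipe(prompt="portrait of an astronaut, oil painting",
                     image=image, strength=0.4).images[0]
        image.save(f"loopback_{i}.png")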


You draw over the part of the image that is not ideal and get it to infill it


It takes a few seconds (haven't timed it), but I suggest doing it online at dreamstudio.ai. Paying about one cent per image isn't so bad.


I built a Svelte frontend for my SD instance and occasionally expose it to friends publicly. It runs on an old Intel 6700K with a 2080 Ti, and I've tuned it to generate images in about 5 seconds. The speed depends on various factors, but you can prototype with settings that generate images in as low as 3 seconds and work your way up to more complex images.


I was using https://www.coreweave.com/gpu-cloud-pricing (the A4000s there), with ~20 steps at 512x512, and I think it was close to 1-2s IIRC. There are some consumer cards that can get close, I'm sure, with some tweaking of image size, steps, and other SD tuning.


Remember when downloading an MP3 took 20 minutes though?


It's awesome to see how much creativity, progress, and community involvement results from truly open AI development.

Congrats to the stable diffusion team for their openness and inclusiveness!


I downloaded this and tried out a few prompts like, "Mark Twain holding an iPhone", and got back an image of Mark Twain - once in some surrealist nightmare fashion and another more like a 3D render. Neither were holding anything, let alone an iPhone. Cranking up the DDIM slider didn't seem to do much. Trying the same prompt on mage.space (see the creators comment in this thread) produced exactly what I assumed it would.

Is there a trick to it?


Sometimes it's difficult to get certain combinations of things in a picture. It's easier if you provide a basic sketch with the components you want and use img2img (not sure how charl-e has it set up for img2img access, since I use the python original).

There's also a knack for writing the prompts, generally you want to write your prompt as a list of short sentences. Don't make your prompt too short. Use concrete and clear concepts, beginning with your main subject, and describe their relation. You can also qualify the background, the mood, the material and the style among other things. Generate more than one sample - I normally go for five or ten samples so there are better odds of getting one that works, since the AI could try to interpret the prompt in different ways that make sense (but not to a human).

I can't try it right now, but you might try something like: "A man is standing on the street, close up. The man has an iphone in his hand. The man is Mark Twain. Regular city background. Impressionist. Detailed."

Tweak a few times and more often than not you'll end up with something satisfying.


You need to be more specific or you're gonna get a grab bag of 3D renders, stock photos, cartoons, etc.

Like, what were you hoping for? Add terms that will drive towards that.


I wrote what I was expecting: The same type of results from the online tool I used.

My comment was pretty clear, not sure why the two responses I got decided to give me advice about prompts.


Because it depends on your online tool. Midjourney, for example, adds things to your prompts to make sure your images are better, and OpenAI adds things to your prompts too.

SD is much more raw.


Anyone have an M1 Ultra they can test this on? My 3080 Ti can render a 512x512 image in something like 7 seconds and I'd love to compare against Apple Silicon.


There isn't any optimised diffuser on M1 yet; most of them are just running a basic MPS graph, or even a mix of CPU and MPS ops, and of course it is extremely slow. I don't have time to test this implementation, but the author just uses some other implementation with a simple UI. So I would be surprised if it is faster than 10s per image with 20 steps; probably closer to 25-40s, and around 40s-1min per classic 512x512 50-60 step setting, as with other models.


M1 Max 16" with 64GB RAM, lstein fork: about 30-40s.


On M1 Max Mac Studio I'm getting about 45s. On my 3080 Ti about 5-7s.


I had Stable Diffusion running on M1 and Intel MacBooks within the first few days, but the original repo would have done people some favors if they had either created proper conda lock files for several platforms or just used conda-forge instead of mixing conda and pip unnecessarily (I think there was one dep which actually wasn't on conda-forge, besides their own things)

(and actually made the code independent of cuda)


Love the 'we haven't managed to implement the ever so complex version checker logic yet - so give us your e-mail' ruse.

EDIT: I take it back - all the menus are the generic Electron ones, so it is quite possible that the author is finding this part tricky.


When I hit "generate" on my M1 Air with the default options, it just sits saying "initializing...0%" forever. Gave it five minutes, still nothing. Tried twice, same thing.

Is it... doing anything? Do I just need to wait 10 minutes? 20?


Check the logs. It is downloading another 3GB of stuff.


Is it going to download stuff for every different image? If not then it's dumb to not download it in the beginning.


Can someone please explain how I can run this on my computer but something like GPT3 is too computational intensive to do the same? Isn’t text easier than images?


GPT3 is a large model that generally won't fit in memory.

Stable Diffusion isn't so heavy... mostly you are limited by how many steps you want to do.
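A rough back-of-the-envelope on weight memory alone (parameter counts are the publicly reported figures; fp16 storage is an assumption):

    # memory for weights ≈ parameter count * bytes per parameter
    gpt3_params = 175e9   # GPT-3
    sd_params   = 1e9     # Stable Diffusion (UNet + text encoder + VAE, roughly)
    bytes_fp16  = 2

    print(f"GPT-3: ~{gpt3_params * bytes_fp16 / 1e9:.0f} GB")  # ~350 GB
    print(f"SD:    ~{sd_params  * bytes_fp16 / 1e9:.0f} GB")   # ~2 GB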


But why does GPT3 need to be so large?


Disclaimer: I’m not well versed in this field, but my basic understanding is that Diffusion can be, and is, “fuzzy”.

An image without “crisp” pixels that create lines is acceptable. A basketball with an extra/missing blue pixel is acceptable.

With words, they either haven’t invented Diffusion yet, or the nature of the problem is too hard for Diffusion.

With text you can’t have the model Diffuse to “teli ne abouf a fdog?” “The dod is graen and has stix leg5”. It’s just too obviously wrong.

Meanwhile, the “fuzziness” allows Diffusion models to be smaller, compared with models that need precision.


Have they fixed it? I installed it in a virgin macOS 12.5 instance on Wednesday and it didn't work at all


Has Stable Diffusion been optimized already so it could run on M1 with 8 GB of RAM without swapping?


I highly doubt it. It struggles on GPUs with 6GB or less.


How long until on-device stable diffusion on new iOS devices? RAM will be a bottleneck I guess


Is there a reason it won't work on an intel Mac?


People have been focused on getting stable diffusion running well on M1 macs because their graphics systems have so much more horsepower than the Intel macs. The M1s also have a fast memory sharing architecture for graphics, and this needs an absolute minimum of around 8gb of vram — many Intel macs just won’t be able to handle this.
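If you want to check whether a given Mac's PyTorch build can use the GPU at all (assuming PyTorch 1.12 or newer), you can ask the MPS backend directly:

    import torch

    print(torch.backends.mps.is_built())      # was this PyTorch compiled with MPS support?
    print(torch.backends.mps.is_available())  # can it actually see a Metal device?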


SD on an Intel mac with Vega graphics runs pretty well though — I think it ran at something like ~3-5 iterations/s for me, which is decent. I ran either https://github.com/magnusviri/stable-diffusion or https://github.com/lstein/stable-diffusion which have MPS support


That's good to know as I just got a good deal on one and was wondering if the AMD GPU would be useful or if I needed to start planning for an eGPU with some NVidia silicon. Thanks!


From the website:

> Will this be available on Intel Macs?

> Yep, I'm working on making it compatible with older Macs.


Clickbait. I clicked on this link. Nothing happened.


I have a Mac that's a few years old, and now we're starting to see M1-only software. My next computer won't be a Mac.


Then it is unlikely that either your current or your next computer will be able to run M1-only software. What problem are you trying to solve?


You're starting to see new, specialised software that can take advantage of the M1 specifically. If you want to run it, use an M1.





