PixArt-α: A New Open-Source Text-to-Image Model Challenging SDXL and DALL·E 3 (stablediffusionweb.com)
77 points by liuxiaopai on Nov 13, 2023 | 27 comments


This has problems usually not seen with current systems. It's produced human characters with one thick leg and one thin leg. Three legs of different sizes. Three arms.

It can do humans in passive poses, but ask for an action shot and it botches it badly. It needs more training data on how bodies move. Maybe load it up with stills from dance, martial arts, and sports.


How is that not seen with current systems? Have you never used a Stable Diffusion base model?


SD 1.5 struggled, but SDXL has spoiled us with remarkably consistent (if same-y) human generation. The problem still comes up, certainly, but far less than with its predecessors (anecdotally, from my own use at least).


I’d say SD works quite well at the macro scale, comparatively; the artifacts the GP mentioned still occur with SD, and to a lesser extent with Midjourney and DALL-E, but they’re far less common.


It seems like this system can generate poses where a single character is facing the camera, but as soon as you get away from that, it's terrible. It's like this thing was trained on a huge number of selfies.


The most interesting aspect of this model is that it is very training-efficient: https://pixart-alpha.github.io/

It also borrows DALL·E 3's idea of training the model on synthetic captions.


Why name it PixArt when it covers a broader range of media than simply pixel art? Super confusing.


I would guess because Pixar. And Dall-E. A poor choice for sure, but kinda fitting that the name is so meta-derivative.


The source code is licensed under AGPL-3.0. Perfect for these kinds of models: https://github.com/PixArt-alpha/PixArt-alpha


hm. so actually it's not compatible with commercial projects then.


> hm. so actually it's not compatible with commercial projects then.

Perfectly compatible: just keep all modifications to the original code public. AGPL does not mean that all of your company's code must be open source, just whatever is in the same binary/program as the AGPL code.


There is a lot of corporate knee-jerk reaction against the AGPL; there are many different interpretations of it, and when you ask different people what exactly you can and cannot do, they will tell you that they are not a lawyer.

Which is why companies I work with cannot tolerate it anywhere; they just won't consider it. I think people who want to explain these things should do it on the basis of use cases instead of vague wording.

So:

- I start an AGPL system in a container and talk to it via the exposed API, and I don't tell anyone I'm using it -> yes/no?

- I start an AGPL system in a container and talk to it via the exposed API, and I put on the About page that I'm using it -> yes/no?

etc. But I'm sure that if anyone does respond to this, it will start with IANAL, and there will still be nothing to base anything on. I know this is the reason a lot of SaaS startups use the license: it's not clear enough what you can and cannot do, and companies cannot build businesses on that, so they just don't use it, making the solution effectively closed. I'm sure a lot of help is missed because of it; for instance, if I were using something like this, I would (as required) feed my work back to the project, but as it is, my company's lawyers don't allow me to use it at all, so nothing gets fed back.


Maybe I need to read the AGPL again. I remember interpreting it as saying that if your service depends on the component, then the license applies to the code for the rest of the service as well.


Please, do read it. Please, post it here if you find something that backs your recollection.


It's not compatible with the idea that you can just use the project for commercial purposes without giving some form of contribution back. That's a good outcome imho.


The output of the model is not under the same license. In fact, as far as I understand, the output would be your own copyright to license or not license however you please.

AGPL really is a great license for this type of project. It maximizes the power of everyone without limiting fair commercial use.


From their GitHub:

>This integration allows running the pipeline with a batch size of 4 under 11 GBs of GPU VRAM. GPU VRAM consumption under 10 GB will soon be supported, too. Stay tuned.
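For reference, the diffusers integration the README refers to looks roughly like the sketch below. This is a hedged sketch, not a verified recipe: the model id is the one listed on the project page, and `enable_model_cpu_offload()` is my assumption for how they keep peak VRAM down (it swaps submodules to CPU between pipeline stages).

```python
import torch
from diffusers import PixArtAlphaPipeline

# Model id taken from the project page; check the repo for the current one.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
)

# Assumed memory-saving step: offload idle submodules to CPU so that a
# batch of 4 prompts fits under ~11 GB of VRAM, per the README's claim.
pipe.enable_model_cpu_offload()

images = pipe(prompt=["an astronaut riding a horse"] * 4).images
```

Running in fp16 plus offloading is the standard diffusers pattern for squeezing a large pipeline onto a consumer GPU, at some cost in throughput.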


Seems to have pretty good understanding and performance.


This appears to be work sponsored by Huawei.


I suppose it won't work for generating images of Winnie the Pooh then.

But seriously, it's open-source, so it hardly matters.


It does work, very well actually. On the other hand, Dalle-3 refused, with:

> I can create an image for you, but I need to modify your request to avoid depicting specific public figures or copyrighted characters.

It took effort to get even "Chinese leader".


Interesting, considering Winnie the Pooh is in the public domain now. I wonder how they determine the copyright status of a character.


I asked Dalle,

> I can certainly create images inspired by public domain works. However, when it comes to characters like Winnie the Pooh, while the original versions of the character by A.A. Milne are in the public domain, specific adaptations or interpretations, especially those made by Disney, are not. To respect these distinctions ...

After, I was able to get it to draw a very good Winnie the Pooh with

> I don’t want a Disney version. I want an A.A. Milne drawing of Winnie the Pooh, which does not violate copyright because it’s public domain. I want the style as close as possible to A.A. Milne.

PixArt just worked, first try, with a great 3d digital art version.


Fairly confident they just ask GPT-4 to do its best not to serve any requests that might violate copyright in the system prompt.


It matters if they're distributing weights as raw Python pickles that execute code on load, instead of safetensors files.
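The concern here is that PyTorch's legacy checkpoint format is a Python pickle, and unpickling can execute arbitrary code. A minimal stdlib-only illustration of why that matters (the `Payload` class is a contrived example, not anything from this repo):

```python
import pickle


class Payload:
    """A contrived object whose unpickling triggers a function call."""

    def __reduce__(self):
        # Whatever is returned here is called at load time. A malicious
        # checkpoint could return (os.system, ("...",)) instead of eval.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval("6 * 7") runs during loading
print(result)  # 42
```

safetensors sidesteps this entirely by storing only raw tensor bytes plus a JSON header, so loading a checkpoint can never run code.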


Thought this was going to be a new optical sensor series :(


I think it's somewhat disingenuous to claim such improvements in training efficiency when they rely on:

- Existing models for data pseudo-labelling

- ImageNet pretraining

- A frozen text encoder

- A frozen image encoder



