PixArt-α: A New Open-Source Text-to-Image Model Challenging SDXL and DALL·E 3 (stablediffusionweb.com)
77 points by liuxiaopai on Nov 13, 2023 | 27 comments


This has problems usually not seen with current systems. It's produced human characters with one thick leg and one thin leg. Three legs of different sizes. Three arms.

It can do humans in passive poses, but ask for an action shot and it botches it badly. It needs more training data on how bodies move. Maybe load it up with stills from dance, martial arts, and sports.


How is that not seen with current systems? Have you never used a Stable Diffusion base model?


SD 1.5 struggled, but SDXL has spoiled us with remarkably consistent (if same-y) human generation. The problem still comes up, certainly, but far less than with its predecessors (anecdotally, from my own use at least).


I’d say SD works quite well at the macro scale, comparatively; the artifacts the GP mentioned still occur with SD, and to a lesser extent with Midjourney and DALL-E, but they’re far less common.


It seems like this system can generate poses where a single character is facing the camera, but as soon as you get away from that, it's terrible. It's like this thing was trained on a huge number of selfies.


The most interesting aspect of this model is that it is very training-efficient: https://pixart-alpha.github.io/

It also borrows DALL·E 3's idea of training the model on synthetic captions.


Why name it PixArt when it covers a broader range of media than simply pixel art? Super confusing.


I would guess because Pixar. And Dall-E. A poor choice for sure, but kinda fitting that the name is so meta-derivative.


The source code is licensed under AGPL-3.0. Perfect for these kinds of models: https://github.com/PixArt-alpha/PixArt-alpha


hm. so actually it's not compatible with commercial projects then.


> hm. so actually it's not compatible with commercial projects then.

Perfectly compatible: just keep all modifications to the original code public. AGPL does not mean that all of your company's code must be open source, just whatever is in the same binary/program as the AGPL code.


There is a lot of corporate knee-jerk reaction against the AGPL; there are many different interpretations of it, and when you ask different people what exactly you can and cannot do, they will tell you that they are not a lawyer.

Which is why companies I work with cannot tolerate it anywhere; they just won't consider it. I think people who want to explain these things should do it on the basis of use cases instead of vague wording.

So:

- I start an AGPL system in a container and talk to it via the exposed API, and I don't tell anyone I'm using it -> yes/no?

- I start an AGPL system in a container and talk to it via the exposed API, and I put on the About page that I'm using it -> yes/no?

etc. But I'm sure that if anyone does respond to this, it will start with IANAL, and there will still be nothing to base anything on. I know this is the reason a lot of SaaS startups use the license: it's not clear enough what you can and cannot do, and companies cannot build businesses on that, so they just don't use it, making the solution effectively closed. I'm sure a lot of help is missed because of it; for instance, if I were using something like this, I would (as required) feed my work back to the project, but as it is, my company's lawyers don't allow me to use it at all, so nothing gets fed back.


Maybe I need to read the AGPL again. I remember interpreting it as saying that if your service depends on the component, then the license applies to the code for the rest of the service as well.


Please, do read it. Please, post it here if you find something that backs your recollection.


It's not compatible with the idea that you can just use the project for commercial purposes without giving some form of contribution back. That's a good outcome imho.


The output of the model is not under the same license. In fact, as far as I understand, the output would be your own copyright to license or not license however you please.

AGPL really is a great license for this type of project. It maximizes the power of everyone without limiting fair commercial use.


From their GitHub:

>This integration allows running the pipeline with a batch size of 4 under 11 GBs of GPU VRAM. GPU VRAM consumption under 10 GB will soon be supported, too. Stay tuned.
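For reference, the diffusers integration the README refers to looks roughly like the sketch below. This is a hedged sketch, not a verified recipe: the model id is the one listed on the project page, and `enable_model_cpu_offload()` is my assumption for how they keep peak VRAM down (it swaps submodules to CPU between pipeline stages).

```python
import torch
from diffusers import PixArtAlphaPipeline

# Model id taken from the project page; check the repo for the current one.
pipe = PixArtAlphaPipeline.from_pretrained(
    "PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16
)

# Assumed memory-saving step: offload idle submodules to CPU so that a
# batch of 4 prompts fits under ~11 GB of VRAM, per the README's claim.
pipe.enable_model_cpu_offload()

images = pipe(prompt=["an astronaut riding a horse"] * 4).images
```

Running in fp16 plus offloading is the standard diffusers pattern for squeezing a large pipeline onto a consumer GPU, at some cost in throughput.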


Seems to have pretty good understanding and performance.


This appears to be work sponsored by Huawei.


I suppose it won't work for generating images of Winnie the Pooh then.

But seriously, it's open-source, so it hardly matters.


It does work, very well actually. On the other hand, Dalle-3 refused, with:

> I can create an image for you, but I need to modify your request to avoid depicting specific public figures or copyrighted characters.

It took effort to get even "Chinese leader".


Interesting, considering Winnie the Pooh is in the public domain now. I wonder how they determine the copyright status of a character.


I asked Dalle,

> I can certainly create images inspired by public domain works. However, when it comes to characters like Winnie the Pooh, while the original versions of the character by A.A. Milne are in the public domain, specific adaptations or interpretations, especially those made by Disney, are not. To respect these distinctions ...

After, I was able to get it to draw a very good Winnie the Pooh with

> I don’t want a Disney version. I want an A.A. Milne drawing of Winnie the Pooh, which does not violate copyright because it’s public domain. I want the style as close as possible to A.A. Milne.

PixArt just worked, first try, with a great 3d digital art version.


Fairly confident they just ask GPT-4 to do its best not to serve any requests that might violate copyright in the system prompt.


It matters if they're distributing weights as raw Python pickles that execute code on load, instead of safetensors files.
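The concern here is that PyTorch's legacy checkpoint format is a Python pickle, and unpickling can execute arbitrary code. A minimal stdlib-only illustration of why that matters (the `Payload` class is a contrived example, not anything from this repo):

```python
import pickle


class Payload:
    """A contrived object whose unpickling triggers a function call."""

    def __reduce__(self):
        # Whatever is returned here is called at load time. A malicious
        # checkpoint could return (os.system, ("...",)) instead of eval.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval("6 * 7") runs during loading
print(result)  # 42
```

safetensors sidesteps this entirely by storing only raw tensor bytes plus a JSON header, so loading a checkpoint can never run code.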


Thought this was going to be a new optical sensor series :(


I think it's somewhat disingenuous to claim such improvements in training efficiency when they rely on:

- Existing models for data pseudo-labelling

- ImageNet pretraining

- A frozen text encoder

- A frozen image encoder



