I am a big fan of Hugging Face, but whenever these examples come up, they always seem half-baked: they don't quite work and aren't compelling. It would behoove them to polish these a little more, in my opinion.
I think that's a feature of GANs more than anything else. Most papers cherry-pick results to a pretty extreme degree so they look visually impressive, but here you're seeing raw results.
I'm very familiar with the generative modeling space (I've designed these models and deployed them to real production), and you're right that generative modeling results can be hit or miss. But I was referring more to the page itself: its layout, its unreliability (it seemed like the author was live-debugging in this thread?), its poor UX (people can't easily figure out how to use the page), etc., rather than anything about the model (though, to be honest, I also don't find Pokémon GAN Number 10,000 compelling as a topic). Fundamentally, though, you always have to treat results from the latest papers with some baseline skepticism.
Could you give examples of using GANs in prod? There was a recent discussion on Twitter about the reduced number of GAN papers in the last few years, and some people mentioned that they had not found adoption in industry.
Sure: my team and I created "style transfer" GANs to help people better understand particular data in light of other data (the latter usually being easier to understand but not always available in practice). It got strongly positive feedback from large stakeholders, we secured a large contract to deploy it, and we now run and maintain it as a SaaS. I even got a patent for the work! I'm sorry I can't be much more specific. But it's also partly why I may come across as slightly annoyed with the presentation here: Hugging Face offers exactly the kind of functionality I would have preferred to leverage instead of having my team handle modeling, training, building, releasing, deploying, SLAs, etc. And I would love to support them. But with the presentations all being rudimentary and the UX poor, it's difficult for me to use them to convince people of that, so getting buy-in from finance is harder than it would be if HF released polished, well-thought-out demos.
As an aside, I don't personally feel the future of generative modeling lies in generative art or creating new Pokémon or things like that; those categories broadly seem like neat tricks without real-world usefulness, or at least without real-world adoption.
That's awesome. If I may ask, is the data you operate on still image-like grids, or do you operate on more basic data types (e.g. strings)?
Personally, I'm also working on an industrial application: using a CycleGAN-based system to augment real-world data (e.g. training a network to "paint" an object so we can apply traditional computer vision techniques, such as an HSV filter, to locate it). It's quite promising for this kind of application, albeit hard to fine-tune.
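If it helps to make the pipeline concrete, the downstream step is the cheap part; the HSV filtering looks roughly like the sketch below (OpenCV 4.x assumed, and the color bounds and file name are placeholders, not our actual values):

    import cv2
    import numpy as np

    # The GAN-"painted" frame; placeholder path.
    img = cv2.imread("painted_frame.png")
    # Hue-based thresholding in HSV is more robust to lighting than raw RGB.
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Keep only pixels inside the painted object's color band (placeholder bounds).
    lower = np.array([35, 80, 80])
    upper = np.array([85, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)

    # Locate the object via the largest blob in the mask.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        print("object at", x, y, w, h)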
The data is image-based, not text-based. It's very useful for a number of applications!
I think your use case is extremely promising, assuming it produces better-quality output than just running a modern object detector. Another use case I don't have the bandwidth for, but which would likely be very marketable, is similar to what you're describing: allowing traditional algorithms like SIFT or SURF to be used across modalities.
I believe this was made by a community member; Hugging Face lets people publish to its model repository to foster more open sharing of work in the ML space. You should look at this as in-progress research rather than a polished product, and we should be grateful the space is so open.
Edit: Double-checking, it looks like they are affiliated with HF, but it's still more of an open side project, and anyone can contribute to the repository, so quality will vary.
The most interesting part, I think, is that I can't really describe why the results are bad. The colors, art style, silhouettes, etc. are all exactly right, but they're incoherent on a deeper level. Pokémon designs don't all map to living or realistic animals, but they largely do. When I look at these and imagine them made real, I can't picture them actually being able to live, whereas most Pokémon do pass that test.
I think that's why it's failing: the tests and criteria that would make the output more realistic are too hard for the GAN to work out.
That's my theory for why this (sort of) works where all the previous Pokémon-specific GANs fail so completely: it injects knowledge from ImageNet via a pretrained classifier. ImageNet has a decent number of animals, so some of that knowledge carries over, which makes the results more coherent. But it will still fall short of the full distribution of animal images that SOTA systems like ruDALL-E-*, GLIDE, ERNIE-ViLG, OFA, or Exaone train on.
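For the curious, the generic version of that trick is to compare real and generated images in a frozen ImageNet classifier's feature space. A rough PyTorch sketch of such a feature-matching loss (illustrative only, not necessarily how this particular repo wires it in):

    import torch
    import torchvision.models as models

    # Frozen ImageNet-pretrained backbone used purely as a feature extractor.
    backbone = models.resnet50(pretrained=True)
    backbone.fc = torch.nn.Identity()  # drop the classification head
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    def feature_loss(fake_imgs, real_imgs):
        # Match generated and real images in the classifier's feature space,
        # so "animal-ness" learned on ImageNet leaks into the generator.
        with torch.no_grad():
            real_feats = backbone(real_imgs)
        fake_feats = backbone(fake_imgs)  # gradients still flow to the generator
        return torch.nn.functional.mse_loss(fake_feats, real_feats)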
Just rebuilt the app with more examples, so the queue should stay a lot shorter now. This was built using Gradio[0] and is hosted on Spaces[1]. Check out the paper[2] and GitHub repo[3].
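For anyone who hasn't used Gradio, the wiring for a Space like this is only a handful of lines. A minimal sketch (the generator below is a stand-in that returns noise, not this app's actual code):

    import numpy as np
    import gradio as gr

    def generate(seed: float):
        # Stand-in for sampling the GAN: returns deterministic noise so the sketch runs.
        rng = np.random.default_rng(int(seed))
        return (rng.random((256, 256, 3)) * 255).astype(np.uint8)

    demo = gr.Interface(
        fn=generate,
        inputs=gr.Number(label="seed", value=0),
        outputs=gr.Image(label="generated sprite"),
        examples=[[0], [1], [2]],  # rendered as clickable examples in the UI
    )

    demo.launch()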
They're the numbers at the bottom left of the interface, under the 'Clear' button. Try clicking one: it will auto-fill the input, and then you just click Submit. Or what do you see?
That one used a finetuned ruDALL-E, which is not based on a GAN architecture and is much, much slower than a GAN, although the generated Pokémon are more coherent. You can play with it (with no queue) in your own Colab notebook here: https://colab.research.google.com/drive/1A3t2gQofQGeXo5z1BAr...
It's unbelievable how clean-looking and artifact-free these neural-net-generated images have become (and in such a short time!) compared to, e.g., this deconvolved monstrosity.
Traditional GANs typically have a different problem called mode collapse, in which the generator learns one (or a few) outputs that the discriminator likes. Conditional GANs are given a condition as input and have a greater chance of overfitting, since the loss function combines the adversarial GAN term with a supervised loss based on the condition.
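To make that concrete, a pix2pix-style generator objective mixes the adversarial term with a supervised L1 term against the paired target, and that supervised term is where the pressure to memorize the conditioning pairs comes from. A rough PyTorch sketch (names and the L1 weight are illustrative):

    import torch
    import torch.nn.functional as F

    def generator_loss(discriminator, condition, fake, target, l1_weight=100.0):
        # Adversarial term: push the discriminator to call the fake "real"
        # given the same conditioning input.
        pred_fake = discriminator(condition, fake)
        adv = F.binary_cross_entropy_with_logits(pred_fake, torch.ones_like(pred_fake))

        # Supervised term: pull the output toward the paired ground truth.
        # Weighting this heavily is what encourages overfitting to the condition.
        l1 = F.l1_loss(fake, target)

        return adv + l1_weight * l1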