Pokemon GAN (huggingface.co)
77 points by aliabd on Feb 14, 2022 | 30 comments



I am a big fan of Hugging Face, but whenever these examples come up, they always seem half-baked: they don't quite work well and aren't compelling. It would behoove them to polish these a little more, in my opinion.


I think that's a feature of GANs more than anything else. Most papers cherry-pick results to a pretty extreme degree so that they look visually impressive, but here you're seeing raw results.


I am very familiar with the generative modeling space (I have designed generative models and deployed them to real production), and you're right that generative modeling results can be hit or miss. But I was referring more to the page layout, its unreliability (it seemed like the author was live-debugging in this thread?), the poor UX (people apparently can't easily figure out the page), etc., rather than anything about the model itself (though, to be honest, I also don't find Pokémon GAN number 10,000 compelling as a topic). Fundamentally, though, you do always have to view results from the latest papers with some baseline skepticism.


Could you give examples of using GANs in production? Recently there was a discussion on Twitter about the declining number of GAN papers in the last few years, and some people mentioned that GANs had not found adoption in industry.


Sure, my team and I created “style transfer” GANs to help people better understand particular data in light of other data (the latter of which is usually easier to understand but not always available in practice). It got strongly positive feedback from large stakeholders, we secured a large contract to deploy it, and we now run and maintain it as a SaaS. I even got a patent for the work! I'm sorry I can't be much more specific. But that's also partly why I may come across as slightly annoyed with the presentation here: Hugging Face offers exactly the kind of functionality I would have preferred to leverage instead of my team handling modeling, training, building, releasing, deploying, SLAs, etc., and I would love to support them. But with all the presentations being rudimentary and the UX poor, it is hard to use them to convince people of this, so it's harder to get buy-in from finance than it might be if HF released polished, well-thought-out demos.

As an aside, I do not personally feel the future of generative modeling is in generative art, creating new Pokémon, or things like that; those categories broadly seem like neat tricks without real-world usefulness, or at least without adoption.


That's awesome. If I may ask, is the data you operate on still image-like grids, or do you operate on more basic data types (e.g. strings)?

Personally I'm also working on an industrial application, using a CycleGAN-based system to augment real-world data (e.g. training a network to "paint" an object so we can apply traditional computer vision techniques, such as an HSV filter, to locate the object). It's quite promising for this kind of application, albeit hard to fine-tune.
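
To make that concrete, the rough shape of such a pipeline is sketched below. It is purely illustrative: the generator G, the "paint" colour, and the HSV thresholds are assumptions, not the actual system described above.

    import cv2
    import numpy as np
    import torch

    # Assumption: G is a trained CycleGAN generator (a torch.nn.Module) that
    # repaints the target object in a saturated magenta. The HSV bounds below
    # are placeholders chosen for that colour.
    def locate_object(bgr_image, G):
        # CycleGAN-style generators expect normalized RGB tensors in [-1, 1], NCHW layout.
        rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB).astype(np.float32) / 127.5 - 1.0
        x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)
        with torch.no_grad():
            painted = G(x)[0].permute(1, 2, 0).numpy()
        painted_bgr = cv2.cvtColor(
            ((painted + 1.0) * 127.5).clip(0, 255).astype(np.uint8), cv2.COLOR_RGB2BGR
        )

        # Classic HSV threshold on the painted colour, then take the largest blob.
        hsv = cv2.cvtColor(painted_bgr, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, (140, 100, 100), (170, 255, 255))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        return cv2.boundingRect(max(contours, key=cv2.contourArea))  # (x, y, w, h)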


This is image-based, not text-based. It's very useful for a number of applications!

I think your use case is extremely promising, assuming it results in better-quality output than just running a modern object detector. Another use case I don't have bandwidth for, but which would likely be very marketable, is similar to what you're describing: allowing the use of traditional algorithms like SIFT or SURF across modalities.


I believe this was made by a community member; Hugging Face lets people publish to its model repository to foster more open sharing of work in the ML space. You should look at this as in-progress research rather than a polished product, and we should be grateful the space is so open.

Edit: Double-checking, it looks like they are affiliated with HF, but it is still more of an open side project, and anyone can contribute to the repository, so quality will vary.


The most interesting part, I think, is that I can't really describe why the results are bad. The colors, art style, silhouettes, etc. are all exactly right, but the designs are incoherent on a deeper level. Pokemon designs don't all map to living or realistic animals, but they largely do. When I look at one of these and imagine it made real, I can't picture it actually being able to live, whereas most Pokemon do pass that test.

I think this is why it's failing: the tests and criteria that make a design feel realistic are too hard for the GAN to work out.


That's my belief about why this (sort of) works where all the previous Pokemon-specific GANs failed so completely: it injects knowledge from ImageNet via a pretrained classifier. ImageNet has a decent number of animals, so some of that knowledge carries over, which makes the results more coherent. But it will still fall short of the full distribution of animal images that SOTA models like ruDALL-E-*, GLIDE, ERNIE-ViLG, OFA, or Exaone train on.
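
The paper's actual discriminator design is more involved (multi-scale pretrained features with random cross-channel and cross-scale mixing), but the core "inject pretrained knowledge" idea can be sketched roughly as a discriminator that only ever sees frozen ImageNet features; the EfficientNet-B0 backbone here is just an illustrative stand-in:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Sketch only: the discriminator never sees raw pixels, just features from
    # a frozen ImageNet-pretrained backbone, so ImageNet knowledge shapes the
    # adversarial signal the generator receives.
    class ProjectedDiscriminator(nn.Module):
        def __init__(self):
            super().__init__()
            backbone = models.efficientnet_b0(weights="IMAGENET1K_V1").features
            for p in backbone.parameters():
                p.requires_grad_(False)  # pretrained features stay frozen
            self.backbone = backbone.eval()
            self.head = nn.Sequential(   # only this small head is trained adversarially
                nn.Conv2d(1280, 256, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(256, 1, 1),
            )

        def forward(self, img):
            with torch.no_grad():
                feats = self.backbone(img)  # (N, 1280, H/32, W/32) ImageNet features
            return self.head(feats)         # per-location real/fake logits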


These demos are typically intended as quick, interactive, open-source proofs of concept rather than e2e apps.


Ah, that makes sense. Still, it feels clunky and not compelling, IMO, but I may not be the target audience.


Seems fine to me. It's a demo.


Just rebuilt the app with more examples, so the queue should stay a lot shorter now. This was built using Gradio[0] and is hosted on Spaces[1]. Check out the paper[2] and the GitHub repo[3].

[0]: https://github.com/gradio-app/gradio

[1]: https://huggingface.co/spaces

[2]: http://www.cvlibs.net/publications/Sauer2021NEURIPS.pdf

[3]: https://github.com/autonomousvision/projected_gan
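
For anyone curious what the wiring looks like, a seed-to-image demo of this sort is only a few lines of Gradio. The sketch below uses a random-noise placeholder instead of the real model (and current Gradio syntax), so it is illustrative rather than the actual Space code:

    import gradio as gr
    import numpy as np

    # Placeholder generator: the real Space would load the trained GAN checkpoint here.
    def generate(seed):
        rng = np.random.default_rng(int(seed))
        return rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)

    demo = gr.Interface(
        fn=generate,
        inputs=gr.Number(value=42, label="Seed"),
        outputs=gr.Image(label="Generated Pokemon"),
        examples=[[0], [1], [10], [42]],  # the clickable examples under the Clear button
    )
    demo.launch()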


The examples aren't displaying for me.


They are the numbers at the bottom left of the interface, under the 'Clear' button. Try clicking them: they will auto-fill the input, and then you just click Submit. Or, what do you see?


I see the numbers, but nothing appears when they're selected. If I generate a fresh one it shows up fine. iOS.


For context, this is not the same technique that was used to create the AI-generated Pokemon a couple of months ago: https://www.reddit.com/r/pokemon/comments/rgmyxp/i_trained_a...

That used a fine-tuned ruDALL-E, which is not based on a GAN architecture and is much, much slower than a GAN, though the generated Pokemon are more coherent. You can play with it (with no queue) in your own Colab notebook here: https://colab.research.google.com/drive/1A3t2gQofQGeXo5z1BAr...


Very nice!

It's unbelievable how clean-looking and artifact-free these neural-net-generated images have become (in such a short time!) compared to, e.g., this deconvolved monstrosity:

https://www.christies.com/media-library/images/features/arti...


Browsing through their code, it looks like they are using a VQGAN for image generation and ESRGAN for super-resolution.
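
So the generation path is two-stage, roughly like the sketch below (vqgan_decoder and esrgan are stand-ins for the actual loaded models, not real APIs):

    import torch

    # Rough shape of the two-stage pipeline described above; both models are
    # assumed to be already-loaded torch modules.
    @torch.no_grad()
    def generate_image(vqgan_decoder, esrgan, latent_codes):
        low_res = vqgan_decoder(latent_codes)  # VQGAN decodes latent codes into a small image
        return esrgan(low_res)                 # ESRGAN upsamples (typically 4x) for the final output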


These are wonderfully low-effort and I love each and every one of them.


Some examples look like the model picked two Pokemon and tried to merge them in a very weird way. I'll list what I see so others can compare:

- 0 as Arceus + Shaymin S

- 1 as Lickilicky + Espurr

- 10 as Kirlia + Porygon2, don't know where the colors are from

- 20 as Shellos + Froslass

- 30 as Roggenrola/Oddish/Poliwag

- 42 as Garchomp + Trevenant

- I see Kyurem in 60, but I have no idea where the tail is from

- 102 as Dragonite + Hitmonlee


The model appears to have overfit a bit (seed 535 is blatantly Vivillon: https://twitter.com/ak92501/status/1492560750103121920)

Which is interesting because GANs tend not to overfit.


Traditional GANs typically have a different problem, mode collapse, in which the generator learns one (or a few) outputs that the discriminator likes. Conditional GANs are given a condition as input and are more prone to overfitting, as the loss function combines the adversarial (GAN) loss with a supervised loss based on the condition.
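
Concretely, a pix2pix-style generator objective combines the two terms roughly as sketched below; the L1 term and its weight are the usual conventions, not details of the model discussed here:

    import torch
    import torch.nn.functional as F

    def generator_loss(D, G, condition, target, lambda_sup=100.0):
        fake = G(condition)
        logits = D(condition, fake)
        adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))  # fool D
        sup = F.l1_loss(fake, target)   # supervised term tied to the condition
        return adv + lambda_sup * sup   # the supervised term is what can overfit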


Vivillon just has 20 different forms, so it makes sense it would be more common.


0 is closer to Silvally than Arceus.

Some manual seeds:

4552 is a corrupted Talonflame

2518 is another Vivillon, and so are 3330, 2216, 3373, and 3151 (it looks like all the Vivillon forms were put in)

3129 is horrific: it's like Glaceon was stretched into a Wurmple and put on top of a Dracovish head and an unknown body

1742 is a mutilated Slurpuff

2410 is a Tapu mixed with... Corsola-G?

3805 is obviously Magmortar with some dragon head

3546 is clearly Reshiram (with genie arms?), and 1878 is Mega Tyranitar (both down to the colours)

2194 is Gible x Joltik

1576 is radioactive Alcremie

2000 is straight up Roselia


>0 is closer to Silvally than Arceus.

Well, that makes sense with the lore


Just updated the link again; it's much faster now (inference is down to 0.5s from 9s), and the queue will move a lot quicker.


This makes me want to build a Twitter bot, "Cursed Pocket Monsters"...


How long before Game Freak uses something like this to create new Pokemon?



