
Can't you just give it a photo of a dog, and then say "use this dog in this or that scene"?


Yes, the idea works and was explored with dreambooth/textual inversion for image diffusion models.

https://dreambooth.github.io/ https://textual-inversion.github.io/
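For anyone curious what "training" means for textual inversion: the generator's weights stay frozen, and only a new token embedding (written S* in the paper) is optimized so that the frozen model reproduces your example images. DreamBooth differs in that it fine-tunes the model weights themselves. Below is a toy sketch of the textual-inversion idea, not the real method. It assumes a tiny linear "generator" (the hypothetical `FROZEN_W` and `generate`) and a squared-error loss standing in for the actual diffusion denoising loss; all names and numbers are made up for illustration.

```python
# Toy illustration of the textual-inversion idea (hypothetical names, made-up numbers):
# the "generator" weights are frozen and only a new embedding vector is optimized.
# A real model optimizes a diffusion loss on images, not this squared error.

FROZEN_W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]


def generate(embedding):
    """Frozen toy 'generator': maps a 2-d embedding to 3 'image features'."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_W]


def loss(embedding, target):
    """Squared error between generated features and the reference photo's features."""
    return sum((g - t) ** 2 for g, t in zip(generate(embedding), target))


def grad(embedding, target):
    """Gradient of the loss w.r.t. the embedding only: 2 * W^T (W e - target)."""
    residual = [g - t for g, t in zip(generate(embedding), target)]
    return [
        2.0 * sum(FROZEN_W[i][j] * residual[i] for i in range(len(FROZEN_W)))
        for j in range(len(embedding))
    ]


def invert(target, steps=200, lr=0.1):
    """Learn an embedding for the new concept by gradient descent; FROZEN_W never changes."""
    embedding = [0.0, 0.0]
    for _ in range(steps):
        g = grad(embedding, target)
        embedding = [e - lr * gi for e, gi in zip(embedding, g)]
    return embedding


if __name__ == "__main__":
    # Features of "my dog", as the frozen model would produce them from some unknown embedding.
    dog_features = generate([0.5, -1.5])
    learned = invert(dog_features)
    print("learned embedding:", [round(x, 4) for x in learned])
    print("final loss:", loss(learned, dog_features))
```

The point of the toy is that the model's "knowledge" (here `FROZEN_W`) is reused as-is; the optimization only finds where your specific dog sits in the model's existing embedding space.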


Both of those are of course out of date, and they require significant training instead of just feeding the model a single image.

InstantID (https://replicate.com/zsxkib/instant-id) fixes that issue.


DreamBooth-style training is in no way out of date.

If you just want a face, InstantID/PuLID work, but the results won't be very varied. Doing actual training means you can get any perspective, lighting, style, or expression, and have the whole body be accurate.


How would that even work? A dog has physical features (legs, nose, eyes, ears, etc.) that it uses to interact with the world around it (ground, trees, grass, sounds, etc.), and each of those features is built on underlying structures that make up its senses (nervous system, optic nerves, etc.). There are layers upon layers of intricate complexity that took eons to develop, and a single photo cannot encapsulate that level of complexity and density of information. Even a 3D scan can't capture it. There is an implicit understanding of the physical world that helps us make sense of images. For example, a dog standing with all four paws on grass is within the bounds of possibility; a dog with six paws, two of which are on its head, is outside them. An image generator doesn't understand that obvious delineation; it just approximates likelihood.


A single photo doesn't have to capture all that complexity. It's carried by all those countless dog photos and videos in the training set of the model.


Actually, it does have to capture all of that complexity because it's a photon-based analysis of reality. You cannot take a photo without doing that.
