> Doable with a single 1080ti and a couple of hundred midi files?
A 1080ti would probably take something like several days to a week, depending on how big the model is; probably not a big deal. However, a few hundred MIDI files would be pushing it in terms of sample size. If you look at experiments like my GPT-2-small finetuning to make it generate poetry instead of random English text ( https://www.gwern.net/GPT-2 ), it really works best once you get into at least the megabyte range of text. Similarly with StyleGAN: if you want to retrain my anime face StyleGAN on a specific character ( https://www.gwern.net/Faces#transfer-learning ), you want at least a few hundred faces. Below that, you need architectures designed specifically for transfer learning/few-shot learning, which are built to work in the low _n_ regime. (They exist, but StyleGAN and GPT-2 are not them.)