I call bs on training an RL agent to literally output strokes. The way each image renders is a dead giveaway that this is just using a text-to-image model, then converting the result to SVG, and finally animating the SVG paths. They might even bypass the SVG conversion with clever mask reveals. I was able to achieve the same thing in about 5 minutes. https://giphy.com/gifs/rFVxSxZMlflZUX4TqI
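For anyone curious, the "animate the SVG paths" step is the easy part. Here's a rough sketch of the usual dash-offset trick, assuming the image has already been traced into polylines. The `strokes` input and both function names are made up for illustration, and this is only my guess at the general approach, not what they actually run:

```python
import math


def polyline_length(points):
    """Total length of the straight segments joining consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def animated_svg(strokes, width=200, height=200, seconds_per_stroke=1.0):
    """Return an SVG string that reveals each stroke in turn.

    Every path starts fully hidden (dash offset == its own length) and a
    SMIL <animate> element slides the offset to 0, so it looks hand-drawn.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
    ]
    for i, points in enumerate(strokes):
        length = polyline_length(points)
        d = "M " + " L ".join(f"{x:g} {y:g}" for x, y in points)
        begin = i * seconds_per_stroke  # strokes play back to back
        parts.append(
            f'<path d="{d}" fill="none" stroke="black" stroke-width="2" '
            f'stroke-dasharray="{length:.2f}" stroke-dashoffset="{length:.2f}">'
            f'<animate attributeName="stroke-dashoffset" from="{length:.2f}" '
            f'to="0" begin="{begin:g}s" dur="{seconds_per_stroke:g}s" '
            f'fill="freeze"/></path>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Two hypothetical strokes standing in for traced image outlines.
    demo = [[(20, 20), (180, 20), (180, 180)], [(20, 180), (100, 100)]]
    print(animated_svg(demo))
```

Save the output as a `.svg` file and open it in a browser to watch the strokes draw themselves. Real traced output would have curves rather than straight segments, so you'd need a proper path-length calculation there.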
Edit: I've used it. It's amazing. I'm going to be using this a lot.