Oh that's cool! Did you ever think about recreating the DOM as SVG elements and animating those with CSS, instead of using raster images? It could potentially be done with MutationObserver, optionally "copying" any CSS animations in the source DOM. It will likely not be perfect, but could still be a good trade-off for a little demo animation of e.g. how to fill out a certain form, instead of using a GIF or video.
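To make the idea a bit more concrete, here is a minimal sketch of the "replay with CSS" half. It assumes the element positions were already recorded somewhere (for example in a MutationObserver callback in the browser, which is not shown). The `Sample` shape and the `toKeyframes` helper are hypothetical names made up for illustration, not part of any existing tool.

```typescript
// Hypothetical shape: one sample per observed change of an element.
interface Sample {
  timeMs: number; // time since the start of the recording
  x: number; // element position in px
  y: number;
}

// Turn recorded samples into a CSS @keyframes rule that replays the movement.
function toKeyframes(name: string, samples: Sample[]): string {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  const duration = samples[samples.length - 1].timeMs;
  const steps = samples.map((s) => {
    // Avoid dividing by zero when the recording has zero duration.
    const percent = duration === 0 ? 0 : (s.timeMs / duration) * 100;
    return `  ${percent.toFixed(1)}% { transform: translate(${s.x}px, ${s.y}px); }`;
  });
  return `@keyframes ${name} {\n${steps.join("\n")}\n}`;
}
```

The resulting rule could then be attached to the matching SVG element with an `animation` declaration whose duration equals the last sample's `timeMs`.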
I'm building this to record Electron windows. Ultimately I want to use it to automatically build animated SVGs showcasing features of VS Code extensions as part of their CI, so the showcases can be scripted using the VS Code API and the animations in the extension readme just fall out of that. So no DOM in this case. But I am thinking of ways to optimize this. I know this will be used for screen recordings, and that gives me some flexibility in what assumptions I can make in the optimization pipeline. At the moment I plan on trying to come up with an algorithm to detect scrolling/jumping regions (like in the demo, where the whole line height grows once the first emoji is typed on a given line) and animate their scrolling and cropping using CSS animations instead of image patches. But I think what you describe is definitely something to explore, and I might take that up next if I figure out a use case for it, as it sounds really interesting and fun!
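A minimal sketch of how detecting a vertically scrolled region might look. It assumes each frame has been reduced to one hash string per pixel row; the `Frame` type and both function names are hypothetical, and this is not what SVG-Screencast currently does.

```typescript
// A frame is modeled as one hash (or any comparable string) per pixel row.
type Frame = string[];

// Count rows of `prev` that reappear in `next` exactly `shift` rows lower.
function countMatches(prev: Frame, next: Frame, shift: number): number {
  let matches = 0;
  for (let y = 0; y < prev.length; y++) {
    const target = y + shift;
    if (target >= 0 && target < next.length && prev[y] === next[target]) {
      matches++;
    }
  }
  return matches;
}

// Returns the vertical shift (in rows) that maps the most rows of `prev`
// onto `next`. Negative means the content moved up, positive means down,
// and 0 is kept unless another shift matches strictly more rows.
function detectVerticalShift(prev: Frame, next: Frame, maxShift: number): number {
  let bestShift = 0;
  let bestMatches = countMatches(prev, next, 0);
  for (let shift = -maxShift; shift <= maxShift; shift++) {
    const matches = countMatches(prev, next, shift);
    if (matches > bestMatches) {
      bestMatches = matches;
      bestShift = shift;
    }
  }
  return bestShift;
}
```

A detected non-zero shift could be emitted as a CSS translate on the previous frame's region plus a small patch for the newly revealed rows. Rows of uniform background match at any shift, so a real pipeline would probably need to ignore or down-weight them.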
https://github.com/TomasHubelbauer/SVG-Screencast
This is for screen recordings. I also play around with SVG animations like the OP's:
https://github.com/TomasHubelbauer/SVG-3D
:)