Because "avatar" isn't specific enough: some creators use fully dynamic rigged models with motion controls (these came to be called VTubers, since they started out as Virtual YouTubers; see Kizuna AI), while others use static images (your traditional avatars, often commissioned artwork of their OCs) just to fill the video feed while discussing topics with no relevant footage to show (common among creators who cover animation). PNG-tubers, who use dynamically controlled static images, sit between the two, so the term arose fairly naturally. It has a very clear definition in the community where it's used, so honestly I wouldn't call it non-descriptive.