I can't tell if you're disparaging the usage or not (truly, I can't tell), but such utterances exist because they serve a real function. Disfluency is an integral part of speech.
I think it's a good idea, if done well. It could also be combined with dynamically adjusting the speed of the speech, using more or fewer shortcuts and contractions, and swapping words for longer or shorter equivalents.
I now wish for a model built to be a low-computation filter: it takes text in and produces padded text out, intended for TTS and annotated with pauses, sounds, and extra words. The output would keep the same meaning, but the level of verbosity could be adjusted dynamically to maintain a fixed rate of words per minute.
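
To make the idea concrete, here is a toy sketch in Python. It is rule-based, not a trained model, and everything in it is made up for illustration: the filler list, the contraction table, the `<pause/>` marker (a placeholder, not a real TTS annotation), and the function names `adjust_verbosity` and `target_word_count`. It only shows the shape of the filter: pick a word budget from a words-per-minute rate and a time slot, then pad or tighten the text toward that budget.

```python
import re

# Hypothetical tables; a real filter would presumably learn these.
FILLERS = ["well,", "you know,", "I mean,", "so,"]
CONTRACTIONS = {"do not": "don't", "it is": "it's", "I am": "I'm", "cannot": "can't"}
PAUSE = "<pause/>"  # placeholder annotation, not a real SSML tag


def word_count(text: str) -> int:
    """Count spoken words, ignoring pause annotations."""
    return len([t for t in text.split() if t != PAUSE])


def target_word_count(wpm: float, seconds: float) -> int:
    """Number of words that fills `seconds` at a rate of `wpm`."""
    return round(wpm * seconds / 60)


def adjust_verbosity(text: str, target_words: int) -> str:
    """Pad or tighten text so its word count moves toward target_words."""
    # Too long: apply contractions (case-sensitive), which keep the meaning.
    if word_count(text) > target_words:
        for long_form, short_form in CONTRACTIONS.items():
            if word_count(text) <= target_words:
                break
            text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text)
        return text

    # Too short: prepend a pause and a filler to each sentence in turn.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    i = 0
    while word_count(" ".join(sentences)) < target_words and i < len(sentences):
        filler = FILLERS[i % len(FILLERS)]
        sentences[i] = f"{PAUSE} {filler} {sentences[i]}"
        i += 1
    return " ".join(sentences)


if __name__ == "__main__":
    budget = target_word_count(wpm=150, seconds=4)
    print(adjust_verbosity("It is late. I do not know.", budget))
```

This version adds at most one filler per sentence and has only a handful of contractions, so it can miss the budget in either direction. Closing that gap while still sounding natural is the part I'd want the model for.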