I have written a (free, OSS) Linux tool called `psn` for this kind of stuff [1]. It samples each interesting thread's state from /proc/PID/task/TID/{status,wchan,syscall} etc. and shows you a summary of which threads were blocked in which state, in which syscalls, and where in the kernel they were stuck (wchan). It can be used with applications like pv, tar, dd, mysqld, httpd, etc.
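To give a feel for the sampling idea, here is a rough Python sketch. It is not how `psn` itself is implemented; the function names (`parse_state`, `sample_once`, `profile`) and the sample count and interval are made up for illustration. It only assumes the standard Linux /proc layout mentioned above.

```python
import collections
import os
import time


def parse_state(status_text):
    """Return the one-letter state code from /proc/.../status text, e.g. 'S' or 'D'."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else "?"
    return "?"


def read_text(path):
    """Read a small /proc file; return '' if the thread vanished or access is denied."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def sample_once(pid):
    """Return one (state, wchan) pair per thread of pid, or [] if pid is unreadable."""
    task_dir = f"/proc/{pid}/task"
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return []
    pairs = []
    for tid in tids:
        status = read_text(f"{task_dir}/{tid}/status")
        if not status:
            continue  # thread exited between listdir() and the read
        wchan = read_text(f"{task_dir}/{tid}/wchan") or "-"
        pairs.append((parse_state(status), wchan))
    return pairs


def profile(pid, samples=50, interval=0.1):
    """Sample all threads of pid repeatedly and count (state, wchan) occurrences."""
    counts = collections.Counter()
    for _ in range(samples):
        counts.update(sample_once(pid))
        time.sleep(interval)
    return counts
```

Something like `profile(1234).most_common(10)` (with a real PID in place of 1234) would then list the most frequently seen state and wchan combinations. Depending on the kernel and your permissions, wchan may just read `0`.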
Having maintained an open source library, it’s actually really helpful to see features people want. Not everyone needs to contribute directly to the code base. User feedback is valuable, too.
I.e., within pv, is it reading the input stream or writing the output stream that blocks most of the time?
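One hedged way to approach that question by hand is to sample /proc/PID/syscall and tally which syscall the process is sitting in. The sketch below is my own illustration, not part of any tool: `classify` and `tally` are hypothetical names, and the syscall numbers (0 = read, 1 = write) assume x86_64. Reading this file usually requires the same user or root, and a program may also wait in select/poll, which would show up as other syscall numbers.

```python
import collections
import time

# Assumption: x86_64 syscall numbering; other architectures differ.
X86_64_SYSCALLS = {0: "read", 1: "write"}


def classify(syscall_text):
    """Map the contents of /proc/PID/syscall to a short label."""
    fields = syscall_text.split()
    if not fields:
        return "unknown"
    if fields[0] == "running":
        return "running"
    try:
        nr = int(fields[0])
    except ValueError:
        return "unknown"
    if nr == -1:
        return "not in syscall"  # blocked, but not inside a system call
    return X86_64_SYSCALLS.get(nr, f"syscall {nr}")


def tally(pid, samples=100, interval=0.05):
    """Sample /proc/PID/syscall repeatedly and count the labels seen."""
    counts = collections.Counter()
    for _ in range(samples):
        try:
            with open(f"/proc/{pid}/syscall") as f:
                text = f.read()
        except OSError:
            text = ""  # process gone or permission denied
        counts[classify(text)] += 1
        time.sleep(interval)
    return counts
```

If "read" dominates the tally, the input side is probably the bottleneck; if "write" dominates, it is more likely the output side.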