It's slightly more general than that, hiding inefficient use of functional units. A lot of times that's totally memory latency causing the inability to keep FUs fed like you say, but i've seen other reasons, like a wide but diverse set of FUs that have trouble applying to every workload.
The classic reason quoted for SMT is to allow the functional units to be fully utilised when there is instruction-to-instruction dependencies - that is, the input of one instruction is the output from the previous instruction. Doing SMT allow you to create one large pool of functional units and share them between multiple threads, hopefully increasing the chances that they will be fully used.