I've heard from people hosting image generation models on Stable Horde that you can underclock/undervolt your GPUs pretty severely and keep them stable, massively reducing heat output and energy cost while losing proportionally much less performance. I'm not sure whether this transfers to text generation, since these reports came from image generation workers that have a few seconds of downtime between requests. Still, it might be worth a bit of research if you happen to be running consumer GPUs.
-----
From TheUnamusedFox, in August:
> 3090 down to ~260-270 watts (from 400) with minimal gen speed impact. Same with a 3080ti. It seems to be more stable with image generation than gaming, at least on my two cards. If I try to game or benchmark with this undervolt it is an instant crash.
From another user:
> this undervolting stuff is pretty sweet.
> undervolted_limits.png [1]
> max_power_limits.png [2]
> this is my before and after.
> a solid 200 watt drop for only 9.2% loss of performance
> not to mention the 30 degree drop in temps
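
To put the quoted numbers in perspective, here's a rough back-of-the-envelope sketch of energy used per generated image before and after an undervolt. The function name and the 400 W baseline are my own assumptions for illustration (the second quote only gives the 200 W drop; the baseline is in the screenshots), and it assumes constant power draw while generating and ignores idle time between requests.

```python
def energy_per_image_ratio(power_before_w, power_after_w, perf_loss_fraction):
    """Energy per image after the undervolt, relative to before (1.0 = unchanged).

    Hypothetical helper: energy per image = power draw * time per image,
    assuming power draw is constant while generating.
    """
    if power_before_w <= 0 or power_after_w <= 0:
        raise ValueError("power draw must be positive")
    if not 0 <= perf_loss_fraction < 1:
        raise ValueError("perf_loss_fraction must be in [0, 1)")
    power_ratio = power_after_w / power_before_w
    # Losing a fraction p of throughput means each image takes 1 / (1 - p) as long.
    time_ratio = 1.0 / (1.0 - perf_loss_fraction)
    return power_ratio * time_ratio


# ASSUMPTION: 400 W baseline, borrowed from the 3090 figure in the first quote.
# The second quote only states "a solid 200 watt drop" and "9.2% loss".
ratio = energy_per_image_ratio(400.0, 200.0, 0.092)
print(f"energy per image: {ratio:.1%} of baseline")
```

Under those assumptions each image costs roughly 55% of the original energy, so the slowdown eats only a small part of the power savings. With a different real baseline the exact figure changes, but the shape of the tradeoff is the same.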
[1]: https://cdn.discordapp.com/attachments/1143237412663869570/1...
[2]: https://cdn.discordapp.com/attachments/1143237412663869570/1...