By the way, what's the compute power difference between an A100 and a 4090?
So for this specific application, it really depends on where the bottleneck is.