It would make sense, I suppose, if I were using two different GPUs for the same thing and got two different outcomes. But instead I have two implementations (one naive, one using tensor cores) running on the same GPU and getting different outcomes, when they should be the same.
But then this joke might be flying above my head as well.