FFT performance

No, torch doesn't do float16 calculations during rfft when your tensor is float32; the computation stays in single precision and the output is complex64.
The way different libraries pad and plan their FFTs can differ, which might explain the discrepancies you're seeing.
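One concrete library difference you can check directly is the output dtype. A minimal sketch, using NumPy as the comparison point (the choice of NumPy is my assumption, since you didn't say which library you're comparing against): NumPy's FFT always computes in double precision, upcasting a float32 input to complex128, whereas torch.fft.rfft keeps a float32 tensor in single precision and returns complex64.

```python
import numpy as np

# float32 input signal
x = np.ones(8, dtype=np.float32)

# NumPy's FFT backend always works in double precision,
# so the float32 input is upcast and the result is complex128.
out = np.fft.rfft(x)
print(out.dtype)  # complex128

# torch.fft.rfft, by contrast, preserves single precision:
# torch.fft.rfft(torch.ones(8)) yields a complex64 tensor
# (not run here, to keep this snippet torch-free).
```

So even when neither library drops to float16, one side may silently be computing in float64, which alone can produce small numerical differences between results.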