Should preprocessing steps be performed on the GPU or the CPU?

Is it generally advisable to perform as many steps as possible on tensors that have already been moved to the GPU, rather than doing them on the CPU?

Specifically, I am talking about the FFT and postprocessing steps such as torch.roll (to mimic NumPy's fftshift), taking absolute values, and general multiplications.

Or should one first copy them back to the CPU and perform those operations there?
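
To make the two options concrete, here is a minimal sketch of the kind of pipeline I mean. The tensor shape, the GAIN factor, and the function names are made up for illustration, and the code falls back to the CPU if no GPU is available.

```python
import numpy as np
import torch

GAIN = 2.0  # hypothetical scale factor standing in for the "general multiplications"


def postprocess_torch(x: torch.Tensor) -> torch.Tensor:
    """FFT, shift, magnitude, scale; runs on whatever device x lives on."""
    spec = torch.fft.fft(x, dim=-1)
    # Rolling by n // 2 along the FFT axis mimics numpy.fft.fftshift.
    spec = torch.roll(spec, shifts=spec.shape[-1] // 2, dims=-1)
    return spec.abs() * GAIN


def postprocess_numpy(x: np.ndarray) -> np.ndarray:
    """Same steps with NumPy, always on the CPU."""
    spec = np.fft.fftshift(np.fft.fft(x, axis=-1), axes=-1)
    return np.abs(spec) * GAIN


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
signal = torch.randn(4, 1024, device=device)  # made-up example data

# Option A: keep everything on the device the data already lives on.
on_device = postprocess_torch(signal)

# Option B: copy the data back to the CPU first and process it there.
on_cpu = postprocess_numpy(signal.cpu().numpy())
```

Both variants should give the same numbers up to floating-point precision; what I am unsure about is which one is preferable in practice.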

Are there general guidelines for this?

Thanks in advance.