I created an autograd function, and in its forward / backward I use numpy operations, so the function returns CPU tensors.
This is not compatible with the rest of the code, which uses CUDA tensors.
I am wondering which is best: converting all my operations to CUDA tensors inside the autograd function, or converting the CPU tensor to a CUDA tensor after my autograd function returns it?
Short answer: try both and keep the fastest.
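To make "try both" concrete, here is a rough timing sketch. The squaring op, the tensor size, and the helper names (`time_fn`, `op_on_cpu_then_move`, `op_on_device`) are all made up for illustration, so swap in your real function. It falls back to the CPU when no GPU is available, in which case the comparison is not meaningful.

```python
import time
import torch

def time_fn(fn, x, n_iters=20):
    """Average wall-clock seconds per call of fn(x)."""
    fn(x)  # warm-up call, so one-time setup costs are not measured
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # CUDA kernels run asynchronously
    start = time.perf_counter()
    for _ in range(n_iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / n_iters

def op_on_cpu_then_move(x):
    # Option 1: compute with numpy on the CPU, then move the result back.
    y = torch.from_numpy(x.detach().cpu().numpy() ** 2)
    return y.to(x.device)

def op_on_device(x):
    # Option 2: stay on the input's device the whole time.
    return x ** 2

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)

timings = {
    "cpu + transfer": time_fn(op_on_cpu_then_move, x),
    "on device": time_fn(op_on_device, x),
}
fastest = min(timings, key=timings.get)
print(timings, "-> fastest:", fastest)
```

The `torch.cuda.synchronize()` calls matter: without them you mostly measure how long it takes to queue the kernels, not to run them.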
Longer answer: moving data to and from the GPU is quite expensive.
So if your other operations are really large and benefit a lot from the GPU, then it is worth doing them on the GPU even though you pay for the transfer. If your other operations are quite cheap, then the transfer is going to be more expensive than doing the operation directly on the CPU, and so you should just do everything on the CPU.
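If you keep the numpy code, one way to make the function usable from CUDA code is to do the transfers inside it, so callers always get a tensor on the same device as the input. This is only a sketch: `NumpySquare` and the squaring op are placeholders I made up, not your actual function, and every call pays for one copy in each direction.

```python
import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    """Hypothetical example op: y = x ** 2, computed with numpy on the CPU."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        x_np = x.detach().cpu().numpy()             # device -> host copy (no-op on CPU)
        y_np = np.square(x_np)
        return torch.from_numpy(y_np).to(x.device)  # host -> device copy

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        x_np = x.detach().cpu().numpy()
        g_np = grad_output.detach().cpu().numpy()
        grad_np = 2.0 * x_np * g_np                 # d(x^2)/dx = 2x, chained with grad_output
        return torch.from_numpy(grad_np).to(grad_output.device)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 3, device=device, requires_grad=True)
y = NumpySquare.apply(x)
y.sum().backward()
print(y.device, x.grad.device)
```

If the timings show the transfers dominate, the alternative is to rewrite the numpy parts with torch operations so nothing leaves the GPU.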