Running part of the model on CPU


I’m using some libraries that don’t have GPU support for what I’m trying to do. This forces me to move my model to the CPU using .to('cpu') and back to the GPU after I’m done with the CPU part.
The tensors I need are converted using:
radiance_clone = radiance.clone().detach().requires_grad_(True).to('cpu')
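
To narrow this down, here is a minimal sketch of how one could check whether gradients still reach the GPU part after this conversion. The tensor `weights` and the scaling by 2 are hypothetical stand-ins for the real model; only the conversion line is taken from my code.

```python
import torch

# Hypothetical stand-in for the real model; uses the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
weights = torch.ones(3, device=device, requires_grad=True)
radiance = weights * 2.0  # non-leaf tensor that depends on `weights`

# Conversion as in the post: detach() creates a new leaf, cut off from `weights`.
radiance_clone = radiance.clone().detach().requires_grad_(True).to("cpu")
radiance_clone.sum().backward()
grad_with_detach = weights.grad  # stays None: nothing flowed back to `weights`

# Plain device move: .to() is tracked by autograd, so the graph stays connected.
radiance_cpu = radiance.to("cpu")
radiance_cpu.sum().backward()
grad_without_detach = weights.grad.clone()  # gradient reaches `weights`

print("with detach:   ", grad_with_detach)
print("without detach:", grad_without_detach)
```

If the first gradient comes back as None, everything in the model before the conversion would receive no training signal from the CPU part, which might explain a difference from CPU-only training.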

A simple forward pass works as intended and gives me the same results as running solely on the CPU. But training on the CPU alone is just too slow…
When training this way, I do not get the same results as when training only on the CPU. My model does some weird stuff and converges really quickly.

Are there weird things happening behind the scenes when I move my model back and forth within an epoch to run a specific part on the CPU? Or am I cloning my tensors the wrong way?

Any help is appreciated!