Could you check whether detaching the tensor (if you are not using the computation graph), or calling backward before calling copy_, helps
(similar to what is described in the thread "`copy_` operations get repeated in autograd computation graph")?
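A minimal sketch of the detach approach, using hypothetical tensor names (`param`, `src`) for illustration: copying through `.detach()` keeps the in-place `copy_` off the autograd graph, so it is not replayed in later backward calls.

```python
import torch

# Hypothetical setup: a leaf tensor tracked by autograd.
param = torch.ones(3, requires_grad=True)
src = torch.full((3,), 2.0)

# Writing through .detach() performs the in-place copy without
# recording the copy_ op in the autograd graph.
param.detach().copy_(src)

print(param)  # values updated in place; param still requires grad
```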
Alternatively, if you know that gradients are not needed anywhere, you could try using the no_grad
guard as well:
Typedef torch::NoGradGuard — PyTorch master documentation
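In Python, the equivalent of the C++ `torch::NoGradGuard` is the `torch.no_grad()` context manager; a small sketch with the same hypothetical tensors as above:

```python
import torch

param = torch.ones(3, requires_grad=True)
src = torch.full((3,), 2.0)

# torch.no_grad() is the Python counterpart of torch::NoGradGuard:
# no operation inside the block is recorded by autograd, so the
# in-place copy_ is allowed even on a leaf that requires grad.
with torch.no_grad():
    param.copy_(src)

print(param)  # updated values; grad tracking resumes after the block
```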