.detach() vs .cpu()?

Which one is the better practice for detaching a tensor without expanding CPU memory too much: x.detach() or x.cpu()?
I found that if I call x.cpu(), my memory expands quickly and I soon hit a memory limit error in Slurm.



The two have very different (and non-overlapping) effects:

  • x.cpu() will do nothing at all if your Tensor is already on the CPU, and will otherwise create a new Tensor on the CPU with the same content as x. Note that this op is differentiable and gradients will flow back towards x!
  • y = x.detach() breaks the graph between x and y. But y will actually be a view into x and share memory with it.
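A minimal sketch of both behaviors (assuming a CPU-only setup; on a GPU tensor, .cpu() would additionally copy the data across devices):

```python
import torch

x = torch.randn(3, requires_grad=True)

# .detach(): breaks the graph; y is a view sharing memory with x
y = x.detach()
y[0] = 42.0                      # in-place change is visible through x
assert x[0].item() == 42.0
assert not y.requires_grad       # y is cut off from autograd

# .cpu(): keeps the graph; gradients still flow back to x
z = (x.cpu() ** 2).sum()
z.backward()
assert x.grad is not None        # backward reached x through .cpu()

# common pattern when logging/accumulating: drop the graph first,
# then move to CPU, so no autograd history is kept alive
val = x.detach().cpu()
```

Calling .detach() before .cpu() is what avoids the memory growth: without it, each stored tensor keeps its whole autograd graph alive.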