.detach() vs .cpu()?

Which one is the better practice for detaching a tensor without expanding CPU memory too much: x.detach() or x.cpu()?
I found that if I call x.cpu(), my memory expands quickly and I soon hit a memory limit error in Slurm.



The two have very different (and non-overlapping) effects:

  • x.cpu() will do nothing at all if your Tensor is already on the CPU, and will otherwise create a new Tensor on the CPU with the same content as x. Note that this op is differentiable and gradients will flow back towards x!
  • y = x.detach() breaks the graph between x and y. But y will actually be a view into x and share memory with it.
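A minimal sketch of both behaviors (assuming a CPU-only setup; on a GPU tensor, .cpu() would additionally copy the data across devices):

```python
import torch

x = torch.randn(3, requires_grad=True)

# .detach(): breaks the graph; y is a view sharing memory with x
y = x.detach()
y[0] = 42.0                      # in-place change is visible through x
assert x[0].item() == 42.0
assert not y.requires_grad       # y is cut off from autograd

# .cpu(): keeps the graph; gradients still flow back to x
z = (x.cpu() ** 2).sum()
z.backward()
assert x.grad is not None        # backward reached x through .cpu()

# common pattern when logging/accumulating: drop the graph first,
# then move to CPU, so no autograd history is kept alive
val = x.detach().cpu()
```

Calling .detach() before .cpu() is what avoids the memory growth: without it, each stored tensor keeps its whole autograd graph alive.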