.detach() vs .cpu()?

Which one is the better practice for detaching a tensor without expanding CPU memory too much: x.detach() or x.cpu()?
I found that if I call x.cpu(), my memory expands quickly and I soon hit a memory limit error in Slurm.



The two have very different (and non-overlapping) effects:

  • x.cpu() will do nothing at all if your Tensor is already on the CPU, and will otherwise create a new Tensor on the CPU with the same content as x. Note that this op is differentiable and gradients will flow back towards x!
  • y = x.detach() breaks the graph between x and y. But y will actually be a view into x and share memory with it.
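A minimal sketch of both behaviors (assuming a CPU-only setup; on a GPU tensor, .cpu() would additionally copy the data across devices):

```python
import torch

x = torch.randn(3, requires_grad=True)

# .detach(): breaks the graph; y is a view sharing memory with x
y = x.detach()
y[0] = 42.0                      # in-place change is visible through x
assert x[0].item() == 42.0
assert not y.requires_grad       # y is cut off from autograd

# .cpu(): keeps the graph; gradients still flow back to x
z = (x.cpu() ** 2).sum()
z.backward()
assert x.grad is not None        # backward reached x through .cpu()

# common pattern when logging/accumulating: drop the graph first,
# then move to CPU, so no autograd history is kept alive
val = x.detach().cpu()
```

Calling .detach() before .cpu() is what avoids the memory growth: without it, each stored tensor keeps its whole autograd graph alive.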