Should it really be necessary to do var.detach().cpu().numpy()?

I have a CUDA variable that is part of a differentiable computational graph. I want to read out its value into numpy (say for plotting).

If I do var.numpy() I get RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Ok, so I do var.detach().numpy() and get TypeError: can’t convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

Ok, so I go var.detach().cpu().numpy() and it works.
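
For reference, a minimal sketch of the three attempts; the var here is just a stand-in tensor created for illustration, and it assumes a CUDA device is available:

```python
import torch

# Stand-in for a CUDA tensor that is part of a differentiable graph.
var = torch.randn(3, device="cuda", requires_grad=True) * 2

# var.numpy()            # RuntimeError: call .detach() first
# var.detach().numpy()   # TypeError: CUDA tensor, copy to host with .cpu() first
values = var.detach().cpu().numpy()   # works: a plain numpy array on the host
print(values)
```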

My question is: is there any good reason why this isn’t just done within the numpy() method itself? It’s cumbersome, and having all these .detach().cpu().numpy() calls scattered around litters the code.

I have the same question. When a user calls numpy() on a variable, I think they must also want that variable detached and on the CPU.
I don’t know how the PyTorch developers think about it, but I think there should be a single function to get the inner values of a tensor.

Hi,

The main reason behind this choice, I think, is to avoid confusing newcomers. People not very familiar with requires_grad and CPU/GPU tensors might go back and forth with numpy, for example doing pytorch -> numpy -> pytorch and then calling backward on the last tensor. That backward will run without issue, but it won’t go all the way back to the first part of the code, and it won’t raise any error.
So the choice has been made to force the user to call detach(), to make sure they really want to do it and it’s not a typo or another library doing this transformation and silently breaking the computational graph.
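
A minimal sketch of that failure mode (the names are made up for illustration). Here the detach() is explicit, so the break is visible; if numpy() detached implicitly, the same silent break could happen without any hint in the code:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2

# Leave PyTorch for numpy and come back: the graph is cut here.
z = torch.tensor(y.detach().numpy(), requires_grad=True)

loss = (z ** 2).sum()
loss.backward()     # runs without any error

print(z.grad)       # gradients for the part after the round-trip
print(x.grad)       # None -- backward never reached the first part of the code
```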

Fair enough - but could we at least get rid of the need for X.cpu().numpy()? Seems X.numpy() alone should be enough.

The reason for requiring an explicit .cpu() is that CPU tensors and the converted numpy arrays share memory. If the .cpu() were done implicitly, the operation would behave differently for CUDA and CPU tensors, and we wanted to be explicit to avoid bugs.
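
A small sketch of that difference (the second half assumes a CUDA device is available):

```python
import torch

t = torch.zeros(3)            # CPU tensor
a = t.numpy()                 # shares memory with t
t[0] = 1.0
print(a)                      # [1. 0. 0.] -- the array sees the in-place change

g = torch.zeros(3, device="cuda")
b = g.cpu().numpy()           # explicit copy to host memory, no sharing with g
g[0] = 1.0
print(b)                      # [0. 0. 0.] -- the copy is unaffected
```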

The explicitness of pytorch is most of what I’m enjoying relative to tensorflow2.

As a quick follow-up to this question, is there any difference between var.detach().cpu() and var.cpu().detach() ?

If var requires gradient, then var.cpu().detach() constructs the autograd edge for the .cpu() copy, which is soon destructed since the result is not stored. var.detach().cpu() does not do this. However, this is very fast, so in practice they are virtually the same.
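
A quick sketch, assuming var is a non-leaf CUDA tensor that requires grad; both orderings end up with the same detached CPU values:

```python
import torch

var = torch.randn(3, device="cuda", requires_grad=True) * 2   # non-leaf on GPU

a = var.detach().cpu()   # detach first: the copy to CPU is never tracked
b = var.cpu().detach()   # copy first: a short-lived autograd edge, then detached

print(a.requires_grad, b.requires_grad)   # False False
print(torch.equal(a, b))                  # True -- identical values either way
```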

Ok but why wouldn’t it do backward “all the way to the first part of the code”?

Because we are only able to provide gradients for pytorch ops, not other ops (like numpy or other libraries).
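
One way to see it, as a small sketch: a tensor rebuilt from a numpy array is a brand-new leaf with no grad_fn, so autograd has no edge to follow back to the original computation:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).detach().numpy()   # leave PyTorch: autograd loses track here
z = torch.from_numpy(y)        # re-enter PyTorch as a brand-new tensor

print(z.grad_fn)   # None -- no edge back to x, so backward cannot reach x
print(z.is_leaf)   # True
```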

I’m a little confused. var.clone() will keep both tensors in the computation graph, and var.cpu() will return a copy of this tensor on the CPU rather than on CUDA, right? So will var and var.cpu() both remain in the computation graph? Will they share memory?

var (on CUDA) and var.cpu() are on different devices, so they cannot share memory. So var and var.cpu() will act like var and var.clone(), except that they are on different devices?

And if I use var and var.cpu().cuda(), will it act like var and var.clone()?

And if I use var and var.cpu().cuda(), will it act like var and var.clone()?

If the var was originally on the GPU, yes.

So var and var.cpu() will act like var and var.clone(), except that they are on different devices?

Yes.
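
A short sketch of that behaviour, assuming a CUDA device is available: the .cpu() copy stays in the graph, so gradients flow back to the original CUDA tensor, but the two tensors live in different memory:

```python
import torch

var = torch.randn(3, device="cuda", requires_grad=True)
host = var.cpu()                # a copy on the CPU, still tracked by autograd

print(host.grad_fn)             # e.g. <ToCopyBackward0> -- the copy op is recorded
host.sum().backward()           # gradients flow back through the device copy
print(var.grad)                 # tensor([1., 1., 1.], device='cuda:0')
```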

https://pytorch.org/docs/stable/community/design.html#principle-2-simple-over-easy