Should it really be necessary to do var.detach().cpu().numpy()?

I have a CUDA variable that is part of a differentiable computational graph. I want to read out its value into numpy (say for plotting).

If I do var.numpy() I get RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Ok, so I do var.detach().numpy() and get TypeError: can’t convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first

Ok, so I go var.detach().cpu().numpy() and it works.
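To make the error chain concrete, here is a minimal sketch of the situation described above (the variable name `x` and the toy graph are illustrative):

```python
import torch

# A tensor that is part of a differentiable graph, on GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, device=device, requires_grad=True)
y = (x * 2).sum()  # x now participates in a computational graph

# x.numpy() raises RuntimeError (requires grad); on a CUDA tensor,
# x.detach().numpy() then raises TypeError. The full chain works in both cases:
arr = x.detach().cpu().numpy()
print(arr.shape)
```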

My question is: Is there any good reason why this isn’t just done within the numpy() method itself? It’s cumbersome and litters the code to have all these .detach().cpu().numpy() calls sitting around.


I have the same question. When a user calls numpy() on a variable, I think they must also want that variable on the CPU and detached.
I don’t know how the PyTorch developers think about it, but I think there should be a function to get the inner values of a tensor.

Hi,

The main reason behind this choice, I think, is to avoid confusing newcomers. People not very familiar with requires_grad and CPU/GPU Tensors might go back and forth with numpy. For example, doing pytorch -> numpy -> pytorch and then calling backward on the last Tensor: this will backward without issue, but the gradients will not flow all the way back to the first part of the code, and no error will be raised.
So the choice has been made to force the user to call detach() to make sure they really want to do it, and that it’s not a typo or some other library doing this transformation and silently breaking the computational graph.
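A small sketch of the silent graph break described above (the variable names are illustrative): the round-trip through numpy produces a tensor that backpropagates fine on its own, but the gradient never reaches the original tensor, and no error is raised.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                                  # y is still in the graph

# Round-trip through numpy: this silently severs the graph
z = torch.tensor(y.detach().numpy(), requires_grad=True)
out = (z * 3).sum()
out.backward()                             # runs without any error...

print(z.grad)   # gradient reaches z
print(x.grad)   # None -- it never flows back to x
```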

Fair enough - but could we at least get rid of the need for X.cpu().numpy()? Seems X.numpy() alone should be enough.

The reason for requiring an explicit .cpu() is that CPU tensors and the numpy arrays converted from them share memory. If a .cpu() were done implicitly, the operation would behave differently for CUDA and CPU tensors (a copy versus a view), and we wanted to be explicit to avoid bugs.
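The memory-sharing point can be verified directly. In this sketch, mutating a CPU tensor is visible through the numpy array obtained from it, whereas a CUDA tensor must first be copied to host memory, so the resulting array is an independent copy (the CUDA branch only runs where a GPU is present):

```python
import torch

t = torch.zeros(3)        # CPU tensor
a = t.numpy()             # shares memory with t
t[0] = 7.0
print(a[0])               # the numpy array sees the in-place change

# A CUDA tensor lives in device memory, so .cpu() copies it to the host;
# the numpy array is then a copy and does NOT track later device-side changes
if torch.cuda.is_available():
    g = torch.zeros(3, device="cuda")
    b = g.cpu().numpy()
    g[0] = 7.0
    print(b[0])           # unchanged: b was copied before the write
```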