I have a CUDA variable that is part of a differentiable computational graph. I want to read out its value into numpy (say for plotting).

If I do `var.numpy()`

I get `RuntimeError: Can’t call numpy() on Variable that requires grad. Use var.detach().numpy() instead.`

Ok, so I do `var.detach().numpy()`

and get `TypeError: can’t convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first`

Ok, so I go `var.detach().cpu().numpy()`

and it works.
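In the meantime, one way to avoid repeating the chain everywhere is a tiny helper that always does the full conversion. This is just a sketch (the `to_numpy` name is mine, not a PyTorch API); `.detach()` and `.cpu()` are cheap no-ops when the tensor already has no grad history or already lives on the host, so it's safe to call unconditionally:

```python
import torch

def to_numpy(t: torch.Tensor):
    # Hypothetical helper: drop autograd history, copy to host, convert.
    # .detach() is free if t doesn't require grad; .cpu() is a no-op
    # for tensors already in host memory.
    return t.detach().cpu().numpy()

# Works the same for CPU and CUDA tensors, with or without requires_grad:
x = torch.randn(3, requires_grad=True)
if torch.cuda.is_available():
    x = x.cuda()
print(to_numpy(x).shape)
```

That keeps the plotting code down to `plt.plot(to_numpy(var))` instead of the full chain at every call site.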

My question is: is there any good reason why this isn't just done inside the `numpy()` method itself? Having to write `.detach().cpu().numpy()` everywhere is cumbersome and litters the code.