Convert PyTorch tensor to CuPy array without detaching graph?


I have a PyTorch tensor that is on a CUDA device and has a computation graph (it is built from another PyTorch CUDA tensor).

I want to convert this to a CuPy array without losing the computation graph. Is this possible at all?

When I try to convert the PyTorch tensor to a CuPy array without calling detach() first, I get this error:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

For reference, the CuPy docs recommend doing this conversion with DLPack, which loses the computation graph.

Hello @ptrblck. If possible, could you please help me with this? I would really appreciate the help. Thank you.

I think DLPack is the right approach, but I would also assume you need to write a custom autograd.Function, including the backward method (as described in the linked docs), since Autograd is not aware of CuPy operations.

Thank you for the reply. I agree with the custom autograd function idea; we are already using something like that in our code. But since CuPy does not retain the PyTorch tensor's graph, once we perform some operations in CuPy and convert back to PyTorch, we can no longer differentiate with respect to the PyTorch tensor that the converted tensor was built from:

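import cupy
import torch

x = torch.rand(10, 1, device="cuda", requires_grad=True)
y = x ** 2

# PyTorch refuses to export a tensor that requires grad via DLPack,
# so y has to be detached here, which cuts it out of the graph
cupy_y = cupy.from_dlpack(y.detach())

# perform some operations on cupy_y to get new_cupy_y, e.g.:
new_cupy_y = cupy.sin(cupy_y)

y_torch = torch.from_dlpack(new_cupy_y)

# this raises an error: y_torch has no grad_fn, so autograd cannot connect it to x
torch.autograd.grad(y_torch, x, grad_outputs=torch.ones_like(y_torch), create_graph=True)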

I think I will have to implement this as C++ autograd kernels in PyTorch instead, where this problem shouldn't occur.

No, I don’t think you need to implement C++ Autograd kernels; you should be able to write a custom autograd.Function in Python that defines the forward and backward methods, as described in the link in my previous post.

The CuPy interoperability docs have a full working example of a torch.autograd.Function implemented in CuPy :smiley:

That example uses a raw CUDA kernel, but the principle is the same with CuPy’s built-in functions. Note that although x.detach() is called inside forward(), a computational graph is still constructed between inputs and outputs when running y = CuPyLog.apply(x).


Thank you for your replies. After giving this more thought, I plan to retry using a custom autograd.Function. So far I could not return the derivative with respect to a CuPy array passed as input to forward, but it works if forward takes a PyTorch tensor as input and uses CuPy internally.