Custom gradient for an input that is not a tensor

Can I have a custom gradient for an input that is not a tensor?
In other words, I want to get rid of the following error, which I get when I pass a non-tensor object to a self-written torch.autograd.Function:

function ... returned a gradient different than None at position 1, but the corresponding forward input was not a Variable

My Use-case

Let me explain my use-case: I have a variety of linear operators that look more or less like this:

class LinearOperator(torch.nn.Module):
    def __init__(self, tensor):
        super().__init__()

        self.tensor = tensor
        
    def forward(self, x):
        """
        For a linear operator A and a tensor x this implements x -> A(x)
        """
        return something( self.tensor, x )

    def rmatvec(self, x):
        """
        This implements the transpose of A applied to a tensor: x -> A^T(x)
        """
        return somethingTranspose( self.tensor, x )

The functions something and somethingTranspose are implemented using PyTorch, so it should be possible to backprop through them.
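As a concrete (purely illustrative) example of such an operator: a diagonal operator only needs to store the n entries of its diagonal, although the matrix it represents is n x n:

import torch

class DiagonalOperator(torch.nn.Module):
    """Illustrative example: A = diag(d), stored only via its diagonal d."""
    def __init__(self, diag):
        super().__init__()
        self.tensor = diag           # shape (n,) instead of a full (n, n) matrix

    def forward(self, x):
        return self.tensor * x       # A(x) = d * x, elementwise

    def rmatvec(self, x):
        return self.tensor * x       # A^T(x) = A(x), since diag(d) is symmetric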

While I can store the tensor from which the linear mapping is computed, I cannot, due to storage space restrictions, store the entire transformation matrix that represents the linear operator A.

I also have a forward function that takes a linear operator A and a tensor x and produces some result f(A, x).
While I can backpropagate through the function f, I want to implement the gradient differently (i.e. more efficiently) by hand in the backward function.

The code then looks like this:

class MyFunctionF(torch.autograd.Function):

    @staticmethod
    def forward(ctx, A, x):
        ctx.A = A   # Linear Operator
        ctx.x = x   # PyTorch tensor

        result = computeF(A, x)
        # result is a PyTorch tensor

        return result

    @staticmethod
    def backward(ctx, grad_output):
        grad_A = something_A(ctx.A, ctx.x, grad_output)
        grad_x = something_x(ctx.A, ctx.x, grad_output)

        return grad_A, grad_x

I am using these classes in the following way:

# Both tensor and x are PyTorch Tensors
tensor = ...
x = ...

linOp = LinearOperator(tensor)
res = MyFunctionF.apply(linOp, x)

linOp1 = OtherLinearOperator(tensor)
res1 = MyFunctionF.apply(linOp1, x)

linOp2 = NextLinearOperator(tensor)
res2 = MyFunctionF.apply(linOp2, x)

Here I have multiple different classes of linear operators (LinearOperator, OtherLinearOperator, NextLinearOperator, …).

The forward pass works fine, but during the backward pass I get the error:
function MyFunctionFBackward returned a gradient different than None at position 1, but the corresponding forward input was not a Variable

I guess the problem is that grad_A in the backward function is a tensor, but the input A in forward is not a tensor. Is this correct?
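For completeness: if I return None for the non-tensor input A, as autograd expects for non-tensor inputs, the error goes away, but then no gradient at all reaches the tensor stored inside the operator, which is exactly the gradient I need:

    @staticmethod
    def backward(ctx, grad_output):
        grad_x = something_x(ctx.A, ctx.x, grad_output)
        # None for the non-tensor input A silences the error, but the
        # gradient w.r.t. the operator's underlying tensor is lost
        return None, grad_x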

Workaround

A workaround is to construct the linear operator in the forward function of MyFunctionF, but this means that

  1. I have to implement the function f for every type of linear operator.
  2. I have to backpropagate through the forward method of the linear operator by hand as well, even though the gradient of the linear operator's forward method could be computed with autograd.

The workaround code looks like this:
class MyFunctionF(torch.autograd.Function):

    @staticmethod
    def forward(ctx, tensor, x):
        A = LinearOperator(tensor)
        ctx.A = A
        ctx.x = x

        result = computeF(A, x)

        return result

    @staticmethod
    def backward(ctx, grad_output):
        grad_tensor = something_tensor(ctx.A, ctx.x, grad_output)
        grad_x = something_x(ctx.A, ctx.x, grad_output)

        return grad_tensor, grad_x
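This workaround is then used roughly like this (sketch):

# tensor and x are PyTorch tensors, tensor requires a gradient
tensor = ...
x = ...

res = MyFunctionF.apply(tensor, x)   # LinearOperator is hard-coded inside forward
res.sum().backward()                 # grad_tensor and grad_x come from backward above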

Therefore my question:
Is there a different way to implement linear operators in conjunction with custom backward functions?

Have you looked into: GitHub - cornellius-gp/linear_operator: A LinearOperator implementation for PyTorch? Demo notebook: linear_operator/LinearOperator_demo.ipynb at demo_nb · Balandat/linear_operator · GitHub

Hey :slight_smile:

sorry for the rather long delay. I finally had time to look at this issue again and dig a little deeper.

I had a look at cornellius-gp/linear_operator or, to be more precise, at gpytorch.lazy, which is a more up-to-date version of the code by the same authors.
Sadly this implementation does not solve my problems, as these lazy tensors are not instances of torch.Tensor:

import torch
import gpytorch

isinstance(
    gpytorch.lazy.DiagLazyTensor(torch.tensor((1.,2.,3.))),
    torch.Tensor
) # False

This is a problem when extending torch.autograd.Function, which only works properly with torch.Tensor inputs and outputs. Let me illustrate this with a minimal example:

Create a custom torch.autograd.Function:

class ForwardBackwardTest(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a):
        print('forward')
        return a * 2

    @staticmethod
    def backward(ctx, grad_output):
        print('backward')
        return torch.rand((3,3)) # dummy gradient

For a torch.Tensor, the forward and backward passes work as expected:

fbt = ForwardBackwardTest

t = torch.rand((3,3), requires_grad=True)

res_t = fbt.apply(t).sum()
res_t.backward()
t.grad  # Everything working as expected

gpytorch lazy tensors

For a gpytorch lazy tensor, the forward pass already fails, because apparently the output of a custom forward method has to be a torch.Tensor.

d = gpytorch.lazy.DiagLazyTensor(torch.tensor((1.,2.,3.)))
d.requires_grad_(True)

res_d = fbt.apply(d).sum()

# TypeError                                 Traceback (most recent call last)
# <ipython-input-119-34bd74c90a99> in <module>
#       2 d.requires_grad_(True)
#       3 
# ----> 4 res_d = fbt.apply(d).sum()
# 
# TypeError: ForwardBackwardTestBackward.forward: expected Tensor or tuple of Tensor (got DiagLazyTensor) for return value 0

When we return a torch.Tensor from the forward method, the approach fails in the backward method.

So it seems that, in order to implement things like linear operators and lazy tensors that work with custom forward and backward methods, we have to make them instances of torch.Tensor.

Subclassing torch.Tensor

Let’s create a LinearOperator class that inherits from torch.Tensor:

class LinearOperator(torch.Tensor):
    @staticmethod 
    def __new__(cls, *args, **kwargs):
        # the shape of the subclass instance is taken from this tensor,
        # so memory for a full (3, 3) tensor is allocated here
        shapeProvider = torch.rand((3,3))

        instance = torch.Tensor._make_subclass(cls,
            shapeProvider,
            True  # requires_grad
        )
        return instance
      
    def __init__(self):
        pass

    def __mul__(self, t): 
        return torch.rand((3,3)) * t   # dummy implementation

In this implementation, the .shape property of the linear operator is set by PyTorch according to the shape of the shapeProvider tensor. This is a problem, because I then have to allocate memory for the shapeProvider tensor. One of the main points of the linear operator approach, however, is that I cannot store such a large tensor when working with very high-dimensional spaces, even though many linear operators between such spaces can easily be represented and stored.
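To make this concrete: the subclass does work together with the custom Function from above, but its shape is dictated by shapeProvider (a quick check using the definitions above):

op = LinearOperator()
op.shape                                   # torch.Size([3, 3]), inherited from shapeProvider

res = ForwardBackwardTest.apply(op).sum()
res.backward()
op.grad                                    # the dummy (3, 3) gradient returned by backward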

Trying to reset the .shape property yields the following error:

AttributeError: attribute 'shape' of 'torch._C._TensorBase' objects is not writable

Using a wrong .shape for the linear operator does not work either, because PyTorch, for example, accesses the shape to check whether the gradient in the backward pass has the correct size.
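The check I mean is autograd comparing the shape of the gradient returned by backward with the shape of the corresponding input. A minimal demonstration with a plain tensor (the exact error wording may differ between PyTorch versions):

class WrongShapeGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a):
        return a * 2

    @staticmethod
    def backward(ctx, grad_output):
        # deliberately return a gradient with the wrong shape
        return torch.rand((5,5))

t = torch.rand((3,3), requires_grad=True)
WrongShapeGrad.apply(t).sum().backward()
# RuntimeError: ... returned an invalid gradient ... got [5, 5]
# but expected shape compatible with [3, 3]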

Conclusion

I think that there is currently no way to implement linear operators or lazy tensors in PyTorch that

  • do not need to store a tensor of the size of the operation they are representing
  • and work with a custom torch.autograd.Function.

I would highly appreciate it if someone finds a way to do this.