# Custom gradient for an input that is not a tensor

Can I have a custom gradient for an input that is not a tensor?
In other words, I want to get rid of the following error when I pass a non-tensor argument (a linear operator, see below) to a self-written `torch.autograd.Function`:

```
function ... returned a gradient different than None at position 1, but the corresponding forward input was not a Variable
```

## My Use-case

Let me explain my use-case: I have a variety of linear operators that look more or less like this:

```
class LinearOperator(torch.nn.Module):
    def __init__(self, tensor):
        super().__init__()

        self.tensor = tensor

    def forward(self, x):
        """
        For a linear operator A and a tensor x this implements x -> A(x)
        """
        return something(self.tensor, x)

    def rmatvec(self, x):
        """
        This implements the transpose of A applied to a tensor: x -> A^T(x)
        """
        return somethingTranspose(self.tensor, x)
```

The functions `something` and `somethingTranspose` are implemented using PyTorch, so it should be possible to backprop through them.

While I can store the variable `tensor` with which I can compute the linear mapping, I cannot - due to storage space restrictions - store the entire transformation matrix that represents the linear operator `A`.
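To make this concrete, a hypothetical diagonal operator (not part of my actual code) fits this pattern: it only stores the n diagonal entries in `tensor` instead of the full n × n matrix:

```
class DiagonalOperator(LinearOperator):
    """Hypothetical example: A = diag(tensor), so only n entries are stored."""

    def forward(self, x):
        # something(self.tensor, x) for a diagonal matrix is an element-wise product
        return self.tensor * x

    def rmatvec(self, x):
        # a diagonal matrix is symmetric, so A^T(x) == A(x)
        return self.tensor * x
```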

I also have a forward function that takes a linear operator A and a tensor x and produces some result f(A, x).
While I can backpropagate through the function f, I want to implement the gradient differently (i.e. more efficiently) by hand in the backward function.

The code then looks like this:

```
class MyFunctionF(torch.autograd.Function):

    @staticmethod
    def forward(ctx, A, x):
        ctx.A = A   # Linear Operator
        ctx.x = x   # PyTorch tensor

        result = computeF(A, x)
        # result is a PyTorch tensor

        return result

    @staticmethod
    def backward(ctx, grad_output):
        # hand-written, more efficient gradients w.r.t. A and x
        # (implementation omitted here)
        grad_A = ...
        grad_x = ...
        return grad_A, grad_x
```

I am using these classes in the following way:

```
# Both tensor and x are PyTorch Tensors
tensor = ...
x = ...

linOp = LinearOperator(tensor)
res = MyFunctionF.apply(linOp, x)

linOp1 = OtherLinearOperator(tensor)
res1 = MyFunctionF.apply(linOp1, x)

linOp2 = NextLinearOperator(tensor)
res2 = MyFunctionF.apply(linOp2, x)
```

Here I have multiple different classes of linear operators (`LinearOperator`, `OtherLinearOperator`, `NextLinearOperator`, …).

The forward pass works fine, but during the backward pass I am getting the error
`function MyFunctionFBackward returned a gradient different than None at position 1, but the corresponding forward input was not a Variable`

I guess the problem is that `grad_A` in the `backward` function is a tensor, but the input `A` in `forward` is not a tensor. Is this correct?
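As far as I understand, `backward` has to return one value per argument of `forward`, and for an argument that was not a tensor this value has to be `None`. A minimal sketch that would silence the error, but is not a solution, because it simply discards the gradient with respect to `A`:

```
class MyFunctionF(torch.autograd.Function):
    @staticmethod
    def forward(ctx, A, x):
        ctx.A = A
        ctx.x = x
        return computeF(A, x)

    @staticmethod
    def backward(ctx, grad_output):
        # the first return value corresponds to A: it was not a tensor in
        # forward, so autograd only accepts None here, and no gradient ever
        # reaches the tensor stored inside A
        grad_x = ...   # hand-written gradient w.r.t. x (elided)
        return None, grad_x
```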

## Workaround

A workaround is to construct the linear operator in the forward function of `MyFunctionF`, but this means that

1. I have to implement the function f for every type of linear operator.
2. I have to backpropagate through the forward method of the linear operator by hand as well, even though the gradient of the linear operator's forward method could be computed with backprop.

```
class MyFunctionF(torch.autograd.Function):

    @staticmethod
    def forward(ctx, tensor, x):
        A = LinearOperator(tensor)
        ctx.A = A
        ctx.x = x

        result = computeF(A, x)

        return result

    @staticmethod
    def backward(ctx, grad_output):
        # hand-written gradients w.r.t. tensor and x
        # (implementation omitted here)
        grad_tensor = ...
        grad_x = ...
        return grad_tensor, grad_x
```

Therefore my question:
Is there a different way to implement linear operators in conjunction with custom backward functions?

Hey, sorry for the rather long delay. I finally had time to look at this issue again and dig a little deeper.

I had a look at `cornellius-gp/linear_operator`, or to be more precise, at `gpytorch.lazy`, which is a more up-to-date version of the code by the same authors.
Sadly, this implementation does not solve my problem, as these lazy tensors are not instances of `torch.Tensor`:

```
import torch
import gpytorch

isinstance(
    gpytorch.lazy.DiagLazyTensor(torch.tensor((1., 2., 3.))),
    torch.Tensor
)  # False
```

This is a problem when extending `torch.autograd.Function`, which only works properly with `torch.Tensor` objects. Let me illustrate this with a minimal example:

Create a custom `torch.autograd.Function`:

```
class ForwardBackwardTest(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a):
        print('forward')
        return a * 2

    @staticmethod
    def backward(ctx, grad_output):
        print('backward')
        return grad_output * 2

fbt = ForwardBackwardTest   # shorthand used below
```

For a `torch.Tensor`, the forward and backward passes work as expected:

```
t = torch.rand((3, 3), requires_grad=True)

res_t = fbt.apply(t).sum()
res_t.backward()
t.grad  # Everything working as expected
```

## gpytorch lazy tensors

For a gpytorch lazy tensor, the forward pass already fails, because apparently the output of a custom `forward` method has to be a `torch.Tensor`.

```
d = gpytorch.lazy.DiagLazyTensor(torch.tensor((1., 2., 3.)))

res_d = fbt.apply(d).sum()

# TypeError                                 Traceback (most recent call last)
# <ipython-input-119-34bd74c90a99> in <module>
#       3
# ----> 4 res_d = fbt.apply(d).sum()
#
# TypeError: ForwardBackwardTestBackward.forward: expected Tensor or tuple of Tensor (got DiagLazyTensor) for return value 0
```

When we return a `torch.Tensor` from the `forward` method, the approach fails in the `backward` method.
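A rough sketch of this variant (using gpytorch's `evaluate()` to densify the lazy tensor; the details of the failure are not important here):

```
class ForwardBackwardTest2(torch.autograd.Function):
    @staticmethod
    def forward(ctx, a):
        print('forward')
        # densify the lazy tensor so that the return value is a plain torch.Tensor
        return a.evaluate() * 2

    @staticmethod
    def backward(ctx, grad_output):
        print('backward')
        return grad_output * 2

d = gpytorch.lazy.DiagLazyTensor(torch.tensor((1., 2., 3.)))
res_d = ForwardBackwardTest2.apply(d).sum()
# the forward pass now succeeds, but no gradient can be routed back into the
# lazy tensor input, so the backward step fails
```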

So it seems that, in order to implement things like linear operators and lazy tensors that work with custom `forward` and `backward` methods, we have to make them instances of `torch.Tensor`.

## Subclassing `torch.Tensor`

Let’s create a `LinearOperator` class that inherits from `torch.Tensor`:

```
class LinearOperator(torch.Tensor):
    @staticmethod
    def __new__(cls, *args, **kwargs):
        shapeProvider = torch.rand((3, 3))

        instance = torch.Tensor._make_subclass(
            cls,
            shapeProvider,
            True
        )
        return instance

    def __init__(self):
        pass

    def __mul__(self, t):
        return torch.rand((3, 3)) * t   # dummy implementation
```

In this implementation, the `.shape` property of the linear operator is set by PyTorch according to the shape of the `shapeProvider` tensor. This is a problem, because I then have to allocate memory for the `shapeProvider` tensor, while one of the main points of the linear operator approach is that I cannot store such a large tensor when working with very high-dimensional spaces. Many linear operators between such spaces, on the other hand, can easily be represented and stored.

Trying to reset the `.shape` property yields the following error:

```
AttributeError: attribute 'shape' of 'torch._C._TensorBase' objects is not writable
```

Using a wrong `.shape` for the linear operator does not work either, because PyTorch accesses the `shape`, for example to check whether the gradient in the backward pass has the correct size.

## Conclusion

I think that there is currently no way to implement linear operators or lazy tensors in PyTorch that

- do not need to store a tensor the size of the operator they represent
- and work with a custom `torch.autograd.Function`.

I would highly appreciate it if someone finds a way to do this.