Inplace operations for custom layers complicating backpropagation

I’m trying to implement a custom version of ReLU that requires a bit more logic. It looks something like this:

import torch
from torch.nn import Parameter

class ReLU(torch.nn.Module):
    def __init__(self, in_features):
        super(ReLU, self).__init__()
        # Parameterization of the special ReLU layer
        self.lambdas = Parameter(torch.rand(in_features))

    def forward(self, x):
        # compute some upper and lower bounds on the input rows
        # (row_bound is a helper defined elsewhere that returns (lower, upper))
        bounds = map(row_bound, x)
        _, epsilon_id = x.shape

        # loop over rows
        for i, (l, u), lmb in zip(range(x.shape[0]), bounds, self.lambdas):
            # check if the upper bound is <= 0 then the ReLU returns 0
            if u <= 0:
                # Inplace op
                x[i] = x[i] * 0
            # If the bounds cross 0 the ReLU implements some custom logic
            elif l < 0:
                x = torch.nn.ZeroPad2d((0, 1))(x)
                # Inplace op
                x[i] = x[i] * lmb

                if lmb >= u / (u - 1):
                    # Inplace op
                    x[i, epsilon_id] = -l * lmb / 2
                else:
                    # Inplace op
                    x[i, epsilon_id] = u * (1 - lmb)
                # Inplace op
                x[i, 0] = x[i, 0] + x[i, epsilon_id]
                epsilon_id += 1
        
        # If neither branch ran, the lower bound is >= 0 and the ReLU is the identity
        return x

The input x is an n by m tensor. The forward pass first computes lower and upper bounds for each row; based on those bounds, it then either zeroes the row, leaves it unchanged, or applies some extra logic that uses the layer parameters lambdas.

The issue is that I don’t know how to modify individual rows of x without performing in-place operations, and when I do use those operations I frequently get errors during backpropagation when learning the layer parameters. For example, this line:

x[i] = x[i] * lmb

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]] is at version 3; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

My one attempt at a fix was to make whatever gets passed into the model, i.e. x, require gradients. This seems unnecessary, since I never want to compute gradients of x, and it still doesn’t help: I get the following error, likely because I’m directly manipulating trainable parameters:

RuntimeError: leaf variable has been moved into the graph interior

My only other thought is that the logic is too complex for autograd and I need to implement this ReLU as a custom autograd Function with its own backward method, but I don’t want to go there if I don’t have to.

If anyone has some insight into how to solve this problem, it would be much appreciated. Thanks!
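For reference, the custom autograd Function route mentioned above would look roughly like the sketch below. It is a hypothetical example (ScaleRows is a made-up name) that only covers the row scaling x * lmb, assuming one lambda per row, not the full layer; it just shows where a hand-written backward goes. As the replies show, avoiding in-place writes in the forward pass usually makes this unnecessary.

```python
import torch


class ScaleRows(torch.autograd.Function):
    """Hypothetical example: y = x * lmb[:, None] with a hand-written backward."""

    @staticmethod
    def forward(ctx, x, lmb):
        # Save the inputs the backward formulas need.
        ctx.save_for_backward(x, lmb)
        return x * lmb[:, None]

    @staticmethod
    def backward(ctx, grad_out):
        x, lmb = ctx.saved_tensors
        grad_x = grad_out * lmb[:, None]        # dy/dx = lmb (per row)
        grad_lmb = (grad_out * x).sum(dim=1)    # dy/dlmb = x, summed over each row
        return grad_x, grad_lmb


# Compare the hand-written backward against numerical gradients (needs float64).
x = torch.rand(4, 3, dtype=torch.double, requires_grad=True)
lmb = torch.rand(4, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(ScaleRows.apply, (x, lmb))
print(ok)
```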

Hi,

One solution I would recommend for turning these into out-of-place ops is to build a list of rows and combine them with torch.stack (or torch.cat if the dimension already exists):

import torch

def apply_rowwise(x):
    res = []
    for i in range(x.shape[0]):
        new_x_i = bar(x[i])  # bar: whatever out-of-place computation each row needs
        res.append(new_x_i)
    return torch.stack(res, 0)
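Applied to the layer from the question, the list-and-stack pattern might look like the sketch below. It is not the poster's code: row_bound here is a hypothetical stand-in (column 0 treated as a center, the other columns as coefficients bounded by their absolute sum), and the sketch assumes one lambda per input row, with at most in_features rows. Because each row whose bounds cross 0 adds one column, the sketch counts those rows first so every output row can be built at the final width with torch.cat and then stacked once.

```python
import torch
import torch.nn.functional as F


def row_bound(row):
    # HYPOTHETICAL stand-in for the poster's row_bound: column 0 is a center,
    # the remaining columns are coefficients of terms ranging over [-1, 1].
    radius = row[1:].abs().sum()
    return row[0] - radius, row[0] + radius


class OutOfPlaceReLU(torch.nn.Module):
    def __init__(self, in_features):
        super().__init__()
        # Assumption: one lambda per input row (the question zips rows with lambdas).
        self.lambdas = torch.nn.Parameter(torch.rand(in_features))

    def forward(self, x):
        bounds = [row_bound(row) for row in x]
        # Rows whose bounds straddle 0 each get one new trailing column.
        crossing = [i for i, (l, u) in enumerate(bounds) if l < 0 < u]
        n_new = len(crossing)

        rows = []
        for i, (l, u) in enumerate(bounds):
            row, lmb = x[i], self.lambdas[i]
            pad = row.new_zeros(n_new)
            if u <= 0:
                # Upper bound <= 0: the ReLU returns 0 for this row.
                rows.append(torch.cat([row * 0, pad]))
            elif l < 0:
                # Bounds cross 0: scale the row and add an epsilon term.
                scaled = row * lmb
                eps = -l * lmb / 2 if lmb >= u / (u - 1) else u * (1 - lmb)
                j = crossing.index(i)
                # eps goes into this row's own new column; no in-place writes.
                eps_cols = F.one_hot(torch.tensor(j), n_new).to(row.dtype) * eps
                rows.append(torch.cat([scaled[:1] + eps, scaled[1:], eps_cols]))
            else:
                # Lower bound >= 0: the ReLU is the identity.
                rows.append(torch.cat([row, pad]))
        # One out-of-place stack replaces all the row-wise in-place writes.
        return torch.stack(rows, 0)
```

Nothing is written into x or into any tensor that autograd has saved, so backward() can reach self.lambdas without version errors. The cost is one small torch.cat per row, which is usually acceptable for moderate n.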

If you want to stick with inplace operations, and want to avoid the “RuntimeError: leaf variable has been moved into the graph interior” error, simply clone the input you get: x = x.clone() at the beginning of your function.
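A minimal check of the clone suggestion, assuming an input x that itself requires grad (the case that raises the leaf error): writing into x directly is rejected, while writing into a clone works and gradients still flow back to x through the clone. The exact wording of the leaf error message varies between PyTorch versions.

```python
import torch

x = torch.rand(2, 3, requires_grad=True)   # a leaf that requires grad

# Writing into the leaf itself is rejected by autograd.
try:
    x[0] = x[0] * 2
    leaf_write_failed = False
except RuntimeError as err:
    leaf_write_failed = True
    print("direct write:", err)

# Work on a copy instead: in-place writes on the clone are allowed.
y = x.clone()
y[0] = y[0] * 2
y.sum().backward()
print(x.grad)   # row 0 receives 2s, row 1 receives 1s
```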

Thanks for the response. I’ll give your first suggestion a try, but it seems like a bit of a pain. Cloning the variables does get rid of the leaf error, but unfortunately not the inplace operation error.

For the inplace error above, the problem is actually with x[i] = x[i] * lmb. Since you need gradients for lmb (they are your parameters), autograd needs the value of x[i] to compute them. But you change x[i] inplace on the same line, so the value of x[i] needed to compute the gradients of lmb is overwritten, hence the error you see. To remove the inplace error, you will need to at least make sure you don’t overwrite this value, and potentially do the same at other places.
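The failure can be reproduced in isolation with a sketch like the one below (the shapes and the scalar lmb are arbitrary choices for illustration). The multiplication saves x[0] because it is needed for lmb's gradient, and the assignment then overwrites that saved value; building a new tensor with torch.cat instead leaves the saved value intact.

```python
import torch

torch.manual_seed(0)
x = torch.rand(2, 3)                         # input, no gradient needed
lmb = torch.tensor(0.5, requires_grad=True)  # trainable scale

# In-place: x_bad[0] is saved by the multiplication (needed for lmb's
# gradient) and then immediately overwritten by the assignment.
x_bad = x.clone()
x_bad[0] = x_bad[0] * lmb
try:
    x_bad.sum().backward()
    inplace_failed = False
except RuntimeError as err:
    inplace_failed = True
    print("in-place version:", err)

# Out-of-place: build a new tensor instead of writing into the old one.
x_good = torch.cat([x[:1] * lmb, x[1:]], dim=0)
x_good.sum().backward()
print("lmb.grad:", lmb.grad)   # equals x[0].sum()
```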

So do you suggest I use the method you gave above for all such operations? I don’t see why the operation being in place makes such a big difference. As an example:

import torch
from torch import nn

in_features, out_features = 4, 2
x = torch.rand(1, in_features, requires_grad=False)
x = nn.Linear(in_features, out_features)(x)
loss = x[0,0]
loss.backward()

Here we get no errors even though x is needed to compute the gradients of the layer parameters. The only difference is presumably that nn.Linear doesn’t do inplace modifications, but that seems like an almost irrelevant point. Wouldn’t nn.ReLU run into exactly the same problem, since it needs to update each element of a tensor based on its current value?

x = nn.Linear(in, out)(x) is not an inplace operation!
What this does is

  • create a linear layer
  • forward x through it and get a new result
  • bind the variable name “x” to this new result

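You perform an inplace operation if you use any of PyTorch’s methods that end with a trailing underscore (like mul_(), add_(), scatter_(), etc.) or Python’s inplace operators (like +=, /=, etc.). Indexed assignment such as x[i] = ... also writes into x inplace, which is why the code in the question triggers the error.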
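One way to see which of these operations are in place is to watch a tensor's version counter, which is what the error message in the question refers to ("is at version 3; expected version 0"). The sketch below reads the underscore-prefixed, internal attribute Tensor._version, so treat it as a debugging aid rather than a stable API. Reassigning a name gives a new tensor starting at version 0, while add_(), += and indexed assignment each bump the counter of the tensor they write into.

```python
import torch

x = torch.zeros(3)
x = x * 2                 # out of place: the name x is rebound to a new tensor
v_fresh = x._version      # a brand-new tensor starts at version 0

x.add_(1)                 # trailing underscore: in-place
v_add = x._version
x += 1                    # Python augmented assignment: in-place
v_iadd = x._version
x[0] = 5.0                # indexed assignment writes into x: in-place
v_setitem = x._version

print(v_fresh, v_add, v_iadd, v_setitem)
```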

For example if you do:

import torch
from torch import nn

in_features, out_features = 4, 2
x = torch.rand(1, in_features, requires_grad=False)
x_new = nn.Linear(in_features, out_features)(x) # This is out of place
x += 1 # This is inplace
loss = x_new[0,0]
loss.backward()

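So here we modify the input of the Linear inplace after the forward pass. Since the Linear needs that input to compute the gradient of its weight, loss.backward() fails with the same inplace error.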

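ReLU’s backward can be computed from either its input or its output: the gradient passes through wherever the value is positive and is zero elsewhere. So autograd can save the output instead of the input, which is why an inplace ReLU (e.g. nn.ReLU(inplace=True)) works.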
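A small sketch of this, with arbitrary example values: an in-place relu_() on an intermediate tensor backpropagates fine, since the saved output is all ReLU's backward needs. It only fails if some other operation had already saved the tensor before relu_() overwrote it, which is the same situation as x[i] = x[i] * lmb in the question.

```python
import torch

w = torch.tensor([-1.0, 2.0, 3.0], requires_grad=True)

h = w * 2             # intermediate result (not a leaf), so it may be modified in place
h.relu_()             # in-place ReLU: autograd keeps the output, enough for backward
h.sum().backward()
first_grad = w.grad.clone()
print(first_grad)     # 2 where h > 0, else 0

# The in-place ReLU only breaks things if another op saved h before it ran.
w.grad = None
h = w * 2
z = h * h             # this multiplication saves h for its backward...
h.relu_()             # ...and h is overwritten here
try:
    z.sum().backward()
    clash = False
except RuntimeError as err:
    clash = True
    print("saved value overwritten:", err)
```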