Custom layer: RuntimeError when using values from a previous layer

Hello!

I’m currently developing a custom layer and having trouble getting it to work, especially when I use the output of a previous layer in my network as the input to my custom layer.

Here’s a code snippet isolating the problem I’m currently having:

    def forward(self, y):
        # y: [batch_size, num_features]
        a = torch.unsqueeze(y, 0)                                    # [1, B, F]
        b = a.expand(self.output_resolution[0] * self.output_resolution[1], -1, -1)  # [H*W, B, F]
        c = torch.permute(b, (1, 0, 2))                              # [B, H*W, F]
        d = torch.sum(c, dim=-1)                                     # [B, H*W]
        end = torch.reshape(d, (-1, self.output_resolution[0], self.output_resolution[1]))  # [B, H, W]

        output = end[:, None, :, :]                                  # [B, 1, H, W]

        out = output.expand(-1, self.in_c, -1, -1).float()           # [B, in_c, H, W]
        return out

This isn’t what my custom layer will ultimately do, but it isolates the error I can’t resolve.

y is the output of a previous Linear layer (with shape [batch_size, num_features]).
output_resolution and in_c are just the dimensions I want my custom layer to output.

The problem is that as soon as I use y, I get the RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.

I’ve tried everything: copying every value of the y tensor manually, using y.clone(), torch.clone(y), even torch.clone(y.clone()). Nothing works; I still get the RuntimeError even though I only apply unsqueeze and expand operations to y.

In my full custom layer, the only way I’ve found to make it run is to use y.detach(), but I’m afraid that the part of my custom layer that directly uses the values of y will then be excluded from the backward pass and gradient computation.
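
To illustrate the worry, here is a small standalone sketch (unrelated to my actual layer) showing that detach() removes a tensor from the autograd graph, so nothing upstream of it receives gradients:

    import torch

    w = torch.randn(3, requires_grad=True)
    h = w * 2                          # h is tracked by autograd

    h.sum().backward()
    print(w.grad)                      # tensor([2., 2., 2.]) -- gradients reach w

    w.grad = None
    out = h.detach().sum()             # detach() cuts the graph here
    print(out.requires_grad)           # False -> a backward pass from here would never update w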

I have two questions:

  • Is there a problem with using y.detach() in my final custom layer, especially for the gradient computation?
  • Is there a solution to my problem? Could it come from code higher up (I don’t know, maybe from using DistributedDataParallel or something else)?

Thank you in advance and have a nice day!

TV

Could you post a minimal, executable code snippet which would reproduce the issue, please?
I don’t see any obvious inplace operation in your code, so I don’t know what’s causing the issue.

Thank you for the answer!

The code is currently massive and comes from a large GitHub project. I’ll try to provide a minimal snippet as soon as I manage to recreate the error in a small environment, probably early next week.
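
For reference, this is roughly the scaffold I plan to shrink things into (a sketch using a single-process gloo group on CPU just so DistributedDataParallel runs; my real setup uses CUDA and the actual model):

    import os
    import torch
    import torch.nn as nn
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel

    # single-process "gloo" group, just enough to get DDP running for a repro
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # placeholder model; the real repro would swap in the custom layer
    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
    model = DistributedDataParallel(model, find_unused_parameters=True)

    loss = model(torch.randn(8, 16)).sum()
    loss.backward()
    print("backward OK")
    dist.destroy_process_group()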

However, I noticed that the error does indeed occur when I wrap the network in DistributedDataParallel:

    network = DistributedDataParallel(network, device_ids=[torch.cuda.current_device()], find_unused_parameters=find_unused_parameters)

If I only use

    network = DataParallel(network)

my custom layer works fine without y.detach(); the execution just seems to take longer.

I found other issues describing a similar RuntimeError about gradients and inplace operations when using DistributedDataParallel, but with BatchNorm (solved by using SyncBatchNorm).
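
If I understood those threads correctly, the fix there was to convert the BatchNorm layers before wrapping the model in DistributedDataParallel, roughly like this (a sketch with a hypothetical stand-in model; my network does not use BatchNorm in the affected part):

    import torch.nn as nn

    # hypothetical stand-in model containing a regular BatchNorm layer
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.BatchNorm2d(16),
        nn.ReLU(),
    )

    # replace every BatchNorm*d with its SyncBatchNorm equivalent
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    print(model)  # the BatchNorm2d layer is now a SyncBatchNorm

    # ...and then wrap the converted model in DistributedDataParallel as before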

Any idea why it happens with my custom layer?

I just call my layers like this:

    x = nn.Conv2d(num_feat, num_feat, 3, 1, 1)(input)
    y = torch.reshape(x, (-1, img_size * img_size * in_c))
    n = CustomLayer((img_size, img_size), in_c)(y)
    x = nn.Conv2d(x + n)

Thank you again, and I’ll try to provide a compact, executable code snippet reproducing the error as soon as possible.

I don’t know exactly what your CustomLayer does, but in case it contains registered parameters and buffers, note that you are re-initializing it inside your forward pass (the same goes for the first nn.Conv2d layer, while the second call should raise an error).

Take a look at this tutorial to see how nn.Modules are used. The standard approach is to initialize and register all modules in the __init__ method and just use them in the forward.
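
To make that concrete, your calling code could be structured roughly like this (a sketch; I’m guessing at the constructor arguments and at the intent of the final conv and the x + n skip connection, and CustomLayer is your module from the earlier post):

    import torch
    import torch.nn as nn

    class Block(nn.Module):  # hypothetical name for the surrounding module
        def __init__(self, num_feat, img_size, in_c):
            super().__init__()
            # create and register the submodules once, here
            self.conv1 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
            self.custom = CustomLayer((img_size, img_size), in_c)
            self.conv2 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
            self.img_size = img_size
            self.in_c = in_c

        def forward(self, x):
            # only use the already-registered modules here
            x = self.conv1(x)
            y = torch.reshape(x, (-1, self.img_size * self.img_size * self.in_c))
            n = self.custom(y)
            return self.conv2(x + n)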