Custom layer: RuntimeError when using values from a previous layer

Thank you for the answer!

The code is currently quite large and comes from a big GitHub project; I'll try to provide a minimal snippet as soon as I manage to reproduce the error in a small environment, probably early next week.

However, I noticed that the error does indeed occur when I wrap the network with DistributedDataParallel:

network = DistributedDataParallel(network, device_ids=[torch.cuda.current_device()], find_unused_parameters=find_unused_parameters)

If I only use

network = DataParallel(network)

my custom layer works fine without using y.detach(); the only difference is that execution seems to take longer.
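(The y.detach() workaround I'm referring to is, roughly, detaching the tensor before it goes into the custom layer, something like the line below; with it the DDP error goes away, but of course no gradients flow back through y.)

n = CustomLayer((img_size, img_size), in_c)(y.detach())   # custom layer on a detached copy of y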

I found other threads describing a similar problem (a runtime error about gradients and in-place operations when using DistributedDataParallel), but with BatchNorm, where it was solved by switching to SyncBatchNorm.
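For reference, the fix suggested in those threads was to convert the BatchNorm layers to SyncBatchNorm before wrapping the model, roughly like this:

network = nn.SyncBatchNorm.convert_sync_batchnorm(network)
network = DistributedDataParallel(network, device_ids=[torch.cuda.current_device()])

That doesn't seem to apply directly to my case, since the problem appears to come from the custom layer rather than from BatchNorm.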

Any idea why this happens with my custom layer?

I just call my layers like this:

x = nn.Conv2d(num_feat, num_feat, 3, 1, 1)(input)        # convolution on the input
y = torch.reshape(x, (-1, img_size * img_size * in_c))   # flatten to one vector per image
n = CustomLayer((img_size, img_size), in_c)(y)           # custom layer on the flattened features
x = nn.Conv2d(num_feat, num_feat, 3, 1, 1)(x + n)        # second convolution on the sum
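While I put together the real reproduction, here is a rough, self-contained sketch of how those calls sit inside the network. It is not the actual code: CustomLayer is replaced by a simple linear stand-in, Block / num_feat / img_size / in_c are placeholder names, and the shapes assume num_feat == in_c with spatial size img_size so the reshape is valid.

import torch
import torch.nn as nn


class CustomLayer(nn.Module):
    # stand-in for the real custom layer: one linear map over the flattened features
    def __init__(self, img_size, in_c):
        super().__init__()
        h, w = img_size
        self.fc = nn.Linear(h * w * in_c, h * w * in_c)

    def forward(self, y):
        return self.fc(y)


class Block(nn.Module):
    def __init__(self, num_feat, img_size, in_c):
        super().__init__()
        # layers are created once here, then reused in forward
        self.conv1 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.conv2 = nn.Conv2d(num_feat, num_feat, 3, 1, 1)
        self.custom = CustomLayer((img_size, img_size), in_c)
        self.img_size = img_size
        self.in_c = in_c

    def forward(self, inp):
        x = self.conv1(inp)
        # flatten to one vector per image (assumes num_feat == in_c, spatial size == img_size)
        y = torch.reshape(x, (-1, self.img_size * self.img_size * self.in_c))
        n = self.custom(y)
        n = n.reshape(x.shape)      # back to (N, C, H, W) so it can be added to x
        return self.conv2(x + n)


block = Block(num_feat=3, img_size=8, in_c=3)
out = block(torch.randn(2, 3, 8, 8))    # -> shape (2, 3, 8, 8)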

Thank you again; I'll try to provide a compact, executable code snippet that reproduces the error as soon as possible.