I am looking to modify a pretrained ResNet to use a non-in-place version of BatchNorm2d, since the in-place behavior causes problems when I run in distributed mode (a RuntimeError saying that gradient computation is not possible because a variable was modified by an in-place operation):

`Error detected in CudnnBatchNormBackward`
I instantiate the model the vanilla way, no magic:
Is there a known solution to this problem?
Batchnorm layers don't have an `inplace` argument, so could you post an executable code snippet which reproduces this issue for further debugging, as well as the output of `python -m torch.utils.collect_env`?
Thank you @ptrblck. I have managed to solve this by setting `broadcast_buffers=False` in `DistributedDataParallel`. It turns out that with it set to `True`, DDP syncs the module buffers (such as the batchnorm running stats) with an in-place copy at each forward pass, which is what broke the gradient computation.
But now I have a massive memory leak issue, which I have described here: