Hello, I use DDP for training on multiple GPUs. My model has several BatchNorm layers, and in the training loop I need to make two forward passes on different batches. But if I make two forward passes, torch cannot run the backward pass due to the in-place change of the BatchNorm stats. One possible solution is to disable the synchronization of buffers, but that does not suit me, as I want to update the BatchNorm stats with bigger batches. Is it possible to make two independent forward passes while keeping the BatchNorm stats synchronized?
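Roughly, my loop looks like this (simplified sketch; the model, random batches, and the loss are just placeholders, and the launch assumes torchrun with one process per GPU):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes a torchrun-style launch (one process per GPU).
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU()).cuda()
ddp_model = DDP(model, device_ids=[local_rank])   # default broadcast_buffers=True
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

for step in range(10):
    # Random tensors as stand-ins for two different data batches.
    batch_a = torch.randn(8, 3, 32, 32, device="cuda")
    batch_b = torch.randn(8, 3, 32, 32, device="cuda")

    out_a = ddp_model(batch_a)          # first forward pass; BN buffers are updated in-place
    out_b = ddp_model(batch_b)          # second forward pass on a different batch
    loss = out_a.mean() + out_b.mean()  # stand-in for the real loss

    optimizer.zero_grad()
    loss.backward()                     # this is where the in-place modification error shows up
    optimizer.step()
```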
This sounds strange, as the BatchNorm stats are buffers and will thus not get a gradient and won't be updated by the optimizer. Instead, in each forward pass the running stats are updated using a running average.
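To make that concrete, a quick self-contained check:

```python
import torch.nn as nn

bn = nn.BatchNorm2d(16)

# running_mean / running_var (and num_batches_tracked) are registered as buffers ...
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

# ... while only weight and bias are trainable parameters seen by the optimizer.
print([name for name, _ in bn.named_parameters()])
# ['weight', 'bias']

print(bn.running_mean.requires_grad)  # False
```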
Are you sure the BatchNorm stats cause the error? If so, did you somehow manipulate them to get gradients and become trainable?
I understand that the BatchNorm stats are buffers. Actually, I ran into the same problem as in this post. In the answers you suggested disabling buffer synchronization. But that means each model replica on each GPU will receive stats from its own small batch (is that right?), while I want to update the BatchNorm stats using batches from all GPUs.
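For reference, this is the change from that post as I understand it (sketch, same placeholder setup as above):

```python
# Same setup as before, but with buffer broadcasting disabled,
# as suggested in the linked post:
ddp_model = DDP(model, device_ids=[local_rank], broadcast_buffers=False)
# As I understand it, each replica then keeps running stats
# computed only from its own (small) local batches.
```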
Thanks for linking to this older post. Could you verify that setting this argument indeed solves the issue?
Yes, setting this argument allows me to call the forward pass multiple times, but I'm not sure that the BatchNorm stats are collected from all GPUs.
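As a rough way to check this, I suppose one could compare the running stats across ranks after a few steps, e.g. (hypothetical sketch, assuming the process group is already initialized and `ddp_model` is the DDP-wrapped model):

```python
import torch
import torch.distributed as dist

def bn_stats_match_rank0(module, atol=1e-6):
    """Return True if this rank's BatchNorm running stats equal rank 0's copy."""
    match = True
    for name, buf in module.named_buffers():
        if "running_mean" in name or "running_var" in name:
            reference = buf.detach().clone()
            dist.broadcast(reference, src=0)   # fetch rank 0's version of this buffer
            match &= torch.allclose(buf, reference, atol=atol)
    return match

# Usage (after a few training steps):
# print(dist.get_rank(), bn_stats_match_rank0(ddp_model.module))
```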