DDP, BatchNorm: error with two forward passes before backward

When using DDP, if the network contains a batch normalization (BN) layer and only a single GPU is used, running two consecutive forward passes before the backward pass raises the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

However, this issue does not arise when using multiple GPUs. Why is this the case? nn.SyncBatchNorm.convert_sync_batchnorm() has already been applied to convert the batch normalization layers to synchronized batch normalization.
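For reference, the conversion is applied in the usual way before wrapping the model in DDP; a minimal sketch with a placeholder network (not the real model):

```python
import torch.nn as nn

# Placeholder network containing a BN layer (stand-in for the real model).
net = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.BatchNorm2d(3))

# Convert every BatchNorm layer to SyncBatchNorm before wrapping the model in DDP.
net = nn.SyncBatchNorm.convert_sync_batchnorm(net)
print(net)  # the BatchNorm2d module is now a SyncBatchNorm module
```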

I would assume DDP would detect the single GPU and execute a single-GPU run. However, I also don’t understand your use case for applying DDP on a single device.

Your observation still does not explain your use case, so are you just trying random configs to see which one would fail?

The overall code is like this
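Roughly, it reduces to the pattern below. This is only a minimal sketch: the toy model (ToyNet), random inputs, and dummy loss are placeholders, but the structure follows what was described — BN converted via convert_sync_batchnorm, the model wrapped in DDP, and two forward passes before a single backward pass.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder model: a conv layer followed by BatchNorm.
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(3)

    def forward(self, x):
        return self.bn(self.conv(x))

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = ToyNet().cuda(local_rank)
    # Convert BN layers to SyncBatchNorm, then wrap the model in DDP.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x1 = torch.randn(4, 3, 8, 8, device=local_rank)
    x2 = torch.randn(4, 3, 8, 8, device=local_rank)

    # Two consecutive forward passes before a single backward pass.
    out1 = model(x1)
    out2 = model(x2)
    loss = out1.mean() + out2.mean()
    loss.backward()  # on a single GPU this is where the inplace-modification error appears
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```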

If I run torchrun train.py, an error occurs:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3]] is at version 3; expected version 2 instead.

If I run CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node 2 train.py, there is no error.