I read that PyTorch does not support the so-called sync BatchNorm, which is needed to
train on multi-GPU machines. My question is: are there any plans to implement sync BatchNorm
for PyTorch, and when will it be released?
Another question: what is the best workaround when you want to train with images and need
large batch sizes?
SyncBatchNorm is already in PyTorch.
Hi @ptrblck ,
thanks for the answer.
The documentation says:
Currently SyncBatchNorm only supports DistributedDataParallel with single GPU per process.
This “single GPU” part confuses me. What does it mean?
I am also asking because detectron2 still uses “FrozenBatchNorm2d”: https://github.com/facebookresearch/detectron2/blob/master/detectron2/modeling/backbone/resnet.py#L50
DistributedDataParallel can be used in two different setups as given in the docs.
- Single-Process Multi-GPU and
- Multi-Process Single-GPU, which is the fastest and recommended way.
SyncBatchNorm will only work in the second approach.
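In practice this means you convert the model's BatchNorm layers before wrapping it in DistributedDataParallel (with one process per GPU). A minimal sketch using `torch.nn.SyncBatchNorm.convert_sync_batchnorm`, with a toy model just for illustration:

```python
import torch
import torch.nn as nn

# Toy model with a regular BatchNorm layer (illustrative only).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
)

# Replace every BatchNorm layer with SyncBatchNorm.
# In a real run you would do this before wrapping the model in
# DistributedDataParallel, in a multi-process single-GPU setup.
sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# sync_model[1] is now a torch.nn.SyncBatchNorm layer.
```

The conversion itself runs anywhere; the synchronized statistics only take effect once the model runs inside an initialized process group.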
I’m not sure if you would need it, but FrozenBatchNorm seems to fix all buffers:

BatchNorm2d where the batch statistics and the affine parameters are fixed.
It contains non-trainable buffers called
“weight”, “bias”, “running_mean”, and “running_var”,
initialized to perform identity transformation.
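To make the “non-trainable buffers” point concrete, here is a simplified sketch of such a layer (not detectron2’s actual implementation, just the idea): everything is registered as a buffer, so nothing appears in `parameters()` and no running statistics are ever updated.

```python
import torch
import torch.nn as nn

class FrozenBatchNorm2d(nn.Module):
    """Simplified sketch: batch statistics and affine parameters are
    fixed buffers, so nothing is trained and stats never update."""

    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # Buffers (not nn.Parameter): excluded from the optimizer,
        # initialized so the layer is (almost) an identity transformation.
        self.register_buffer("weight", torch.ones(num_features))
        self.register_buffer("bias", torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        # Fold the frozen stats into a per-channel scale and shift.
        scale = self.weight * (self.running_var + self.eps).rsqrt()
        shift = self.bias - self.running_mean * scale
        return x * scale.view(1, -1, 1, 1) + shift.view(1, -1, 1, 1)
```

Since `weight`, `bias`, `running_mean`, and `running_var` are buffers, `list(module.parameters())` is empty, which is exactly why such a layer stays fixed during fine-tuning.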
How do I create my DDP model if I’m working on a cluster with multiple nodes and each node may have multiple GPUs?
I think this tutorial might be a good introduction to the different backends etc.
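As a rough sketch of the multi-node, multi-GPU case (one process per GPU, launched e.g. with `torchrun --nnodes=2 --nproc_per_node=4 train.py`): the launcher sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` in the environment, and each process pins itself to its local GPU. The model here is a placeholder, not from the thread above, and this needs an actual multi-GPU launch to run.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun / the env:// init method read RANK, WORLD_SIZE,
    # MASTER_ADDR, MASTER_PORT from the environment.
    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model for illustration.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
    ).cuda(local_rank)

    # Convert BatchNorm to SyncBatchNorm before wrapping in DDP,
    # then pin the DDP replica to this process's single GPU.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[local_rank])

if __name__ == "__main__":
    main()
```

The `device_ids=[local_rank]` argument is what makes this the recommended multi-process single-GPU setup from the docs above.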
Hi, I want to know the difference between SyncBN and BN.
SyncBatchNorm synchronizes the statistics during training in a
DistributedDataParallel setup as given in the docs and can optionally be used.