When I put some nn.BatchNorm2d() layers in my network and wrap it with nn.DataParallel(net, device_ids=[0, 1]), the network always outputs 0 in eval() mode but works fine in train() mode.
It also works fine in eval() mode when I don't use DataParallel.
From what I understand, nn.BatchNorm2d() can't handle its running statistics across 2 GPUs; is that the explanation?
I had to set track_running_stats=False on every BatchNorm layer when using DataParallel for it to work, but doesn't that defeat the purpose of distinguishing between training and evaluation?
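For reference, here is a minimal sketch of the workaround I mean (the network itself is just an illustrative example, not my real model). With track_running_stats=False, BatchNorm has no running_mean/running_var buffers, so eval() falls back to per-batch statistics:

```python
import torch
import torch.nn as nn

# Illustrative toy network; track_running_stats=False means no running
# statistics are kept, so eval() normalizes with batch stats instead.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16, track_running_stats=False),
    nn.ReLU(),
)

# net = nn.DataParallel(net, device_ids=[0, 1])  # needs 2 GPUs

net.eval()
out = net(torch.randn(4, 3, 8, 8))
print(out.shape)  # torch.Size([4, 16, 8, 8])
```

The trade-off is exactly what I'm asking about: with this setting, evaluation behaves like training as far as normalization is concerned, since each batch is normalized by its own statistics rather than the accumulated ones.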