When I put some nn.BatchNorm2d() layers in my network and wrap it with nn.DataParallel(net, device_ids=[0, 1]), the network always outputs 0 in eval() mode but works fine in train() mode.
It also works fine in eval() mode when I don't use DataParallel.
From what I understand, nn.BatchNorm2d() can't handle its running statistics across 2 GPUs; is that the explanation?
I had to set track_running_stats=False on every BatchNorm layer when using DataParallel for it to work, but doesn't that defeat the purpose of distinguishing between training and evaluation?
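For reference, here is a minimal sketch of the workaround I mean (the network itself is just an illustrative example, not my real model). With track_running_stats=False, BatchNorm has no running_mean/running_var buffers, so eval() falls back to per-batch statistics:

```python
import torch
import torch.nn as nn

# Illustrative toy network; track_running_stats=False means no running
# statistics are kept, so eval() normalizes with batch stats instead.
net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16, track_running_stats=False),
    nn.ReLU(),
)

# net = nn.DataParallel(net, device_ids=[0, 1])  # needs 2 GPUs

net.eval()
out = net(torch.randn(4, 3, 8, 8))
print(out.shape)  # torch.Size([4, 16, 8, 8])
```

The trade-off is exactly what I'm asking about: with this setting, evaluation behaves like training as far as normalization is concerned, since each batch is normalized by its own statistics rather than the accumulated ones.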