What does eval() do for BatchNorm at the code level?

qq184861643 · December 8, 2018, 9:25am

So I’ve read the official documentation and searched a few related questions about the BN layer at evaluation time. I wonder what exactly the eval() function does for BN, because when I run it, none of the attributes of the BN class seem to change.
Here is my Jupyter code:

import torch.nn as nn

# `model` is the network being inspected (defined elsewhere).
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
        print(m)

And here is the result:

BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
……

So you can see that the track_running_stats attribute is still True, and the requires_grad flags of the weights are True as well. Then what does eval() do?

Furthermore, say I can only train with a very small batch_size, like 1, because of limited GPU memory. What should I do? Set requires_grad to False, or track_running_stats to False? Or both?

Amrit_Das · December 8, 2018, 10:25am

There are two modes:
i) Training mode ii) Testing mode
The eval() function tells all the layers that you are in the testing phase rather than the training phase; similarly, the train() function tells the layers that you are in the training phase.
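As a minimal sketch of how the mode switch shows up in code: eval() and train() flip the boolean `training` attribute on every submodule, which print(m) does not display. The small Sequential model here is a hypothetical stand-in, not the poster's network.

```python
import torch.nn as nn

# A tiny stand-in model; the layer sizes are arbitrary.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8), nn.ReLU())
bn = model[1]

flags = {}
flags["initial"] = bn.training      # modules start in training mode
model.eval()
flags["after_eval"] = bn.training   # eval() sets training=False on every submodule
model.train()
flags["after_train"] = bn.training  # train() sets it back to True

# The attributes shown by print(bn) are not touched by eval() or train().
unchanged = (bn.track_running_stats, bn.weight.requires_grad)
print(flags, unchanged)
```

So checking m.training, rather than the printed repr, is one way to see which mode a layer is in.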

By setting requires_grad to False, you ask the network to stop calculating gradients for those parameters. So it depends on your architecture. You can try changing the track_running_stats flag manually and see the difference.
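The two settings act on different things, which a rough sketch can illustrate: requires_grad only affects the learnable weight and bias, while the running statistics are buffers that keep updating in training mode whether or not the parameters are frozen. The input shape and values below are arbitrary assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(4)
x = torch.randn(8, 4, 5, 5) * 3 + 2   # arbitrary input with a non-zero mean

# Freeze the affine parameters (weight and bias); buffers are not parameters.
for p in bn.parameters():
    p.requires_grad = False

# Training mode: running statistics still move, even with frozen parameters.
bn.train()
before = bn.running_mean.clone()
bn(x)
moved_in_train = not torch.equal(before, bn.running_mean)

# Eval mode: running statistics are read but not updated.
bn.eval()
before = bn.running_mean.clone()
bn(x)
moved_in_eval = not torch.equal(before, bn.running_mean)

print(moved_in_train, moved_in_eval)
```

In other words, freezing the parameters alone does not stop the running statistics from changing; switching the layer to eval mode does.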

qq184861643 · December 8, 2018, 2:18pm

Thanks for your reply. But what is the difference in BN between the training phase and the eval phase, since neither the track_running_stats flag nor the requires_grad flag changes after running eval()?
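One way to observe the difference directly is to compare the layer's output against a manual computation in each mode. The sketch below assumes default BatchNorm2d settings (track_running_stats=True, affine=True) and an arbitrary random input: in training mode the output matches normalization with the current batch's statistics, and in eval mode it matches normalization with the stored running_mean and running_var.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
x = torch.randn(16, 3, 4, 4) * 5 + 10   # arbitrary input

# Learnable affine parameters, reshaped for broadcasting over (N, C, H, W).
w = bn.weight.detach().view(1, -1, 1, 1)
b = bn.bias.detach().view(1, -1, 1, 1)

# Training mode: normalize with this batch's per-channel mean and biased variance.
bn.train()
with torch.no_grad():
    y_train = bn(x)
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual_train = (x - mean) / torch.sqrt(var + bn.eps) * w + b

# Eval mode: normalize with the stored running_mean / running_var buffers.
bn.eval()
with torch.no_grad():
    y_eval = bn(x)
rm = bn.running_mean.view(1, -1, 1, 1)
rv = bn.running_var.view(1, -1, 1, 1)
manual_eval = (x - rm) / torch.sqrt(rv + bn.eps) * w + b

train_matches = torch.allclose(y_train, manual_train, atol=1e-4)
eval_matches = torch.allclose(y_eval, manual_eval, atol=1e-4)
modes_differ = not torch.allclose(y_train, y_eval, atol=1e-4)
print(train_matches, eval_matches, modes_differ)
```

The same input therefore produces different outputs in the two modes, even though the flags shown by print(m) are identical.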