What does model.eval() do for batchnorm layers?

Hi Everyone,
When making predictions with a model trained with batchnorm, we should set the model to evaluation mode. My question is: how does evaluation mode affect the batchnorm operation? What does evaluation mode actually do for batchnorm layers? Does the model ignore batchnorm?

During training, this layer keeps a running estimate of its computed mean and variance. The running estimates are updated as an exponential moving average with a default momentum of 0.1.

During evaluation, these running mean/variance estimates are used for normalization instead of the statistics of the current batch.

Reference: http://pytorch.org/docs/master/nn.html#torch.nn.BatchNorm1d
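
A minimal sketch of the behaviour described above, assuming a recent PyTorch install. The tensor shapes and variable names are only illustrative. In train mode one forward pass moves running_mean toward the batch mean by the momentum factor; in eval mode the buffer is read but not updated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3)              # momentum defaults to 0.1
x = torch.randn(8, 3) * 2.0 + 5.0   # illustrative batch: 8 samples, 3 features

# Train mode: normalize with batch statistics and update the running buffers.
bn.train()
bn(x)
# running_mean starts at 0, so after one step it is 0.9 * 0 + 0.1 * batch_mean.
expected_mean = 0.1 * x.mean(dim=0)
mean_matches = torch.allclose(bn.running_mean, expected_mean, atol=1e-5)

# Eval mode: the stored running statistics are used and no longer updated.
bn.eval()
snapshot = bn.running_mean.clone()
bn(x)
frozen_in_eval = torch.equal(bn.running_mean, snapshot)

print(mean_matches, frozen_in_eval)
```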

I had the same problem when I trained a model using a BN layer. If I only need to test one image, will the BN layer affect the result because of the change in batch size?

When evaluating, you should use eval() mode, and then the batch size doesn't matter.
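
As a rough check of that claim, the sketch below compares the eval-mode output for one image processed alone against the same image processed inside a larger batch. The random "training" batches are placeholders used only to populate the running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(4)

# Populate the running statistics with a few illustrative training batches.
bn.train()
for _ in range(5):
    bn(torch.randn(16, 4, 8, 8))

bn.eval()
batch = torch.randn(16, 4, 8, 8)
with torch.no_grad():
    out_batch = bn(batch)        # whole batch at once
    out_single = bn(batch[:1])   # first image alone, batch size 1

# In eval mode each sample is normalized with the stored statistics only,
# so the result for the first image should not depend on the batch size.
same_result = torch.allclose(out_batch[:1], out_single, atol=1e-5)
print(same_result)
```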

Thanks~ I have solved the problem~

Hey Soumith,
Maybe a trivial question:

  • Trained a model with BN on CIFAR10; training accuracy is perfect
  • Testing with model.train(True) gives 76% accuracy
  • Testing with model.eval() gives only 10%, with 0% in pretty much every category.

Why is this? It should be the opposite, right? @smth

How did you construct the BN layers?

Standard way:

nn.BatchNorm2d(64)

Where 64 is the num of output filters of the previous layer.

That’s weird. Do you mind sharing your script?

Hey Simon, sorry for being late.

The definition of the model is as follows. Ignore the fact that hyper-params shouldn't be defined that way (it's old code). Any idea?

import torch.nn as nn
import torch.nn.functional as F


class Keras_Cifar2(nn.Module):
    def __init__(self, rank1, rank2):
        super(Keras_Cifar2, self).__init__()

        # hyperparams
        self.kern = 3  # for all layers
        self.filt_size1 = 32
        self.filt_size2 = 64
        self.filt_fc1 = 512
        self.num_classes = 10

        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        self.conv4 = nn.Conv2d(64, 64, 3)

        self.pool = nn.MaxPool2d(2, 2)

        self.bn_1 = nn.BatchNorm2d(1)
        self.bn_2 = nn.BatchNorm2d(rank1)
        self.bn_3 = nn.BatchNorm2d(rank2)
        self.bn_4 = nn.BatchNorm2d(self.filt_fc1)
        self.bn_5 = nn.BatchNorm2d(self.num_classes)
        self.bn_6 = nn.BatchNorm2d(32)
        self.bn_7 = nn.BatchNorm2d(64)

        # decomposition
        self.cpdfc1 = nn.Conv2d(64, rank1, 1)
        self.cpdfc2 = nn.Conv2d(rank1, rank1, (6, 1))
        self.cpdfc3 = nn.Conv2d(rank1, rank1, (1, 6))
        self.cpdfc4 = nn.Conv2d(rank1, self.filt_fc1, 1)

        # conv2fc
        #self.conv2fc1 = nn.Conv2d(64, self.filt_fc1, 5)
        self.conv2fc2 = nn.Conv2d(self.filt_fc1, self.num_classes, 1)


    def forward(self, x):

        x = F.relu(self.conv1(x))
        x = self.bn_6(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.bn_6(x)

        x = F.relu(self.conv3(x))
        x = self.bn_7(x)
        x = self.pool(F.relu(self.conv4(x)))
        x = self.bn_7(x)

        x = self.cpdfc1(x)
        x = self.bn_2(x)
        x = self.cpdfc2(x)
        x = self.bn_2(x)
        x = self.cpdfc3(x)
        x = self.bn_2(x)
        x = F.relu(self.cpdfc4(x))
        x = self.bn_4(x)
        x = self.conv2fc2(x)
        x = self.bn_5(x)

        x = x.view(-1, self.num_classes) 
        return x

You shouldn’t re-use BN layers. For example, here self.bn_6 sees data from two different layers but accumulates the values into the same running stats buffer. The running stats will then be inaccurate, and performance will suffer in eval() mode. Make sure that each BN layer is used in only one place in the network.

Oh, right. When I wrote it I probably thought I was defining just a layer type that I could then use as many times as needed. But lol, that’s a single member.

Then a second question: what’s the best practice when making heavy use of BNs? Write out as many as one needs, or define all of them in a dictionary of BNs?

In your case, the network is pretty sequential. So I’d suggest constructing a list of layers in sequential order (F.relu can also be written as the module nn.ReLU) and using the nn.Sequential wrapper. :)
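
A possible sketch of that suggestion, covering only the first four conv layers of the model posted above. The name `features` is hypothetical, and the layer order simply mirrors the original forward(). The point is that every position gets its own BatchNorm2d instance, so no running-stats buffer is shared.

```python
import torch
import torch.nn as nn

# Hypothetical rewrite of the conv part of the model above;
# each position gets a fresh BatchNorm2d instead of re-using bn_6 / bn_7.
features = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.BatchNorm2d(32),
    nn.Conv2d(32, 32, 3), nn.ReLU(), nn.MaxPool2d(2, 2), nn.BatchNorm2d(32),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.BatchNorm2d(64),
    nn.Conv2d(64, 64, 3), nn.ReLU(), nn.MaxPool2d(2, 2), nn.BatchNorm2d(64),
)

x = torch.randn(2, 3, 32, 32)   # CIFAR10-sized dummy input
out = features(x)

# Collect the BN modules to confirm there are four distinct instances.
bn_layers = [m for m in features if isinstance(m, nn.BatchNorm2d)]
print(out.shape, len(bn_layers))
```

The remaining decomposition layers (cpdfc1 to conv2fc2) could be appended the same way, again with one BN module per position.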

Yup. I didn’t use the Sequential container since the code was taken like that from the tutorial, but it’ll definitely be cleaner with it.

@smth Does the running_var parameter in batch normalization refer to the variance or the standard deviation?
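
One way to check this empirically is sketched below. It assumes momentum=1.0 is acceptable for the experiment, so that after a single training step the buffer holds exactly the statistics of the last batch. On that assumption, running_var should match the (unbiased) batch variance rather than the standard deviation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# momentum=1.0 makes the running buffers equal to the last batch's statistics.
bn = nn.BatchNorm1d(1, momentum=1.0)
x = torch.randn(1000, 1) * 3.0   # illustrative data: std about 3, variance about 9

bn.train()
bn(x)

batch_var = x.var(dim=0, unbiased=True)
batch_std = x.std(dim=0)

# Compare the buffer against both candidates.
is_variance = torch.allclose(bn.running_var, batch_var, rtol=1e-4)
is_std = torch.allclose(bn.running_var, batch_std, rtol=1e-4)
print(is_variance, is_std)
```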

Thanks, I had the same issue.

Hi
Can someone help me to understand why applying model.eval is better in the testing phase?
Thanks in advance :)

Remove your last or even second-to-last BN. BN normalizes features; the last output is the class scores and should not be normalized.

Train-mode BN uses statistics from the batch. In the test phase this is essentially “cheating” because each prediction has access to the other examples in the batch (hence it cannot work if batch size = 1).
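
The sketch below illustrates both points with made-up data: in train mode the normalized output for one sample changes with its batch-mates, and a lone (1, C) sample cannot be normalized at all because there is no batch variance to compute. The tensors and variable names are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(3)
bn.train()

sample = torch.randn(1, 3)
batch_a = torch.cat([sample, torch.randn(7, 3)])
batch_b = torch.cat([sample, torch.randn(7, 3) + 5.0])  # shifted batch-mates

# Same sample, different companions: the train-mode output differs.
out_a = bn(batch_a)[0]
out_b = bn(batch_b)[0]
depends_on_batch = not torch.allclose(out_a, out_b, atol=1e-3)

# A single (1, C) sample has no batch variance, so train mode refuses it.
try:
    bn(sample)
    single_sample_error = False
except ValueError:
    single_sample_error = True

print(depends_on_batch, single_sample_error)
```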

I mean, why is it better to use model.eval and take the running statistics and not rely on the current test image statistics?

  1. Because the params are trained on the training statistics; if the test statistics are different, the result might be different.
  2. If you compute test statistics, you are basically “training” on the test set, because the statistic in this case is a trained param. You can do it, nobody says you can’t, it’s just that people would consider it “cheating”.