Model.eval() gives incorrect loss for model with batchnorm layers

zhangboknight · September 19, 2017, 4:25am

I am training a model with batchnorm layers. During training I call model.train(). Every 100 iterations I call model.eval() and validate the accuracy. However, the validation loss is not correct. I don’t think this is due to overfitting, because even when I test on the same images used for training, the testing loss is quite different from the training loss. Also, if I leave the model in model.train() mode during testing, the testing loss is correct. But that usage does not make sense, because my model contains batchnorm layers.

Below is my training code:

for epoch in range(num_epochs):
    epoch_loss = 0.0
    optimizer = lr_scheduler(optimizer, epoch)

    for iteration, data in enumerate(dataloader, 0):
        iter_index += 1
        label_patch = data['label_patch']
        residue_patch = data['residue_patch']
        stacked_patch = data['stacked_patch']
        microshift_patch = data['train_patch']
        inputs = Variable(stacked_patch.type(dtype))
        residues = Variable(residue_patch.type(dtype), requires_grad=False)
        labels = Variable(label_patch.type(dtype), requires_grad=False)
        microshifts = Variable(microshift_patch.type(dtype), requires_grad=False)

        # set model to train mode (before zero grad)
        model.train()

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward
        outputs = model(inputs)
        loss = criterion(outputs, residues)

        # backward + optimize only if in training phase
        loss.backward()
        optimizer.step()

        # statistics
        epoch_loss += loss.data[0]

        # test the model every 100 iterations
        if iter_index % logging_iter == 0:
            loss_test, psnr_test = test_model(model)  

    # checkpoint for each epoch
    model_out_path = "checkpoints/model_epoch_{}_residue.pth".format(epoch)
    torch.save(model, model_out_path)

zhangboknight · September 19, 2017, 4:27am

The testing code, which is called every 100 training iterations, is as follows:

def test_model(model):
    psnr_test_avg = 0
    loss_test_avg = 0
    model.eval()
    
    for iteration, test_data in enumerate(dataloader_test, 0):
        label_test = test_data['label_patch']
        residue_test = test_data['residue_patch']
        stacked_test = test_data['stacked_patch']
        microshift_test = test_data['train_patch']
        inputs_test = Variable(stacked_test.type(dtype), requires_grad=False)
        residues_test = Variable(residue_test.type(dtype), requires_grad=False)
        labels_test = Variable(label_test.type(dtype), requires_grad=False)
        microshifts_test = Variable(microshift_test.type(dtype), requires_grad=False)
        outputs_test = model(inputs_test)
        loss_mse_test = criterion_mse(outputs_test + microshifts_test, labels_test).data.cpu().numpy()
        loss_l1_test = criterion(outputs_test, residues_test).data.cpu().numpy()
        psnr_test = 10 * np.log10(255 * 255 / loss_mse_test)
        loss_test_avg += loss_l1_test
        psnr_test_avg += psnr_test

    loss_test_avg /= (iteration + 1)
    psnr_test_avg /= (iteration + 1)

    return loss_test_avg, psnr_test_avg

smth · September 20, 2017, 5:00am

It is possible that your training in general is unstable, so BatchNorm’s running_mean and running_var don’t represent the true batch statistics.

http://pytorch.org/docs/master/nn.html?highlight=batchnorm#torch.nn.BatchNorm1d

Try the following:

Increase the momentum term in the BatchNorm constructor.
Before you set model.eval(), run a few inputs through the model (forward pass only; you don’t need to run backward). This will help stabilize the running_mean / running_var values.
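
Both suggestions can be sketched roughly as follows. This is only an illustration, not code from the original post: the toy nn.Sequential model, the random inputs, and the helper name refresh_running_stats are all assumptions, and it assumes a PyTorch release that provides torch.no_grad() (0.4 or later). In train mode BatchNorm updates its running statistics on every forward pass, so no backward pass is needed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model (an assumption for illustration); momentum raised from the default 0.1.
model = nn.Sequential(
    nn.Linear(8, 4),
    nn.BatchNorm1d(4, momentum=0.5),
)

def refresh_running_stats(model, batches):
    """Hypothetical helper: forward-only passes in train mode update
    running_mean / running_var; the weights are not changed because
    no optimizer step is taken. Leaves the model in eval mode."""
    model.train()
    with torch.no_grad():
        for x in batches:
            model(x)
    model.eval()

bn = model[1]
before = bn.running_mean.clone()

# Synthetic inputs standing in for a few real training batches.
batches = [torch.randn(16, 8) * 3.0 + 2.0 for _ in range(5)]
refresh_running_stats(model, batches)
after = bn.running_mean.clone()
```

After the call, the buffers in `bn` reflect the recent batches rather than whatever values they held before, and the model is left in eval mode.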

Hope this helps.

zhangboknight · September 20, 2017, 7:24am

Thanks for your reply. I tried both suggestions but still get the incorrect loss. I found that, with the same code, model.eval() sometimes gives correct results and sometimes incorrect ones. I will investigate further and post an update if I find a solution.

david-leon · December 29, 2017, 7:30am

Same problem here with the latest 0.3.0 release. Have you found a solution? @zhangboknight
Changing the momentum doesn’t solve the problem @smth

alexbellgrande · January 14, 2018, 7:09pm

I’m having the same issue. Really a bummer to have to use train mode for validation/testing.

Higang · January 22, 2018, 10:27pm

I also have the same problem and haven’t figured out the reason.

falmasri · January 27, 2018, 1:59am

Is there any update regarding this problem? I already posted the same question, and it seems to me that many people are facing the same problem. Could the PyTorch community respond to it?

meetshah1995 · January 27, 2018, 8:32pm

I have the same problem.

I’m trying to load Caffe weights into a PyTorch model with batchnorm layers. Each time I load the weights from the caffemodel file, the result for the same input is different, even in eval mode.

I’m actually updating the running_mean and running_var from the caffemodel weights, so there shouldn’t be any issue with bad running_means during inference.

smth · January 27, 2018, 8:37pm

@meetshah1995 the meaning of Caffe’s running_mean might be different from pytorch’s running_mean.
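
One difference that is often cited, stated here as an assumption to check against your Caffe version rather than as a confirmed diagnosis: Caffe's BatchNorm layer stores three blobs (an accumulated mean, an accumulated variance, and a scalar scale factor), and the statistics it actually uses are the first two divided by the third. A minimal sketch of that rescaling, using plain lists and a hypothetical helper name:

```python
def caffe_bn_to_running_stats(mean_blob, var_blob, scale_blob):
    """Hypothetical helper: rescale Caffe-style accumulated statistics
    by the stored scale factor (assumed layout: mean, variance, scalar)."""
    factor = 0.0 if scale_blob == 0 else 1.0 / scale_blob
    running_mean = [m * factor for m in mean_blob]
    running_var = [v * factor for v in var_blob]
    return running_mean, running_var

# Made-up numbers for two channels, only to show the arithmetic.
running_mean, running_var = caffe_bn_to_running_stats(
    mean_blob=[2.0, 4.0], var_blob=[8.0, 2.0], scale_blob=2.0
)
```

If this applies, the rescaled values are what would be copied into the PyTorch layer's running_mean and running_var buffers.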

smth · January 27, 2018, 8:39pm

@falmasri I wrote a working answer in my comment above, in this thread (Model.eval() gives incorrect loss for model with batchnorm layers).

It’s not a problem in the sense that it’s not a software bug.

It’s a problem in the sense that if your training is non-stationary, you will see this behavior unless you adjust the momentum term of the BatchNorm layers. We set the default momentum to 0.1 because it was sufficient for most of the workloads we use. Play around with it.
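
To get a feel for what the momentum value means, here is a small sketch in plain Python. It assumes the update rule documented for PyTorch's BatchNorm, running = (1 - momentum) * running + momentum * batch_stat, under which the weight of an old value after n updates is (1 - momentum) ** n. The helper name and the 1% tolerance are arbitrary choices for illustration.

```python
import math

def steps_to_forget(momentum, tolerance=0.01):
    """Smallest n such that (1 - momentum) ** n <= tolerance, i.e. the
    number of updates before an old running value is almost forgotten."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - momentum))

for m in (0.1, 0.5):
    print(m, steps_to_forget(m))
```

With the default of 0.1 an old statistic keeps influencing the running value for dozens of iterations, while at 0.5 it is gone within a handful.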

meetshah1995 · January 27, 2018, 8:47pm

@smth Agreed that running_mean may mean different things in Caffe and PyTorch. However in eval mode, I guess only these 5 things - (running_mean, running_var, weight, bias, eps) should affect the final output.
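
For reference, the eval-mode computation for a single channel can be written out in plain Python as below. This is a sketch of the standard batch norm formula, not PyTorch's actual implementation, and the function name is made up for illustration.

```python
import math

def batchnorm_eval(x, running_mean, running_var, weight, bias, eps=1e-5):
    """Eval-mode batch norm for one channel: the output depends only on
    the stored statistics and affine parameters, not on the batch."""
    scale = weight / math.sqrt(running_var + eps)
    return [(v - running_mean) * scale + bias for v in x]

# eps is set to 0 here only to keep the arithmetic easy to follow.
out = batchnorm_eval([1.0, 3.0], running_mean=1.0, running_var=4.0,
                     weight=2.0, bias=0.5, eps=0.0)
```

Comparing each of the five quantities between the two frameworks, one channel at a time, is one way to narrow down where the outputs start to diverge.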

I’ll try to see if I can come up with a minimal reproducible example for this.

falmasri · January 27, 2018, 8:53pm

@smth What do you mean by non-stationary training?

smth · January 27, 2018, 10:20pm

@falmasri It means that the statistics of the activations change rapidly during training, so the running_mean and running_var statistics that BatchNorm accumulates with a momentum of 0.1 are no longer valid.
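
A toy simulation may make this concrete. It assumes the running-average rule running = (1 - momentum) * running + momentum * batch_stat and a made-up activation mean that grows by 1.0 every iteration; the function name is hypothetical and nothing here comes from the model in this thread.

```python
def track(batch_means, momentum, start=0.0):
    """Apply the running-average update once per batch; return the history."""
    running, history = start, []
    for m in batch_means:
        running = (1.0 - momentum) * running + momentum * m
        history.append(running)
    return history

# Hypothetical activation mean that drifts upward by 1.0 per iteration.
drifting = [float(t) for t in range(1, 101)]

# How far the running mean trails the current batch mean at the end.
lag_slow = drifting[-1] - track(drifting, momentum=0.1)[-1]
lag_fast = drifting[-1] - track(drifting, momentum=0.5)[-1]
```

The running mean trails the true value by roughly (1 - momentum) / momentum times the drift per step, so at 0.1 the lag is about nine steps' worth of drift and at 0.5 about one. Eval mode normalizes with that lagging value, while train mode uses the current batch mean.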

rohun · June 7, 2018, 7:17am

Is it theoretically incorrect though?

rohun · June 7, 2018, 9:15am

Nice, setting the momentum to 0.5 seems to make the loss calculated using model.eval() similar to the loss computed using model.train(), but only after a few epochs of differing results.

Does the second suggestion work only when we are loading the model for eval? It should not affect the loss calculation if we are training the network and running validation every nth step. Is that correct?

Xingyi_Zhou · June 15, 2018, 5:17am

I also ran into this problem in my project (see my answers at https://github.com/xingyizhou/pytorch-pose-hg-3d/issues/16 and https://github.com/bearpaw/pytorch-pose/issues/33). In short, downgrading PyTorch to version 0.1.12 resolves the problem. But I really don’t know what changed in the BN implementation between 0.1.12 and the later versions …

SimonW · June 15, 2018, 6:22am

I replied on the issue, but running stats are inherently unstable when the batch size is only 1.
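
The effect of batch size on the noise of the per-batch mean can be illustrated with synthetic data. The samples below are N(0, 1) draws, an assumption for illustration rather than activations from any real model, and the function name is made up.

```python
import random
import statistics

random.seed(0)

def batch_mean_spread(batch_size, num_batches=2000):
    """Standard deviation of per-batch means for samples drawn from N(0, 1)."""
    means = []
    for _ in range(num_batches):
        samples = [random.gauss(0.0, 1.0) for _ in range(batch_size)]
        means.append(sum(samples) / batch_size)
    return statistics.pstdev(means)

spread_1 = batch_mean_spread(1)
spread_32 = batch_mean_spread(32)
```

The spread shrinks roughly with the square root of the number of values averaged, so the statistics fed into the running average are much noisier when each batch contributes very few values per channel.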

Xingyi_Zhou · June 15, 2018, 6:39am

Thanks for the reply! The training batch size is 6, not 1. I have also tried a larger batch size (32) with other architectures (upsampling on ResNet18), but the bug remains. My main question is why PyTorch 0.1.12 works while versions >= 0.2 do not.

youkaichao · July 26, 2018, 5:12pm

I think this is not about the momentum. I have the same problem. When I call

model.eval()
model(input)
model.train()
model(input)
model.eval()
model(input)
model.train()
model(input)

every call of model(input) that follows model.train() gives almost the same output, but that output differs from what follows model.eval().
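
This pattern is what one would expect whenever the stored running statistics do not match the statistics of the input batch. The sketch below shows it for a single channel in plain Python. It is not PyTorch's implementation: the affine weight and bias are omitted, the function names are hypothetical, and the running statistics are left at PyTorch's initial buffer values (mean 0, variance 1).

```python
import math

def normalize(x, mean, var, eps=1e-5):
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def forward(x, running_mean, running_var, training):
    """Train mode normalizes with the batch's own mean and biased variance;
    eval mode normalizes with the stored running statistics."""
    if training:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        return normalize(x, mean, var)
    return normalize(x, running_mean, running_var)

x = [10.0, 12.0, 14.0]
# Running statistics that do not describe x (initial values: mean 0, var 1).
train_out = forward(x, running_mean=0.0, running_var=1.0, training=True)
eval_out = forward(x, running_mean=0.0, running_var=1.0, training=False)
```

The train-mode output does not depend on the running statistics at all, which is why repeated calls after model.train() agree with each other. The eval-mode output only agrees with them once the running statistics are close to the batch statistics.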