Different results without optimizer.step()

I used a batch size of 8 and the Adam optimizer. In the testing phase, I pass the images and compute PSNR and SSIM. Now, before testing, I pass a few batches but manually set the loss to 0 and do backpropagation without calling optimizer.step(). The results are different, even though the loss is 0 and optimizer.step() is never called.
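Roughly, this is what the second run does before testing (a minimal runnable sketch; the tiny model, random data, and loop count are stand-ins for my actual setup):

import torch
import torch.nn as nn

# Stand-in model (hypothetical); my real network is larger but also uses batchnorm
model = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.BatchNorm2d(3))
optimizer = torch.optim.Adam(model.parameters())

model.train()
for _ in range(4):                      # pass a few batches before testing
    images = torch.randn(8, 3, 32, 32)  # batch size 8
    loss = model(images).mean()
    loss = loss * 0                     # manually make the loss 0
    loss.backward()                     # backpropagation, but no optimizer.step()

# ...then run the testing phase and compute PSNR/SSIM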

How are the weights changing?

Hi, would you mind sharing how your model architecture is built?

There are some layers that behave differently on each pass in order to reduce overfitting. One example is the Dropout layer. With this little example you can see how the same layer behaves differently each time in training mode.

import torch

# Dropout with the default p=0.5: in train mode it zeroes entries at
# random and scales the survivors by 1 / (1 - p) = 2
drop = torch.nn.Dropout()
ones = torch.ones(3, 3)

drop.train()
print(f"Train: 1\n{drop(ones)}")
print(f"Train: 2\n{drop(ones)}")
print(f"Train: 3\n{drop(ones)}")
drop.eval()
print(f"Eval: \n{drop(ones)}")
# Output:
Train: 1
tensor([[2., 0., 2.],
        [2., 0., 2.],
        [0., 0., 2.]])
Train: 2
tensor([[2., 2., 2.],
        [0., 2., 0.],
        [0., 2., 2.]])
Train: 3
tensor([[0., 2., 0.],
        [0., 0., 2.],
        [0., 2., 2.]])
Eval: 
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

Hope this helps :smile:

There is no dropout in the architecture. The difference between the two runs is that in the second run I just call backward() without optimizer.step() before testing, so the weights should not change. I even manually set the loss to 0, so there should be no gradient flow during backpropagation. Even so, the PSNR and SSIM are different. I used batchnorm as nn.BatchNorm2d with affine=True, track_running_stats=True.

Could you post minimal executable code that reproduces this behavior?

That way it will be easier to look into it.

After debugging, I found that things change during the forward pass even if I comment out backward(). So in the second run I just do a forward pass with a few batches before testing, without backpropagation and without optimizer.step(). Does any parameter change during the forward pass (when batch norm is used, or in some other case)?

If you set track_running_stats=True on batchnorm, it will update the running statistics (running mean and variance) of the batchnorm module on every forward pass in training mode, based on the current batch and the momentum.

Use model.eval() for these comparisons.
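Here is a minimal sketch showing this (the shapes and values are arbitrary): the running statistics move after a forward pass in train mode, with no backward() or optimizer.step() involved, and stay frozen in eval mode.

import torch

bn = torch.nn.BatchNorm2d(3, affine=True, track_running_stats=True)
x = torch.randn(8, 3, 32, 32)

bn.train()
print(bn.running_mean)   # initial: all zeros
bn(x)                    # forward pass only, no backward(), no optimizer.step()
print(bn.running_mean)   # updated: new_mean = (1 - momentum) * old_mean + momentum * batch_mean
                         # (momentum defaults to 0.1)

bn.eval()
frozen = bn.running_mean.clone()
bn(x)                    # forward pass in eval mode
print(torch.equal(frozen, bn.running_mean))  # True: stats do not change in eval mode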


Thanks. It really helps.