Thanks a lot, but I don't want to train on the test set. I just want each test batch to be normalized with its own mean and std instead of the running mean and std accumulated during training, in other words not to apply model.eval(). The beta and gamma stay fixed as they were learned during training.
Is that what you mean?
I want to know how PyTorch uses the `running_mean` and `running_var` to do the evaluation. Is it

```python
x = (x - running_mean) / torch.sqrt(running_var)
```

or with the unbiased (Bessel) correction applied first:

```python
var = m / (m - 1) * running_var  # where m is the batch size
x = (x - running_mean) / torch.sqrt(var)
```
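For reference, PyTorch tracks `running_var` (not a running std), and in eval mode it normalizes with it directly, with `eps` inside the square root; the unbiased m/(m-1) correction is applied when `running_var` is *updated* during training, not at evaluation time. A minimal sketch checking the eval-time formula against `nn.BatchNorm1d`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)
# populate the running statistics with a few training-mode passes
for _ in range(10):
    bn(torch.randn(32, 4))
bn.eval()

x = torch.randn(8, 4)
# eval mode: normalize with the stored running statistics, not the batch's own,
# then apply the learned affine parameters gamma (weight) and beta (bias)
expected = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
expected = expected * bn.weight + bn.bias

out = bn(x)
print(torch.allclose(out, expected, atol=1e-5))  # True
```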
I think you should use batch norm before the ReLU activations.
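That ordering (Conv → BN → ReLU) is the convention used in e.g. torchvision's ResNet, though some architectures place BN after the activation; a small sketch:

```python
import torch
import torch.nn as nn

# Conv -> BatchNorm -> ReLU ordering
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)

x = torch.randn(2, 3, 8, 8)
print(block(x).shape)  # torch.Size([2, 16, 8, 8])
```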
I have the same problem when I use model.eval() at test time. I used the BN layer like this:
```python
def conv_block(self, in_channels, out_channels, kernel_size=3, stride=2, padding=1):
    block = nn.Sequential(
        ...  # layer definitions truncated in the original post
    )
```
and then used it as:
```python
self.block1 = self.conv_block(in_channels=1, ...)  # remaining args truncated in the original post
self.block2 = self.conv_block(in_channels=1, ...)
```
Is it a problem? I don't think so. When I do not use model.eval() the results are good, but when I use it, it decreases the performance drastically.
My batch size is 64 and I test the model on the same training data.
I have the same problem as yours, did you figure it out?
Did you find how to fix the problem?
I am also getting a similar problem. I am using a modified HRNet model for my personal project, and the loss becomes drastically large and unstable at test time, even on the training data (0.006 → 0.014 ~ 0.04). According to other discussions, it must be related to using BatchNorm with a small batch size, but there seems to be no fundamental solution to this.
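One commonly suggested workaround (not a fundamental fix) is to stop relying on the running statistics: either tell BatchNorm not to track them, or swap it for a normalization layer whose behavior does not depend on the batch size, such as GroupNorm. A sketch of both options:

```python
import torch
import torch.nn as nn

# Option 1: BatchNorm that always uses the current batch's statistics,
# even in eval mode (no running_mean/running_var are tracked)
bn = nn.BatchNorm2d(16, track_running_stats=False)
bn.eval()

# Option 2: GroupNorm normalizes over channel groups per sample,
# so its statistics are independent of the batch size
gn = nn.GroupNorm(num_groups=4, num_channels=16)

x = torch.randn(2, 16, 8, 8)  # works fine even with a tiny batch
print(bn(x).shape, gn(x).shape)
```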
You could call .train() on each batchnorm layer, e.g. something like this should work:

```python
model = models.resnet152()
model.eval()
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.train()
```
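To see the effect of that pattern without downloading a big torchvision model, here is a self-contained sketch on a toy network, checking that only the BatchNorm layers end up back in training mode:

```python
import torch.nn as nn

# toy model standing in for resnet152
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1),
    nn.BatchNorm2d(8),
)
model.eval()  # everything in eval mode first

# switch only the BatchNorm layers back to training mode,
# so they normalize with batch statistics instead of the running ones
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.train()

# only the BN layers report training=True now
print([type(m).__name__ for m in model.modules() if m.training])
# ['BatchNorm2d', 'BatchNorm2d']
```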