BatchNorm acts weird when part of the parameters are frozen during training

Hi all,

I’m trying to train a network that has two parts. The first part is taken from another, already trained model. I froze its weights and trained only the second part using:

for param in net1.parameters():  # params of the first (pretrained) part only
    param.requires_grad = False
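
In case it matters, I double-checked what this loop actually freezes. As far as I understand, requires_grad only affects parameters, while BatchNorm's running_mean / running_var are registered as buffers and are not touched by it. This is just the sanity check I ran (not part of the real training code):

for name, p in net1.named_parameters():
    print(name, p.requires_grad)   # all False after the loop above
for name, b in net1.named_buffers():
    print(name, b.shape)           # any running_mean / running_var buffers, if present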

The second part contains several BatchNorm2d layers, and I call net.train() during training.
When testing, I call net.eval(), but the test loss is much higher than the training loss. (If I skip net.eval(), the results look fine, though I know that is not the correct way to evaluate.) The two parts are wrapped like this:

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.net1 = net1()  # pretrained part, weights frozen
        self.net2 = net2()  # trainable part with the BatchNorm2d layers

    def forward(self, x):
        out_net1 = self.net1(x)
        out_img = self.net2(out_net1)
        return out_img
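
To make the train/eval gap concrete, here is a minimal standalone repro of what I think is happening inside one BatchNorm2d layer (the shapes and numbers below are made up, just for illustration):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(16, 8, 4, 4) * 5 + 3   # activations whose statistics differ from BN's defaults

bn.train()
y_train = bn(x)   # normalized with the batch statistics; running stats get updated
bn.eval()
y_eval = bn(x)    # normalized with the accumulated running statistics

print((y_train - y_eval).abs().mean())  # clearly nonzero until the running stats have converged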

To narrow down the problem, I removed the code above and left the first network's parameters trainable; then the test results looked fine with net.eval().

Does anyone know where the problem is? It looks like BatchNorm and weight freezing conflict with each other, but that doesn't make sense to me since they are not in the same subnetwork.

UPDATE: the network is structured like an autoencoder: several convolutional layers in the encoder and transposed convolutional layers in the decoder. After I removed bias=False from the Conv2d layers in the encoder, it worked. However, I still don't know why.
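
For reference, the change that made net.eval() behave was simply dropping the bias argument from the encoder convolutions, roughly like this (the channel sizes here are placeholders, not the real ones):

import torch.nn as nn

# before: test loss with net.eval() was much higher than the training loss
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False)

# after: test loss matches training
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # bias defaults to True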