I have several conv layers operating on tensors of the same size. Can I use a single BatchNorm for all of them, or do I need a separate BatchNorm per layer?
```python
x = self.conv1(x)
x = self.batchNorm(x)
x = F.relu(x)
x = self.conv2(x)
x = self.batchNorm(x)
x = F.relu(x)
x = self.conv3(x)
x = self.batchNorm(x)
x = F.relu(x)
```
Or do I need to declare batchNorm1, batchNorm2, and batchNorm3, one for each of these conv layers? I don’t know how BatchNorm is implemented, so I don’t know whether reusing the same module across different layers will corrupt the statistics it uses to compute the normalization.
BatchNorm has learnable parameters, so if you re-use the same module, those parameters are shared across every place it is used.
It also tracks running statistics (mean and variance) for use in evaluation mode, and re-using the module means those stats are shared as well.
It might or might not be what you want, depending on your use case.
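For the common case where sharing is not wanted, a minimal sketch with one BatchNorm per conv layer looks like this (the channel count, kernel size, and class name here are assumptions for illustration, not from the thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ThreeConvNet(nn.Module):
    """Hypothetical example: three convs, each with its own BatchNorm."""

    def __init__(self, channels=16):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Separate norm layers: each keeps its own learnable scale/shift
        # and its own running mean/variance.
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        return x
```

Declaring them as distinct attributes (rather than calling one module three times) is what lets each layer learn and track statistics independently.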
Thanks for the response! This answers my question.
So I guess I need a separate BatchNorm() for each of the conv layers, for two reasons: 1) the learnable parameters \gamma and \beta may differ from layer to layer, and 2) BatchNorm apparently stores the running batch mean and variance somewhere for use at inference time. Where exactly is that stored, though? When I save the model using torch.save(net.state_dict(), filename), is it dumped out along with the network weights?
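For what it's worth, the running statistics are registered as buffers on the module, so they do appear in state_dict() and get written out by torch.save(net.state_dict(), filename). A quick check (the 16-channel size is an arbitrary choice):

```python
import torch.nn as nn

# The learnable gamma/beta are stored under "weight"/"bias", and the
# running statistics under "running_mean"/"running_var" -- all of them
# end up in state_dict() and are saved with the rest of the model.
bn = nn.BatchNorm2d(16)
keys = set(bn.state_dict().keys())
assert {"weight", "bias", "running_mean", "running_var"} <= keys
```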
Also, I can’t understand why I only need to pass in the number of channels, and not the length of the signal (in the 1D case) or W and H (in the 2D case). Any ideas?
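The reason only the channel count matters can be checked empirically: BatchNorm averages over the batch and spatial dimensions, keeping one mean/variance (and one scale/shift) per channel, so the parameter shapes are independent of H and W. A small sketch (the channel count and spatial sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# One BatchNorm2d module handles inputs of any spatial size, because it
# normalizes over (N, H, W) and keeps exactly one statistic per channel.
bn = nn.BatchNorm2d(8)
for hw in [(4, 4), (32, 17)]:           # different spatial sizes, same module
    out = bn(torch.randn(2, 8, *hw))
    assert out.shape == (2, 8, *hw)
assert bn.running_mean.shape == (8,)    # one running mean per channel
```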