TL;DR: What exact size should I give the batch-norm layer here if I want to apply it to a CNN's output? In what format?
I have a two-fold question:
1) So far I have found only this link, which shows how to use batch norm. My first question is: is this the proper way to use it? For example:
bn1 = nn.BatchNorm2d(what_size_here_exactly?, eps=1e-05, momentum=0.1, affine=True)
x1 = bn1(nn.Conv2d(blah blah blah))
Is this the intended usage? Could someone show an example of the syntax for using it with a CNN?
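A minimal sketch of the usual pattern, assuming a single 3x3 convolution with hypothetical channel counts (3 in, 16 out) and a random input batch. `nn.BatchNorm2d` takes `num_features`, which is the channel dimension C of the (N, C, H, W) tensor it receives, so it should match the preceding conv's `out_channels`. Also note that the batch-norm module is called on the conv's output *tensor*, not on the `nn.Conv2d` module itself.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen for illustration only.
in_channels, out_channels = 3, 16

conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
# num_features = number of channels C in the (N, C, H, W) input,
# i.e. the out_channels of the conv that feeds this layer.
bn1 = nn.BatchNorm2d(out_channels, eps=1e-05, momentum=0.1, affine=True)

x = torch.randn(8, in_channels, 32, 32)  # batch of 8 three-channel 32x32 images

# Run the conv on the input tensor first, then normalize its output.
x1 = bn1(conv1(x))
print(x1.shape)  # padding=1 keeps 32x32, so (8, 16, 32, 32)

# The same layers composed into a model; ReLU after BN is a common choice.
model = nn.Sequential(conv1, bn1, nn.ReLU())
y = model(x)
```

Because the batch-norm shift (`beta`, the `bias` parameter when `affine=True`) absorbs any constant per-channel offset, the conv in front of it is often created with `bias=False`; that is a common convention rather than a requirement.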
2) I know there are sometimes caveats with batch norm at training time versus inference time. For example, the original paper computes running averages and variances of the training data AFTER the net has fully trained and then uses those in the inference equation. However, I am guessing that PyTorch's batch norm already does this under the hood, so can I call the forward pass at test time the same way I call it at training time?
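A minimal sketch of the train/inference distinction, assuming a standalone `nn.BatchNorm2d` with 4 channels and synthetic data (hypothetical sizes). In training mode (the default), the layer normalizes with the current batch's statistics and updates `running_mean`/`running_var` as an exponential moving average controlled by `momentum`. After `.eval()`, it normalizes with those stored running statistics and stops updating them. So the forward call itself looks the same at test time, but the module's mode has to be switched with `model.eval()` (and back with `model.train()`).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(4)  # hypothetical: 4 channels
x_train = torch.randn(16, 4, 8, 8) * 3 + 5  # synthetic data, mean ~5, std ~3

# Training mode (the default): each call normalizes with this batch's
# mean/variance and nudges running_mean/running_var toward them
# (new = (1 - momentum) * old + momentum * batch_stat, momentum=0.1).
bn.train()
for _ in range(100):
    bn(x_train)
print(bn.running_mean)  # close to the per-channel mean of x_train (~5)

# Evaluation mode: normalizes with the stored running statistics,
# so a sample's output no longer depends on the rest of its batch,
# and the running statistics are left untouched.
bn.eval()
before = bn.running_mean.clone()
with torch.no_grad():
    single = bn(x_train[:1])   # a batch containing only one sample
    batch = bn(x_train)[:1]    # the same sample inside the full batch
print(torch.allclose(single, batch, atol=1e-6))
print(torch.equal(before, bn.running_mean))
```

Calling `model.eval()` on a parent module propagates to every batch-norm (and dropout) layer inside it, so in practice it is called once on the whole network before inference.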