BatchNorm1d was not that obvious to me when I checked the source code. Let's look at this snippet:
if self.affine:
    self.weight = Parameter(torch.Tensor(num_features))
    self.bias = Parameter(torch.Tensor(num_features))
else:
    self.register_parameter('weight', None)
    self.register_parameter('bias', None)
First, Parameter is the instruction that weight will be a learnable parameter, i.e. a tensor we will learn during training. Then register_parameter adds a parameter to the module. But can you tell me what the difference between the two is?
I just assume this has something to do with separating parameters that are shared across every mini-batch from parameters that are specific to each mini-batch.
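To poke at the difference myself, I put together a minimal sketch (the Tiny module below is my own toy example, not part of PyTorch). Assigning an nn.Parameter registers a trainable tensor under that name, while register_parameter('weight', None) only reserves the name, so the attribute exists but nothing is learned:

import torch
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self, affine=True):
        super().__init__()
        if affine:
            # assigning an nn.Parameter registers it as trainable
            self.weight = nn.Parameter(torch.ones(3))
        else:
            # only reserves the name: self.weight exists but is None
            self.register_parameter('weight', None)

print(list(Tiny(affine=True).named_parameters()))   # [('weight', Parameter containing: ...)]
print(list(Tiny(affine=False).named_parameters()))  # [] -- nothing to learn
print(Tiny(affine=False).weight)                    # None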
There is also a similar question without a clear answer.
The general question is: how many parameters in total can we learn in BatchNorm1d? I think four, but I am not sure.
I know that if we set affine=True, we will learn the weight and bias parameters. Apart from these two, I somehow think we can also learn the mean and std. However, I am not sure. Maybe the current implementation of _BatchNorm (which is the base class for BatchNorm1d) does not learn the mean and std (var).
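One way I found to check the count (just my own inspection sketch, assuming the defaults affine=True and track_running_stats=True) is to list the module's parameters and buffers separately:

import torch.nn as nn

bn = nn.BatchNorm1d(10)

# learnable parameters: only weight and bias show up
for name, p in bn.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
# weight (10,) True
# bias (10,) True

# the mean and variance live as buffers, not parameters
for name, b in bn.named_buffers():
    print(name, tuple(b.shape))
# running_mean (10,)
# running_var (10,)
# num_batches_tracked ()

If I am reading this right, only weight and bias are learned by gradient descent, while running_mean and running_var are updated but not through backprop. Please correct me if I am wrong.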
Here is the last bit of code I will share with you:
if self.track_running_stats:
    self.register_buffer('running_mean', torch.zeros(num_features))
    self.register_buffer('running_var', torch.ones(num_features))
    self.register_buffer('num_batches_tracked', torch.tensor(0, dtype=torch.long))
else:
    self.register_parameter('running_mean', None)
    self.register_parameter('running_var', None)
    self.register_parameter('num_batches_tracked', None)
From this code I understand that we track running_mean and running_var (which should be the variance, I guess) and the number of mini-batches processed so far (num_batches_tracked). The first two are per-feature tensors, and the third is a scalar count of mini-batches. So at least the first two are not constants. Right?
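To convince myself they are not constants, I tried a small experiment (my own sketch; the exact printed values depend on the random input). The buffers move after a forward pass in train() mode and stay frozen in eval() mode:

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4)

bn.train()
bn(x)
print(bn.running_mean)         # nudged toward the batch mean (default momentum 0.1)
print(bn.num_batches_tracked)  # tensor(1)

bn.eval()
bn(x)
print(bn.num_batches_tracked)  # still tensor(1): buffers are frozen in eval mode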