I was looking at the code of batchnorm:
```python
def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True,
             track_running_stats=True):
    super(_BatchNorm, self).__init__()
    self.num_features = num_features
    self.eps = eps
    self.momentum = momentum
    self.affine = affine
    self.track_running_stats = track_running_stats
    if self.affine:
        self.weight = Parameter(torch.Tensor(num_features))
        self.bias = Parameter(torch.Tensor(num_features))
    else:
        self.register_parameter('weight', None)
        self.register_parameter('bias', None)
    if self.track_running_stats:
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        self.register_buffer('num_batches_tracked', torch.tensor(0, dtype=torch.long))
    else:
        self.register_parameter('running_mean', None)
        self.register_parameter('running_var', None)
        self.register_parameter('num_batches_tracked', None)
    self.reset_parameters()
```
and I don’t really understand when to use register_buffer / register_parameter vs. nn.Parameter.
I did some tests:
```python
a = torch.nn.BatchNorm2d(100)
a.register_parameter('test', None)

a
Out[34]: BatchNorm2d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

a.test

a.test2 = torch.nn.parameter.Parameter(requires_grad=False)
a.test2
Out[37]:
Parameter containing:
tensor([])
```
The behavior is different: for the registered parameter there is no output, because the stored value is None.
register_parameter can only register a Parameter or None, so why is it used instead of just assigning an nn.Parameter attribute?
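To see what registering actually does, I tried this (`extra` and `extra2` are just made-up names for the test). It looks like anything registered (or assigned) as a real Parameter shows up in named_parameters() and state_dict(), while registering None only reserves the name:

```python
import torch
import torch.nn as nn

m = nn.Linear(2, 2)
m.register_parameter('extra', None)            # reserves the name, stores None
print(m.extra)                                 # None
print([n for n, _ in m.named_parameters()])    # ['weight', 'bias'] - the None entry is skipped

m.extra2 = nn.Parameter(torch.zeros(2))        # plain attribute assignment also registers it
print([n for n, _ in m.named_parameters()])    # ['weight', 'bias', 'extra2']
print('extra2' in m.state_dict())              # True - it gets saved in checkpoints
```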
Regarding register_buffer, the docs just say it is used when you want to register something that is not a parameter, so I assume it does not compute gradients. Is there any difference between register_buffer and a parameter with requires_grad=False?
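From a quick test (`buf` and `frozen` are made-up names), the difference I can see is that a buffer is never returned by parameters(), so an optimizer built from model.parameters() won't receive it, while a parameter with requires_grad=False still is; both end up in the state_dict:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('buf', torch.zeros(3))
        self.frozen = nn.Parameter(torch.zeros(3), requires_grad=False)

m = M()
print(list(m.state_dict().keys()))             # ['frozen', 'buf'] - both are saved
print([n for n, _ in m.named_parameters()])    # ['frozen'] - the buffer is excluded
print([n for n, _ in m.named_buffers()])       # ['buf']
```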
In the code above, why do they register a buffer when self.track_running_stats is True, but register a parameter (as None) when it is False?
I checked, and register_buffer can also register None.
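A quick check of that too (`test_buf` is a made-up name): a None buffer behaves like the None parameter above, the attribute reads back as None and nothing appears in the state_dict:

```python
import torch
import torch.nn as nn

b = nn.BatchNorm2d(4)
b.register_buffer('test_buf', None)    # registering None works for buffers too
print(b.test_buf)                      # None, same as a None-registered parameter
print('test_buf' in b.state_dict())    # False - None entries are skipped
```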