What is the difference between `register_buffer` and `register_parameter` of `nn.Module`

I was reading the Mask R-CNN code to see how they fix their BN parameters. I noticed that they use self.register_buffer to create the weight and bias, while in the PyTorch BN definition, self.register_parameter is used when affine=True. Could I simply think of buffers and parameters as having everything in common, except that a buffer skips gradient computation and never gets its values updated?

By the way, what is the difference between directly defining an nn.Parameter attribute in the module and using register_parameter?


Yes, your assumption is correct. If you have tensors in your model which should be saved and restored in the state_dict, but not trained by the optimizer, you should register them as buffers.
Buffers won't be returned by model.parameters(), so the optimizer won't have a chance to update them.
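
Something like this, for example (FrozenAffine is just a made-up module in the spirit of the fixed BN layers from the question, so treat it as a sketch):

import torch
import torch.nn as nn

class FrozenAffine(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # fixed affine terms: saved in the state_dict, but invisible to the optimizer
        self.register_buffer('weight', torch.ones(num_features))
        self.register_buffer('bias', torch.zeros(num_features))

    def forward(self, x):
        return x * self.weight + self.bias

m = FrozenAffine(3)
print(list(m.parameters()))   # [] -> nothing for an optimizer to train
print(list(m.buffers()))      # the two registered buffers
print(m.state_dict().keys())  # odict_keys(['weight', 'bias'])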

Both approaches work the same with regard to training etc.
There are some differences in the function calls, however. With register_parameter you pass the name as a string, which makes it convenient to create a whole range of parameters in a loop. Besides that, I think it's just a matter of coding style which one you prefer.
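
For the second question, here is a small sketch of both styles (TwoStyles and the parameter names are purely illustrative):

import torch
import torch.nn as nn

class TwoStyles(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        # direct attribute assignment: the name is fixed in the source code
        self.scale = nn.Parameter(torch.ones(1))
        # register_parameter: the name is a string, so a whole range of
        # parameters can be created programmatically
        for i in range(n_layers):
            self.register_parameter('weight_{}'.format(i), nn.Parameter(torch.randn(4, 4)))

m = TwoStyles()
print([name for name, _ in m.named_parameters()])
# ['scale', 'weight_0', 'weight_1', 'weight_2']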


If I have some parameters that I don't want to be trained, can I just add them as self.some_params inside the nn.Module to preserve state? Does register_buffer do anything special in that case compared to just storing them in self?

@pechyonkin

If your self.some_params are nn.Parameter objects, then you don't have to worry about this. If they're plain tensors, then they won't be in the state_dict (unless registered as buffers).


What are the downsides of not using a buffer? I am currently using self.some_param inside an nn.Module to keep a tensor that tracks running-average statistics of activations. I don't need it for backprop, only to make decisions at runtime. I want to learn why my approach is not optimal. If you could explain or point me to some reading, that would be great.

Do you want it in the state_dict?

I am sorry if this is a stupid question, but I am not sure whether I want that. I checked this, but I still don't see why I would need it. Would I need buffers if I want to save the model later? Are there any other reasons I would want to use the state_dict rather than just assigning to self?

As @pierrecurie explained, one reason to register the tensor as a buffer is to be able to serialize the model and restore all of its internal state.
Another one is that all buffers and parameters will be pushed to the device when e.g. .cuda() is called on the parent model:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.my_tensor = torch.randn(1)                     # plain tensor attribute
        self.register_buffer('my_buffer', torch.randn(1))   # registered buffer
        self.my_param = nn.Parameter(torch.randn(1))        # registered parameter

    def forward(self, x):
        return x

model = MyModel()
print(model.my_tensor)
> tensor([0.9329])
print(model.state_dict())
> OrderedDict([('my_param', tensor([-0.2471])), ('my_buffer', tensor([1.2112]))])

model.cuda()
print(model.my_tensor)
> tensor([0.9329])
print(model.state_dict())
> OrderedDict([('my_param', tensor([-0.2471], device='cuda:0')), ('my_buffer', tensor([1.2112], device='cuda:0'))])

As you can see, model.my_tensor is still on the CPU, where it was created, while all parameters and buffers were pushed to the GPU after calling model.cuda().
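
To illustrate the serialization point as well, here is a quick sketch reusing MyModel from above (the checkpoint file name is arbitrary):

import torch

model = MyModel()                                  # fresh CPU instance
torch.save(model.state_dict(), 'checkpoint.pth')   # only my_param and my_buffer are saved

restored = MyModel()
restored.load_state_dict(torch.load('checkpoint.pth'))

# the parameter and the buffer come back with the saved values ...
print(torch.equal(restored.my_param, model.my_param))    # True
print(torch.equal(restored.my_buffer, model.my_buffer))  # True
# ... while the plain tensor keeps whatever the new __init__ gave it
print(torch.equal(restored.my_tensor, model.my_tensor))  # almost certainly False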


Thanks for the clarification! Now it makes total sense. I will actually use a buffer, since I am going to use a GPU at some point for the model I am building.
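
Something along these lines, I assume (ActivationStats and its names are just placeholders for my use case):

import torch
import torch.nn as nn

class ActivationStats(nn.Module):
    def __init__(self, num_features, momentum=0.1):
        super().__init__()
        self.momentum = momentum
        # running statistic: not trained, but saved in the state_dict
        # and moved along with the model by .cuda() / .to()
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                batch_mean = x.mean(dim=0)
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * batch_mean)
        return x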
