nn.Parameter(torch.Tensor(3, 4), requires_grad=False) vs. self.register_buffer(name, torch.Tensor(3, 4))

I notice that an nn.Parameter can be created with requires_grad=False.
That leaves me confused: what is the practical difference between nn.Parameter(torch.Tensor(3, 4), requires_grad=False) and self.register_buffer(name, torch.Tensor(3, 4))?
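
To make the question concrete, here is a minimal sketch (the `Demo` module name is just for illustration) showing where each one gets registered:

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Frozen parameter: still a parameter, just excluded from autograd.
        self.frozen = nn.Parameter(torch.randn(3, 4), requires_grad=False)
        # Buffer: part of the module's state, but not a parameter at all.
        self.register_buffer("buf", torch.randn(3, 4))

m = Demo()
print([n for n, _ in m.named_parameters()])  # ['frozen']
print([n for n, _ in m.named_buffers()])     # ['buf']
print(list(m.state_dict().keys()))           # ['frozen', 'buf'] -- both are saved
```

Both tensors end up in state_dict() and move with .to(device); the visible difference is that the frozen parameter is still returned by parameters(), so an optimizer built from model.parameters() will receive it (though it won't update it, since it never gets a gradient).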

My guess is that a buffer is not meant to be fixed (you may change its value during the iterations of an algorithm), while a parameter is fixed in the sense that it may only be modified through gradient descent.

For example, in an exponential moving average:

Z = mu * Z + (1 - mu) * X_n

Z should be a buffer and mu a parameter, and neither of them would require gradients. Beyond that, as far as I know, there is no real functional difference (apart from whether the tensor shows up in parameters() or in buffers()); it's rather a convention. A sketch of this follows below.
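
Here is a minimal sketch of that convention (the `EMA` module and its defaults are assumptions for illustration, not the only way to write it):

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Tracks Z = mu * Z + (1 - mu) * x across calls."""

    def __init__(self, size, mu=0.9):
        super().__init__()
        # mu is fixed and should be saved with the model, but needs no gradient.
        self.mu = nn.Parameter(torch.tensor(mu), requires_grad=False)
        # Z changes every iteration, but never via gradient descent -> buffer.
        self.register_buffer("Z", torch.zeros(size))

    @torch.no_grad()
    def forward(self, x):
        self.Z.mul_(self.mu).add_((1 - self.mu) * x)
        return self.Z

ema = EMA(4)
for _ in range(3):
    print(ema(torch.ones(4)))  # running average converges toward 1.0
```

Either way of registering keeps mu and Z in the checkpoint; putting Z in a buffer just signals that it is state updated by the algorithm itself rather than by the optimizer.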

Hi, an nn.Parameter can also require no gradient; I'm curious what the difference is for an nn.Parameter when it requires no gradient.