I notice nn.Parameter can require NO gradients.
So I'm confused: what is the practical difference between nn.Parameter(torch.Tensor(3, 4), requires_grad=False)
and self.register_buffer(name, torch.Tensor(3, 4))
?
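To make the question concrete, here is a minimal sketch (module and attribute names are made up for illustration) showing where each one ends up:

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # a frozen parameter: registered in ._parameters
        self.p = nn.Parameter(torch.zeros(3, 4), requires_grad=False)
        # a buffer: registered in ._buffers
        self.register_buffer("b", torch.zeros(3, 4))

m = Demo()
# both appear in state_dict(), so both are saved/loaded and moved by .to()/.cuda()
print(sorted(m.state_dict().keys()))         # ['b', 'p']
# only the parameter is exposed via .parameters(), so only it reaches an optimizer
print([n for n, _ in m.named_parameters()])  # ['p']
print([n for n, _ in m.named_buffers()])     # ['b']
```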
I guess that a buffer is not supposed to be fixed (you may change its value during the iterations of an algorithm), while a parameter is fixed, but may only be modified through gradient descent.
For example, in an exponential moving average:
Z = mu * Z + (1 - mu) * X(n)
Z should be a buffer and mu a parameter, while neither of them would require gradients. Beyond that, the main functional difference I know of is that a frozen nn.Parameter still appears in model.parameters() (and is therefore handed to optimizers), whereas a buffer does not; otherwise it's largely a convention.
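The exponential moving average above can be sketched as a module (a minimal illustration; the class and attribute names are my own, not from any library):

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    def __init__(self, shape, mu=0.9):
        super().__init__()
        # mu: a non-trainable parameter -- fixed, visible to .parameters()
        self.mu = nn.Parameter(torch.tensor(mu), requires_grad=False)
        # Z: a buffer -- updated every step, saved in state_dict()
        self.register_buffer("Z", torch.zeros(shape))

    def forward(self, x):
        # assigning a tensor to a registered buffer name updates the buffer
        self.Z = self.mu * self.Z + (1 - self.mu) * x
        return self.Z

m = EMA((3, 4))
m(torch.ones(3, 4))
print([n for n, _ in m.named_parameters()])  # ['mu']
print([n for n, _ in m.named_buffers()])     # ['Z']
```

With mu = 0.9 and Z starting at zero, one step on an all-ones input leaves Z at roughly 0.1 everywhere.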
Hi, nn.Parameter can also require no gradient; I'm curious what difference it makes for nn.Parameter when it requires no gradient.