How to make parameters registered by register_parameter trainable?

Hi,

I registered my parameter in an nn.Module subclass via register_parameter; the example code is below. In my training code, I found that the loss kept changing, but the accuracy stayed unchanged.

I register all parameters in the top-level Module and use each parameter in the forward function of a sub-module. I am wondering whether the gradient of each parameter is computed according to the Module where it is defined or the Module where it is used.

In addition, if I only run a forward pass to test accuracy, the accuracy is as good as expected. Without any other change, if I train, the loss changes only slightly (e.g., from loss = 0.6420461028972839 to loss = 0.6429245240923738), but the accuracy stays exactly unchanged.

I tried changing the lr, but it barely helped.

My questions are:

  1. Can those registered parameters be trained/optimized?
  2. Are there any special/additional settings I should change to make those parameters trained/optimized?
self.register_parameter(
    name=param_name.replace('.', '_'),
    param=Parameter(
        torch.from_numpy(param_data), requires_grad=not is_constant
    )
)
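The snippet above relies on names defined elsewhere (`param_name`, `param_data`, `is_constant`). For reference, a minimal self-contained version might look like this (the module name and values are hypothetical):

```python
import numpy as np
import torch
from torch.nn import Parameter

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Hypothetical stand-ins for the names used in the snippet above
        param_name = "layer1.weight"
        param_data = np.ones((4, 3), dtype=np.float32)
        is_constant = False
        self.register_parameter(
            name=param_name.replace('.', '_'),
            param=Parameter(
                torch.from_numpy(param_data), requires_grad=not is_constant
            ),
        )

model = MyModule()
# The registered parameter shows up in named_parameters(), so any optimizer
# built from model.parameters() will include it.
print([name for name, _ in model.named_parameters()])  # ['layer1_weight']
```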

self.register_parameter should work fine, and you can verify that this parameter is trained by checking its gradients after the backward pass via model.param_name.grad. If this attribute is not None, then gradients were calculated for the parameter. Assuming you've also passed this parameter to an optimizer, e.g. via optimizer = torch.optim.Adam(model.parameters(), lr=1e-3), then it'll also be updated in the optimizer.step() call. You could additionally verify it by comparing its values before and after the step() call.
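The two checks described above can be sketched on a toy model like this (the model itself is just a placeholder):

```python
import torch

model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

out = model(torch.ones(8, 3))
loss = out.sum()
loss.backward()

# 1) the backward pass populated the gradient attribute
assert model.weight.grad is not None

# 2) the parameter value actually changes in optimizer.step()
before = model.weight.detach().clone()
optimizer.step()
assert not torch.equal(before, model.weight)
```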

@ptrblck
Thank you!

I will try your ideas.

As loss.backward() calculates gradients according to the chain rule, I am wondering where the gradient is calculated. That is what I was asking in this post:

I register all parameters in the top-level Module and use each parameter in the forward function of a sub-module. I am wondering whether the gradient of each parameter is computed according to the Module where it is defined or the Module where it is used.

I’m not sure I understand the question completely, but during the forward pass the operations applied to the tensor are tracked by Autograd, and the gradient is calculated using the computation graph. It doesn’t matter where the parameter is registered, as its usage will be tracked.
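A small sketch of this point: a parameter registered on a parent module but consumed inside a child module's forward still receives gradients (module names here are hypothetical).

```python
import torch

class Child(torch.nn.Module):
    def forward(self, x, weight):
        # uses a parameter owned by the parent module
        return x @ weight

class Parent(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_parameter("w", torch.nn.Parameter(torch.randn(3, 2)))
        self.child = Child()

    def forward(self, x):
        # the parameter is passed down; Autograd tracks its usage here
        return self.child(x, self.w)

model = Parent()
model(torch.randn(4, 3)).sum().backward()
# gradient flows to the parent's parameter regardless of where it was used
assert model.w.grad is not None
```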