Given an arbitrary model, I would like to manipulate each parameter p before it is involved in a computation: specifically, multiply each p by some learnable constant c. I can't manually adjust each parameter after the forward pass is completed, because then the learnable constant would not be part of the computation graph. Ideally, the operation p * c would happen before p is used in the forward pass, so that during the backward pass both p and c receive a gradient w.r.t. the loss.
I looked into hooks to achieve this, but all the posts I have seen so far only use hooks to print the underlying tensor or gradient information. Is there a way to use a hook in the forward pass to adjust parameters, or is this not possible? I could manually define each parameter in the network and take control of it that way; however, my current model has a lot of convolutional layers, so I would prefer to avoid that manual work.
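To make the desired behavior concrete, here is a minimal single-layer sketch (conv, c, and x are hypothetical stand-ins, not part of my actual model): when the multiplication by c happens inside the forward computation, both the weight and c end up in the graph and both receive gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 4, kernel_size=3)
c = nn.Parameter(torch.randn(1))

x = torch.randn(1, 3, 8, 8)
# The scaling is part of the forward computation, so c is in the graph.
out = F.conv2d(x, conv.weight * c, conv.bias)
out.sum().backward()

print(conv.weight.grad is not None)  # True
print(c.grad is not None)            # True
```

This is exactly what I want to happen automatically for every conv layer, without rewriting each forward by hand.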
I tried to define a hook to achieve this, but ran into a problem:
c = nn.Parameter(torch.randn(1), requires_grad=True)
opt = torch.optim.SGD(model.parameters(), lr=0.001)
opt.add_param_group({'params': c})

def hooks(module, input):
    # Scale the weight by c just before the forward pass runs.
    module.weight = nn.Parameter(module.weight * c)

for module in model.children():
    if isinstance(module, nn.Conv2d):
        module.register_forward_pre_hook(hooks)
c is defined as an nn.Parameter, and since assigning to module.weight requires an nn.Parameter (or None), I wrap the product in nn.Parameter and register the hook on each child module. But c is still not a learnable parameter, and what I found is:
module.weight is a leaf variable, which is fine, since that is what gets updated.
c is also a leaf variable; does that mean the gradient will not flow back to c, and that c is not included in the computation graph?
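To check this suspicion, here is a standalone sketch (w and c are hypothetical stand-ins for module.weight and my constant): wrapping the product in nn.Parameter creates a brand-new leaf tensor, which cuts the autograd connection, so backward through it never reaches c.

```python
import torch
import torch.nn as nn

c = nn.Parameter(torch.randn(1))
w = nn.Parameter(torch.randn(2, 2))

# nn.Parameter(...) constructs a new leaf tensor from the product,
# severing the graph link back to both w and c.
scaled = nn.Parameter(w * c)
print(scaled.is_leaf)  # True

scaled.sum().backward()
print(c.grad)  # None: the gradient stops at the new leaf
print(w.grad)  # None as well
```

So it seems the nn.Parameter wrapping inside the hook is exactly what breaks the gradient flow to c.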