I am trying to implement an algorithm where I have randomly initialized model parameters θ and a weight matrix W. I have several tasks, and my loss function is split into separate losses (one per sample within each task); the weight matrix W contains the corresponding weight for each per-sample loss. I first have to perform a gradient descent step w.r.t. θ. Then I have to express the updated θ as a function of W and take the gradient w.r.t. W (I think via something like autograd.grad(loss, W)).

My question is: in order to take the gradient w.r.t. W, do I first have to register W as a model parameter? Otherwise the gradient of the loss w.r.t. W will of course be None. But since these are weights on the losses and not model parameters themselves, I am somewhat confused. If so, I would assume that when taking the gradient w.r.t. θ, I cannot do something like
SGD(model.parameters(), lr=lr)
but would instead have to exclude the weights from the optimizer, something like
SGD(list(model.parameters())[:-1], lr=lr)?
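
For concreteness, here is a minimal sketch of the setup I have in mind (the toy nn.Linear model, the data shapes, and inner_lr are just placeholders, not my actual code). W is kept as a plain leaf tensor with requires_grad=True rather than a registered parameter, the inner step on θ is made differentiable with create_graph=True, and the outer gradient is then taken w.r.t. W:

import torch
import torch.nn as nn

# Toy model standing in for the real one; θ = model.parameters()
model = nn.Linear(4, 1)

# One weight per sample loss. W is a plain leaf tensor with
# requires_grad=True; it is NOT registered as a model parameter.
W = torch.full((8,), 0.5, requires_grad=True)

x = torch.randn(8, 4)   # placeholder data
y = torch.randn(8, 1)
inner_lr = 0.1          # placeholder inner-loop step size

# Weighted sum of the per-sample losses
per_sample = nn.functional.mse_loss(model(x), y, reduction="none").squeeze(1)
inner_loss = (W * per_sample).sum()

# Differentiable gradient step on θ; create_graph=True keeps the
# dependence of the updated parameters on W in the autograd graph.
grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
theta_prime = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

# Evaluate a meta/outer loss using the updated parameters θ'(W)
w_, b_ = theta_prime
meta_loss = nn.functional.mse_loss(x @ w_.t() + b_, y)

# Gradient w.r.t. W is available without registering W anywhere
grad_W = torch.autograd.grad(meta_loss, W)[0]
print(grad_W.shape)  # torch.Size([8])

Note that in this sketch W never appears in model.parameters() at all, which is exactly why I am unsure whether registering it (and then excluding it from the optimizer) is necessary in the first place.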