I am trying to implement an algorithm where I have randomly initialized model parameters θ and a weight matrix W. I have several tasks, and my loss function is split into separate losses (one per sample within each task); the weight matrix W contains the corresponding weight for each per-sample loss. I first have to perform a gradient descent step w.r.t. θ. Then I have to express the updated θ as a function of W and take the gradient w.r.t. W (I think via something like autograd.grad(loss, W)).

My question is: in order to take the gradient w.r.t. W, do I first have to register W as a model parameter? Otherwise the gradient of the loss w.r.t. W will of course be None. But since these are weights on the losses and not model parameters themselves, I am somewhat confused. If so, I would assume that when taking the gradient w.r.t. θ, I cannot do something like
SGD(model.parameters(), lr=lr)
but would instead have to exclude the weights from the optimizer, something like
SGD(list(model.parameters())[:-1], lr=lr)?
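
For concreteness, here is a minimal sketch of the setup I have in mind (the toy nn.Linear model, the data shapes, and inner_lr are just placeholders, not my actual code). W is kept as a plain leaf tensor with requires_grad=True rather than a registered parameter, the inner step on θ is made differentiable with create_graph=True, and the outer gradient is then taken w.r.t. W:

import torch
import torch.nn as nn

# Toy model standing in for the real one; θ = model.parameters()
model = nn.Linear(4, 1)

# One weight per sample loss. W is a plain leaf tensor with
# requires_grad=True; it is NOT registered as a model parameter.
W = torch.full((8,), 0.5, requires_grad=True)

x = torch.randn(8, 4)   # placeholder data
y = torch.randn(8, 1)
inner_lr = 0.1          # placeholder inner-loop step size

# Weighted sum of the per-sample losses
per_sample = nn.functional.mse_loss(model(x), y, reduction="none").squeeze(1)
inner_loss = (W * per_sample).sum()

# Differentiable gradient step on θ; create_graph=True keeps the
# dependence of the updated parameters on W in the autograd graph.
grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)
theta_prime = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

# Evaluate a meta/outer loss using the updated parameters θ'(W)
w_, b_ = theta_prime
meta_loss = nn.functional.mse_loss(x @ w_.t() + b_, y)

# Gradient w.r.t. W is available without registering W anywhere
grad_W = torch.autograd.grad(meta_loss, W)[0]
print(grad_W.shape)  # torch.Size([8])

Note that in this sketch W never appears in model.parameters() at all, which is exactly why I am unsure whether registering it (and then excluding it from the optimizer) is necessary in the first place.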