PyTorch gradient calculation

Hi, I am new to PyTorch, and this is what I want to do:

I have two models that are related to one another: modelA and modelB. I want to get three separate gradients. I am able to get the first two without any issues, but I am not sure how I can get the third one.

  1. differentiating the loss of modelA w.r.t. modelA.parameters()
    [v.grad.data for v in modelA.parameters()]

  2. differentiating the loss of modelB w.r.t. modelB.parameters()
    [v.grad.data for v in modelB.parameters()]

  3. differentiating the loss of modelB w.r.t. modelA.parameters()

This is what I tried:
    torch.autograd.grad(lossB, modelA.parameters())
However, I get the following error:
    grad can be implicitly created only for scalar outputs

Any help would be great!

You need to make sure that lossB is a scalar (i.e. a single number), so make sure you sum over your batch dimension as well. If you want per-sample gradients, that can be done with hooks, but it gets quite messy to implement.
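For example, something like this (just a sketch: the layer shapes, the loss function, and the way modelB consumes modelA's output are assumptions, not details from your post):

    import torch
    import torch.nn as nn

    # Assumed setup: modelB consumes modelA's output, so lossB depends
    # on modelA's parameters through the computation graph.
    modelA = nn.Linear(4, 8)
    modelB = nn.Linear(8, 1)

    x = torch.randn(16, 4)        # batch of 16 samples
    target = torch.randn(16, 1)

    outB = modelB(modelA(x))

    # reduction='none' keeps one loss value per sample; summing over the
    # batch turns lossB into a scalar, which torch.autograd.grad needs
    # when no grad_outputs are supplied.
    lossB = nn.functional.mse_loss(outB, target, reduction='none').sum()

    # Item 3: gradient of lossB w.r.t. modelA's parameters.
    # retain_graph=True keeps the graph alive in case you also want to
    # call lossB.backward() afterwards (e.g. for item 2).
    gradsA_from_lossB = torch.autograd.grad(lossB, modelA.parameters(),
                                            retain_graph=True)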

Also, if lossB doesn’t depend on the parameters of modelA, the gradients are zero by definition. So check that the two models actually depend on each other!
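One way to sanity-check that dependency (again just a sketch): with allow_unused=True, torch.autograd.grad returns None for parameters that lossB never touches instead of raising an error.

    import torch
    import torch.nn as nn

    # Counter-example: modelB never uses modelA's output, so lossB has
    # no dependence on modelA's parameters.
    modelA = nn.Linear(4, 8)
    modelB = nn.Linear(4, 1)

    x = torch.randn(16, 4)
    lossB = modelB(x).sum()

    # Without allow_unused=True this call errors out because modelA's
    # parameters never appear in lossB's graph; with it, every unused
    # parameter comes back as None.
    grads = torch.autograd.grad(lossB, modelA.parameters(), allow_unused=True)
    print(grads)  # (None, None)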
