Set gradients explicitly

KFrank · May 9, 2022, 10:10pm

Hi Jona!

Let me first ask the following question:

Normally we compute a loss function in order to train a network by
minimizing the loss function. (And we compute the gradient of the
loss so that we can minimize the loss with some version of gradient
descent.)

If your loss function returns a tensor with more than one element (rather
than a single scalar), how do you propose to train your network? Minimizing
the loss for, say, element 1 will not, in general, also minimize the loss
for element 2.
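Autograd enforces this as well. Here is a small sketch, assuming a toy
linear model and an MSE loss with reduction="none" (both just for
illustration), showing that calling backward() with no arguments on a
non-scalar loss raises a RuntimeError:

```python
import torch

# Toy setup (hypothetical): a tiny linear "network" and per-element losses.
torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)
target = torch.randn(4, 1)

# reduction="none" keeps one loss value per sample, so loss is a tensor.
loss = torch.nn.functional.mse_loss(model(x), target, reduction="none")
print(loss.shape)  # torch.Size([4, 1])

raised = False
try:
    loss.backward()  # no scalar output to differentiate
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```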

What we generally do is minimize some weighted combination of
those per-element losses. But that weighted combination just becomes
our single scalar loss that we minimize by calling loss.backward(),
etc.
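A minimal sketch of that pattern, again assuming a toy linear model. The
weights below are arbitrary, hypothetical values; equal weights of 1/N would
give the usual mean loss.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 3)
target = torch.randn(4, 1)

# Per-element losses, shape (4, 1): one value per sample.
per_elem = torch.nn.functional.mse_loss(model(x), target, reduction="none")

# Hypothetical weights; equal weights of 1/4 would reproduce the mean loss.
weights = torch.tensor([[0.1], [0.2], [0.3], [0.4]])

# The weighted combination is a single scalar, so backward() needs no argument.
loss = (weights * per_elem).sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()  # one gradient-descent step on the combined scalar loss
```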

If you actually need to compute the gradient for each separate per-element
loss, then you need to run multiple backward passes in a loop.
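A sketch of that loop, assuming a toy squared-error loss on a single
parameter vector w (names are illustrative). Each iteration is one backward
pass for one element of the loss; retain_graph=True keeps the graph alive so
it can be reused by the next pass.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(4, 3)
target = torch.randn(4)

# One loss per sample: shape (4,)
per_elem = (x @ w - target) ** 2

grads = []
for i in range(per_elem.numel()):
    # Backward pass for element i only; keep the graph for the next iteration.
    (g,) = torch.autograd.grad(per_elem[i], w, retain_graph=True)
    grads.append(g)

# Row i holds d loss_i / d w.
per_elem_grads = torch.stack(grads)  # shape (4, 3)
```

For this loss the result can be checked by hand: the gradient of
(x_i · w - t_i)^2 with respect to w is 2 (x_i · w - t_i) x_i.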

You can use pytorch’s torch.autograd.functional.jacobian() function to run
this loop for you, or you can run it by hand.
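Here is the same toy per-element loss computed with
torch.autograd.functional.jacobian(). The function per_elem_loss is a
hypothetical example; jacobian() takes a function and its input and returns
a tensor of shape output.shape + input.shape, here (4, 3), with one row per
element of the loss.

```python
import torch
from torch.autograd.functional import jacobian

torch.manual_seed(0)
x = torch.randn(4, 3)
target = torch.randn(4)
w = torch.randn(3)  # jacobian() handles requires_grad internally

def per_elem_loss(w):
    # Returns a (4,) tensor of per-sample squared errors.
    return (x @ w - target) ** 2

# J[i, j] = d loss_i / d w_j
J = jacobian(per_elem_loss, w)
```

jacobian() also accepts vectorize=True, which asks autograd to batch the
per-row backward passes instead of running a Python-level loop.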

Some comments about what is going on when you use
gradient = torch.ones_like (loss) can be found in this post:

Best.

K. Frank