I’m using a loss function that returns a tensor (a separate loss for each element). I understand that I have to pass the `gradient` argument, as in `loss.backward(gradient=???)`. What I don’t understand is how to obtain those gradients. Following the answer to a previous question, I set up the training step as follows. However, this leads to no useful training (which makes sense, since the gradients are always 1?).

For some context: this is the training step:

```python
def training_step(self, batch, batch_idx):
    x, y = batch
    y_hat = self.model(x)
    loss = self.compute_loss(y_hat, y)  # calls torchvision.ops.generalized_box_iou_loss()
    self.log("train_loss", torch.mean(loss))
```

Any ideas how to fix this?

Hi Jona!

Let me first ask the following question:

Normally we compute a loss function in order to train a network by
minimizing the loss function. (And we compute the gradient of the
loss so that we can minimize the loss with some version of gradient
descent.)

If your loss function results in a tensor, how do you propose to train
your network? Minimizing the loss for, say, element 1 will not, in
general, also minimize the loss for element 2.

What we generally do is minimize some weighted combination of
those per-element losses. But that weighted combination just becomes
our single scalar loss that we minimize by calling `loss.backward()`,
etc.
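
To make this concrete, here is a minimal sketch (the tiny `Linear` model, the random data, and the `per_element_loss` helper are hypothetical stand-ins, not your actual setup). It illustrates that the tensor passed as `gradient` acts as the set of weights in that weighted combination: passing `torch.ones_like(loss)` should give the same parameter gradients as calling `loss.sum().backward()`, so the ones are not themselves "the gradients."

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the real model and data.
model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)
y = torch.randn(4, 1)

def per_element_loss(pred, target):
    # One loss value per sample, shape (4,).
    return ((pred - target) ** 2).squeeze(1)

def weight_grad_after(backward_fn):
    # Run a fresh forward pass, apply the given backward call,
    # and return a copy of the resulting weight gradient.
    model.zero_grad()
    loss = per_element_loss(model(x), y)
    backward_fn(loss)
    return model.weight.grad.clone()

weights = torch.tensor([0.1, 0.2, 0.3, 0.4])  # arbitrary example weights

# Vector loss with gradient=ones: every element gets weight 1.
grad_ones = weight_grad_after(lambda l: l.backward(gradient=torch.ones_like(l)))
# Reducing to a scalar with sum() first.
grad_sum = weight_grad_after(lambda l: l.sum().backward())
# Reducing with mean(): same direction, scaled by 1/N.
grad_mean = weight_grad_after(lambda l: l.mean().backward())
# An explicit weighted combination, written two equivalent ways.
grad_weighted = weight_grad_after(lambda l: (weights * l).sum().backward())
grad_vjp = weight_grad_after(lambda l: l.backward(gradient=weights))

print(torch.allclose(grad_ones, grad_sum))
print(torch.allclose(grad_mean, grad_sum / 4))
print(torch.allclose(grad_weighted, grad_vjp))
```

In other words, choosing the `gradient` argument is the same decision as choosing how to reduce the per-element losses to a scalar.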

If you actually need to compute the gradient for each separate per-element
loss, then you need to run multiple backward passes in a loop.

You can use pytorch’s `torch.autograd.functional.jacobian()` function to run
this loop for you, or you can run it by hand.
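
Here is a rough sketch of both options, assuming a toy parameter vector `w` and a made-up `per_element_loss` function (neither is from your code). The by-hand loop calls `torch.autograd.grad()` once per loss element with `retain_graph=True`, and `torch.autograd.functional.jacobian()` should produce the same matrix of per-element gradients.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy problem: 4 samples, 3 parameters.
w = torch.randn(3, requires_grad=True)
x = torch.randn(4, 3)
y = torch.randn(4)

def per_element_loss(params):
    # One squared-error loss per sample, shape (4,).
    return (x @ params - y) ** 2

# Option 1: loop by hand, one backward pass per loss element.
loss = per_element_loss(w)
rows = []
for i in range(loss.numel()):
    # retain_graph=True keeps the graph alive for the next iteration.
    (g,) = torch.autograd.grad(loss[i], w, retain_graph=True)
    rows.append(g)
grads_loop = torch.stack(rows)  # shape (4, 3): row i is d loss[i] / d w

# Option 2: let jacobian() run the loop.
grads_jac = torch.autograd.functional.jacobian(per_element_loss, w.detach())

# Summing the rows recovers the gradient of the summed (scalar) loss.
(grad_of_sum,) = torch.autograd.grad(per_element_loss(w).sum(), w)

print(grads_loop.shape)
print(torch.allclose(grads_loop, grads_jac))
print(torch.allclose(grads_loop.sum(dim=0), grad_of_sum))
```

Note that this costs one backward pass per loss element, so it is usually only worth doing when you really need the separate gradients.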

`gradient = torch.ones_like(loss)` can be found in this post: