Suppose we have this code:
import torch

x = torch.eye(1, 1, requires_grad=True)  # 1x1 leaf tensor
y = 0.5 * x + 1
z = 3 * y
z.backward(gradient=x)                   # pass x itself as the gradient argument
print("gradients:")
print("x:", x.grad, "\ny:", y.grad, "\nz:", z.grad)
The documentation describes the gradient parameter of the backward() method as:
gradient (Tensor or None): Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless create_graph is True. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable then this argument is optional.
What is the intuition behind calling this parameter gradient?
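My current guess (and I may be misreading the docs) is that gradient is the gradient of some downstream scalar with respect to the tensor that backward() is called on, so that for a scalar loss the usual loss.backward() is just shorthand for passing a gradient of 1.0. A minimal sketch of what I mean:

import torch

x = torch.tensor(2.0, requires_grad=True)

# Implicit form: for a scalar output, PyTorch supplies gradient = 1.0 itself.
loss = x ** 2
loss.backward()
print(x.grad)  # tensor(4.)

# Explicit form: passing the "upstream" gradient of 1.0 by hand gives the same result.
x.grad = None  # clear the accumulated gradient
loss = x ** 2
loss.backward(gradient=torch.tensor(1.0))
print(x.grad)  # tensor(4.)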
Usually we compute the gradient of a scalar loss, with loss.backward(). But what happens if we call backward() on a tensor that is not a scalar?
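From a quick experiment (the tensor v below is my own name for the gradient argument), backward() on a non-scalar output fails unless such a tensor is supplied, and with it the result looks like a vector-Jacobian product:

import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
z = 3 * (0.5 * x + 1)  # non-scalar output; dz_i/dx_i = 1.5 for every element

# Without a gradient argument this raises
# "RuntimeError: grad can be implicitly created only for scalar outputs".
try:
    z.backward()
except RuntimeError as err:
    print(err)

# Passing a tensor v with the same shape as z works; x.grad then seems to be
# the vector-Jacobian product v^T J, i.e. x.grad[i] = v[i] * 1.5 here.
v = torch.tensor([1.0, 1.0, 1.0])
z = 3 * (0.5 * x + 1)  # rebuild z, to be safe after the failed call above
z.backward(gradient=v)
print(x.grad)  # expect tensor([1.5000, 1.5000, 1.5000])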
Feedback is greatly appreciated.