Consider this code:

```python
import torch

x = torch.eye(1, 1, requires_grad=True)
y = 0.5 * x + 1
z = 3 * y
# y and z are non-leaf tensors, so .grad is only populated if retained
y.retain_grad()
z.retain_grad()
z.backward(gradient=x)
print("gradients:")
print("x:", x.grad, "\ny:", y.grad, "\nz:", z.grad)
```

The documentation for the `backward()` method describes its `gradient` parameter as:

> gradient (Tensor or None): Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless `create_graph` is True. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable then this argument is optional.

What is the intuition behind calling this parameter `gradient`?

Usually we compute the gradient on a scalar loss, as in `loss.backward()`.

But what happens if we call backward on a tensor that is not a scalar value?
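For example, here is a small experiment I tried with a multi-element output (the 3-element `x` and the vector `v` are just illustrative choices):

```python
import torch

# A non-scalar output: backward() refuses to run without an explicit gradient
x = torch.ones(3, requires_grad=True)
z = 3 * (0.5 * x + 1)  # z = 1.5 * x + 3, a 3-element tensor

try:
    z.backward()  # no gradient argument
except RuntimeError as e:
    print(e)  # "grad can be implicitly created only for scalar outputs"

# Passing a vector v appears to compute the vector-Jacobian product v^T J.
# Here J = dz/dx = 1.5 * I, so x.grad becomes 1.5 * v.
v = torch.tensor([1.0, 2.0, 3.0])
z.backward(gradient=v)
print(x.grad)  # tensor([1.5000, 3.0000, 4.5000])
```

This suggests `gradient` is the vector in a vector-Jacobian product, but I would like to confirm that intuition.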

Feedback is greatly appreciated.