According to the docs, when we call the backward function on a tensor, if the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying a **gradient** argument.

```
import torch
a = torch.tensor([10., 10.], requires_grad=True)
b = torch.tensor([20., 20.], requires_grad=True)
F = a * b
F.backward(gradient=torch.tensor([1., 1.]))
print(a.grad)
```

`Output: tensor([20., 20.])`

Now scaling the external gradient:

```
a = torch.tensor([10., 10.], requires_grad=True)
b = torch.tensor([20., 20.], requires_grad=True)
F = a * b
F.backward(gradient=torch.tensor([2., 2.])) # modified
print(a.grad)
```

`Output: tensor([40., 40.])`

So, passing the `gradient` argument to `backward` seems to scale the gradients.
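To check that this is an element-wise weighting rather than a single global scale factor, here is a quick variation (my own experiment, using an arbitrary non-uniform gradient):

```
import torch

a = torch.tensor([10., 10.], requires_grad=True)
b = torch.tensor([20., 20.], requires_grad=True)
F = a * b
# each output element gets its own weight in the external gradient
F.backward(gradient=torch.tensor([2., 3.]))
print(a.grad)  # tensor([40., 60.]) -> dF_i/da_i = b_i, weighted element-wise
```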

Also, when `F` is a scalar, `F.backward()` is by default `F.backward(gradient=torch.tensor(1.))`.
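That equivalence is easy to verify for a scalar output (my own sanity check, reducing the same product to a scalar with `sum()`):

```
import torch

a = torch.tensor([10., 10.], requires_grad=True)
b = torch.tensor([20., 20.], requires_grad=True)
S = (a * b).sum()  # scalar output, so no gradient argument is needed
S.backward()       # same as S.backward(gradient=torch.tensor(1.))
print(a.grad)      # tensor([20., 20.])
```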

Apart from scaling the grad value, how does the `gradient` parameter passed to the backward function help to compute the derivatives when we have a non-scalar tensor?

Why can’t PyTorch calculate the derivative implicitly, without asking for an explicit `gradient` parameter, as it does for a scalar tensor?

Also, in the case of a scalar value, `.backward()` without parameters is equivalent to `.backward(torch.tensor(1.0))`. Why can’t we then broadcast `torch.tensor(1.0)` so that it also works for non-scalar tensors, instead of having to pass an explicit external `gradient` parameter when calculating the vector-Jacobian product?
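For concreteness, the explicit form I end up writing for a non-scalar output looks like this (my own workaround, passing a tensor of ones as the external gradient):

```
import torch

a = torch.tensor([10., 10.], requires_grad=True)
b = torch.tensor([20., 20.], requires_grad=True)
F = a * b
# explicit vector for the vector-Jacobian product; this is what I would
# like PyTorch to assume (or broadcast) implicitly for non-scalar outputs
F.backward(gradient=torch.ones_like(F))
print(a.grad)  # tensor([20., 20.])
```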