How is the default gradient parameter passed to the backward function?

Many blog posts and discussion threads say that, by default, F.backward() is equivalent to F.backward(gradient=torch.Tensor([1.])).
But in the implementations of Tensor.backward() and torch.autograd.backward(), the default value for the external gradient is None, i.e.:
def backward(self, gradient=None, retain_graph=None, create_graph=False):
def backward(
    tensors: _TensorOrTensors,
    grad_tensors: Optional[_TensorOrTensors] = None,

So, when is the gradient set to the default torch.Tensor([1.])?
Also, what is the difference between saying:

1) F.backward() is equivalent to F.backward(gradient=torch.Tensor([1.]))
2) F.backward() is equal to F.backward(gradient=torch.Tensor([1.]))

Those blog posts should really say torch.ones_like(loss) (and the equivalence only holds for a scalar-valued loss), even if they ignore memory layout to keep things simple. torch.autograd.backward calls an auxiliary function, _make_grads, which creates these gradients if they are not passed in.
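A minimal sketch of the observable behavior, assuming a recent PyTorch install. It compares results only and does not call the private _make_grads helper. For a scalar output, calling backward() with no argument gives the same gradient as passing torch.ones_like(loss) explicitly. For a non-scalar output, no implicit gradient is created and backward() raises a RuntimeError.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar (0-dim) output: backward() with gradient=None
loss = (x ** 2).sum()
loss.backward()
grad_default = x.grad.clone()

# Same computation, passing a ones tensor shaped like the output explicitly
x.grad = None  # clear the accumulated gradient first
loss = (x ** 2).sum()
loss.backward(gradient=torch.ones_like(loss))
grad_explicit = x.grad.clone()

print(torch.equal(grad_default, grad_explicit))  # both are d(sum(x^2))/dx = 2 * x

# Non-scalar output: backward() cannot create an implicit gradient
raised = False
y = x * 2
try:
    y.backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

Note that torch.ones_like(loss) has the same shape as the output (here a 0-dim tensor), which is why it is a closer description of the default than torch.Tensor([1.]), a 1-element tensor of shape (1,).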
Using None as the formal default argument and then computing the "effective default" inside the function is a common Python pattern, used far beyond PyTorch. Sometimes, as here, it is because the default isn't fixed (it depends on the output's shape, device, and memory layout). Other times it is because the default is mutable, so you want a new object each time the function is called (blog posts on "mutable default arguments" are very common, too).
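Both reasons can be sketched in plain Python. The function names append_item and scale are hypothetical and only illustrate the pattern. They are not part of PyTorch.

```python
def append_item(item, bucket=None):
    # A `bucket=[]` default would be evaluated once, at definition time,
    # and shared across calls. Using None and creating the list in the
    # body gives every call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def scale(values, factors=None):
    # The "effective default" depends on the input: one factor of 1.0
    # per value, loosely analogous to backward() building ones shaped
    # like the output when no gradient is passed.
    if factors is None:
        factors = [1.0] * len(values)
    return [v * f for v, f in zip(values, factors)]


print(append_item(1))     # [1]
print(append_item(2))     # [2]  (not [1, 2]: no shared default list)
print(scale([2.0, 3.0]))  # [2.0, 3.0]
```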

I'm guessing "equivalent" means "gives you the same result as", which is a bit weaker than "equal", which is closer to "does literally the same thing".

Best regards



Thanks, Thomas :D It's clear now!