So far I have never called the grad function directly. Is there any case where this is needed?

I learned that gradients are calculated automatically in the backward pass. Is grad the function that PyTorch calls inside backward()? It doesn’t seem to be the case.

This function gives you finer control over what autograd computes, even after the forward pass. .backward() will populate the .grad fields of all leaf variables that require gradients. autograd.grad() will compute and return the gradients of a set of specified Tensors.

You use grad, for example, to compute gradients that you then use to compute gradient penalties.
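As a minimal sketch of that use case (the tensors and shapes here are made up for illustration, loosely following the WGAN-GP style penalty): autograd.grad with create_graph=True returns gradients that are themselves part of the graph, so a penalty on them can be backpropagated.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 1, requires_grad=True)
out = (x @ w).sum()

# Gradient of the output w.r.t. the input, kept in the graph
# so it can itself be differentiated (double backward).
grads, = torch.autograd.grad(outputs=out, inputs=x, create_graph=True)

# Penalize deviation of the per-sample gradient norm from 1
penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()
penalty.backward()  # populates w.grad through the double backward

print(w.grad.shape)  # torch.Size([3, 1])
```

This would not work with .backward() alone, since you need the intermediate gradients as regular tensors in the graph.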

OK, so leaves act as accumulators: each .backward() call adds the new gradients to the existing .grad values.
For example:

import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)
y = w * x + b  # y = 2 * x + 3
y.backward(retain_graph=True)  # x.grad += 2, w.grad += 1, b.grad += 1
y.backward()                   # same again; the graph is freed afterwards
x.backward()  # a leaf is its own (trivial) graph: x.grad += 1
x.backward()
b.backward()
b.backward()
b.backward()
w.backward()
print("x grad sum:", x.grad)  # tensor(6.)  (2 + 2 + 1 + 1)
print("w grad sum:", w.grad)  # tensor(3.)  (1 + 1 + 1)
print("b grad sum:", b.grad)  # tensor(5.)  (1 + 1 + 1 + 1 + 1)

So you can emulate autograd.grad() with multiple backward() calls, right?

grad() is the basic function that computes gradients. .backward() has the side effect of populating the .grad attributes, which is convenient when working with neural nets.

You can replace one by the other “easily”, but it won’t be as convenient.
To replace grad with backward:

Save the .grad attributes of all leaves

Reset the .grad fields of the variables you need to zero (or None)

Call backward (this might do more work than needed)

Extract the gradients from the variables’ .grad fields

Restore the original .grad attributes
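The steps above can be sketched like this (the function name and arguments are illustrative, not a PyTorch API):

```python
import torch

def grad_via_backward(y, inputs, all_leaves):
    # 1. Save the existing .grad of every leaf
    saved = {p: (p.grad.clone() if p.grad is not None else None)
             for p in all_leaves}
    # 2. Reset the .grad fields of the variables we care about
    for p in inputs:
        p.grad = None
    # 3. Call backward (computes gradients for *all* leaves)
    y.backward(retain_graph=True)
    # 4. Extract the gradients we wanted
    result = tuple(p.grad for p in inputs)
    # 5. Restore the original .grad attributes
    for p, g in saved.items():
        p.grad = g
    return result

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)
y = w * x + b
print(grad_via_backward(y, [x], [x, w, b]))  # (tensor(2.),)
```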

To replace backward by grad:

Call grad with all the leaves (model.parameters(), for example)

Accumulate the gradients you get back into the .grad field of the corresponding variable
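And a sketch of the other direction (again, the helper name is made up), computing gradients for all leaves and accumulating them by hand, the way backward() would:

```python
import torch

def backward_via_grad(y, leaves):
    grads = torch.autograd.grad(outputs=y, inputs=leaves)
    for p, g in zip(leaves, grads):
        if p.grad is None:
            p.grad = g.clone()
        else:
            p.grad = p.grad + g  # accumulate, like backward() does

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)
y = w * x + b
backward_via_grad(y, [x, w, b])
print(x.grad, w.grad, b.grad)  # tensor(2.) tensor(1.) tensor(1.)
```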

I found this code interesting, as it shows how the grad() function may be used to compute the gradient of one tensor with respect to another.

import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)
y = w * x + b  # y = 2 * x + 3
# retain_graph=True is needed so the graph survives for the second call
gd = torch.autograd.grad(outputs=y, inputs=x, retain_graph=True)
print(gd)  # (tensor(2.),)  dy/dx = w
gd = torch.autograd.grad(outputs=y, inputs=b)
print(gd)  # (tensor(1.),)  dy/db = 1