Is there any reason why we need to call the `grad` function directly?

So far I have never called the grad function directly. Is there any case where this is needed?

I learned that gradients are calculated automatically in the backward pass. Is this the function that PyTorch calls internally when I call backward()? It doesn’t seem to be the case.

Hi,

This function gives you finer control over what autograd computes, even after the forward pass.
.backward() will populate all the .grad fields of leaf variables that require gradients.
autograd.grad() will compute and return the gradients with respect to a set of specified Tensors, without touching their .grad fields.
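A minimal sketch of the difference, using a toy tensor of my own (not from the original posts):

import torch

x = torch.tensor(1., requires_grad=True)
y = x ** 2

# backward(): populates x.grad as a side effect
y.backward(retain_graph=True)
print(x.grad)        # tensor(2.)

# autograd.grad(): returns the gradient instead, leaving x.grad alone
g, = torch.autograd.grad(y, x)
print(g)             # tensor(2.)
print(x.grad)        # still tensor(2.), untouched by grad()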

You use grad(), for example, to compute gradients that are then used to build a gradient penalty (where the gradient itself appears in the loss).
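For instance, a gradient penalty (in the style of WGAN-GP) needs the gradient itself inside the loss, so it has to be computed with create_graph=True. A rough sketch, where the model, input and penalty weight are placeholders of my own:

import torch

model = torch.nn.Linear(4, 1)              # placeholder model
x = torch.randn(8, 4, requires_grad=True)  # placeholder input

out = model(x).sum()

# create_graph=True so the returned gradient is itself differentiable
grads, = torch.autograd.grad(out, x, create_graph=True)

# penalize gradient norms that deviate from 1 (WGAN-GP style)
penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()

loss = out + 10.0 * penalty   # 10.0 is an arbitrary penalty weight
loss.backward()               # backpropagates through the penalty too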


OK, so leaves act as accumulators: each .backward() call adds the newly computed gradients to their .grad fields.
For example:

import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b    # y = 2 * x + 3

# Each backward() call *adds* to the .grad fields of the leaves.
y.backward(retain_graph=True)   # x.grad += w, w.grad += x, b.grad += 1
y.backward()                    # same again; the graph is freed afterwards
x.backward()                    # backward on a leaf just adds dx/dx = 1 to x.grad
x.backward()
b.backward()                    # likewise adds db/db = 1 to b.grad
b.backward()
b.backward()
w.backward()                    # adds dw/dw = 1 to w.grad
print("x grad sum:", x.grad)    # 2 + 2 + 1 + 1 = tensor(6.)
print("w grad sum:", w.grad)    # 1 + 1 + 1     = tensor(3.)
print("b grad sum:", b.grad)    # 1 + 1 + 3     = tensor(5.)

So you can emulate autograd.grad() with multiple backward() calls, right?

autograd.grad() is the basic function that computes gradients.
.backward() has the side effect of changing the .grad attributes, and is convenient when working with neural nets.

You can replace one with the other “easily”, but it won’t be as convenient.
To replace grad by backward (a rough sketch follows the list):

  • Save all the .grad attributes of all the leaves
  • Reset the .grad fields of the variables you need to 0 (or None)
  • Call backward (this might do more work than needed)
  • Extract the gradients from those variables’ .grad fields
  • Restore the original .grad attributes
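
Here is a hypothetical sketch of that recipe (the model, input and loss below are my own placeholders, not from the thread):

import torch

# placeholder model and loss, only to have some leaves and a graph
model = torch.nn.Linear(3, 1)
x = torch.randn(5, 3)
loss = model(x).sum()
params = list(model.parameters())

# 1. save the existing .grad attributes of all the leaves
saved = [p.grad.clone() if p.grad is not None else None for p in params]

# 2. reset the .grad fields of the variables we care about
for p in params:
    p.grad = None

# 3. call backward (this may compute more than strictly needed)
loss.backward()

# 4. extract the gradients we wanted from the .grad fields
grads = [p.grad.clone() for p in params]

# 5. restore the original .grad attributes
for p, g in zip(params, saved):
    p.grad = g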

To replace backward by grad (again, a sketch follows the list):

  • Call grad with all the leaves (model.parameters() for example)
  • Accumulate the gradients you got into the .grad fields of the corresponding variables
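
And a matching sketch for the other direction (same placeholder model):

import torch

model = torch.nn.Linear(3, 1)
x = torch.randn(5, 3)
loss = model(x).sum()
params = list(model.parameters())

# call grad() on all the leaves at once
grads = torch.autograd.grad(loss, params)

# accumulate into .grad, exactly like backward() would
for p, g in zip(params, grads):
    if p.grad is None:
        p.grad = g.clone()
    else:
        p.grad += g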

I found this code interesting, as it shows how the grad() function can be used to compute the gradient of one tensor with respect to another.

import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b    # y = 2 * x + 3

# retain_graph=True so the graph survives for the second grad() call
gd = torch.autograd.grad(outputs=y, inputs=x, retain_graph=True)
print(gd)  # (tensor(2.),)  -> dy/dx = w
gd = torch.autograd.grad(outputs=y, inputs=b)
print(gd)  # (tensor(1.),)  -> dy/db = 1
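
Note that you can also request both gradients in a single call, which avoids having to retain the graph between calls:

# recompute y so the graph is fresh, then ask for both gradients at once
y = w * x + b
gd = torch.autograd.grad(outputs=y, inputs=(x, b))
print(gd)  # (tensor(2.), tensor(1.))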