Question about requires_grad


Suppose I have a CNN that has 3 layers: conv1 - conv2 - fc1.

If I set requires_grad = False on the parameters in conv2, then the gradient for conv2 will still be calculated but the parameters will not be updated, right? And will the parameters in conv1 still be updated properly?


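If you set requires_grad=False for the parameters of conv2, no gradient is computed for them, so their .grad stays None and there is nothing to update. Gradients still flow back through conv2 to its input, though, so you can still compute the gradient of conv1 and update it. A simpler example with 3 variables: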

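import torch
from torch.autograd import Variable

x = Variable(torch.rand(5), requires_grad=True)
y = Variable(torch.rand(5), requires_grad=True)
z = Variable(torch.rand(5), requires_grad=False)  # frozen, like conv2's parameters

a = x*y  # conv1
b = a*z  # conv2
c = torch.sum(b)  # fc1
c.backward()  # z.grad stays None, but the gradient still reaches y through z

# dc/dy = x*z, so this squared error should be 0
print(torch.sum((y.grad - x*z)**2))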

Variable containing:
 0
[torch.FloatTensor of size 1]
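The same behaviour can be checked on an actual conv1 - conv2 - fc1 network. This is a minimal sketch assuming a recent PyTorch (where plain tensors carry requires_grad and Variable is no longer needed); the class name SmallCNN, the channel counts and the 8x8 input size are hypothetical choices made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 3-layer CNN matching the question: conv1 -> conv2 -> fc1.
# Channel counts and the 8x8 input size are arbitrary choices for the sketch.
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(4, 4, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(4 * 8 * 8, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        return self.fc1(x.flatten(1))

model = SmallCNN()

# Freeze conv2: autograd will not compute gradients for its weight and bias.
for p in model.conv2.parameters():
    p.requires_grad = False

x = torch.rand(2, 1, 8, 8)
loss = model(x).sum()
loss.backward()

# conv2's parameters get no gradient at all...
print(model.conv2.weight.grad)              # None
# ...but the gradient still flows through conv2, so conv1 receives one.
print(model.conv1.weight.grad is not None)  # True
```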


Oh, I thought the gradient for conv2 would still be computed for the chain rule and only its parameters would not be updated. Thanks!
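To see the "not updated" part directly, one can take a single optimizer step and compare weights before and after. This is a minimal sketch assuming a recent PyTorch; the nn.Sequential layout, layer sizes and learning rate are arbitrary. The chain rule still runs through the frozen conv2 (so conv1 changes), while conv2 itself gets no gradient and stays fixed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for conv1 -> conv2 -> fc1, built with nn.Sequential.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),  # conv1
    nn.ReLU(),
    nn.Conv2d(4, 4, kernel_size=3, padding=1),  # conv2 (frozen below)
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 8 * 8, 10),                   # fc1
)
conv1, conv2 = model[0], model[2]

for p in conv2.parameters():
    p.requires_grad = False

# Hand the optimizer only the trainable parameters (a common convention;
# torch.optim.SGD would also skip parameters whose .grad is None).
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

conv1_before = conv1.weight.detach().clone()
conv2_before = conv2.weight.detach().clone()

optimizer.zero_grad()
loss = model(torch.rand(2, 1, 8, 8)).sum()
loss.backward()
optimizer.step()

# conv2 is untouched; conv1 moved because gradients flowed back through conv2.
print(torch.equal(conv2.weight, conv2_before))  # True
print(torch.equal(conv1.weight, conv1_before))  # False
```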