Calculate the "backward" for only one Tensor

Hi, I want to make a backward pass that calculates the gradient update needed for just one specific tensor.
I understand that the tensors closer to the end of the network will also get a gradient calculation, but my goal is only that specific tensor.
So how can I stop the loss.backward() operation at some layer? The main reason, btw, is to save time.

Thanks.

  • You can set parameters that don’t need gradients to not require grad (using p.requires_grad_(False)). PyTorch only computes gradients for values that require gradients.
  • Even so, backpropagation applies the chain rule from the final result (i.e. the loss) back to the first use of a parameter that requires gradients, as in the sketch below. You cannot escape calculating gradients for the intermediates in between.
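
A minimal sketch of both points (the three-layer model here is made up just for illustration):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 10),  # early layer
    nn.Linear(10, 10),  # the layer we care about
    nn.Linear(10, 1),   # late layer
)

# Freeze everything, then unfreeze only the layer we care about.
for p in model.parameters():
    p.requires_grad_(False)
for p in model[1].parameters():
    p.requires_grad_(True)

out = model(torch.randn(4, 10))
loss = out.sum()
loss.backward()

# Backward still traverses model[2] to reach model[1]'s parameters,
# but stops there: only model[1] ends up with .grad populated.
for name, p in model.named_parameters():
    print(name, "grad computed:", p.grad is not None)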

Best regards

Thomas

As said, I’m not trying to avoid calculating gradients for the layers between the desired tensor and the output, but for the earlier layers.
One thing I’m never sure about: does requires_grad_(False) block the gradient from flowing to earlier layers?
Let’s say I want the gradient just for conv3:

x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
out = self.conv4(x)

Should this be enough:

self.conv2.requires_grad_(False)

or this is a must:

self.conv1.requires_grad_(False)
self.conv2.requires_grad_(False)

Thanks.

The latter. If you picture the autograd graph, self.conv2.requires_grad_(False) only cuts the bits from the “stem” to the leaves in self.conv2 (i.e. its weight and bias); gradients still flow back through conv2’s output to reach conv1 as long as conv1’s parameters require grad.
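
A quick way to check this with a toy model shaped like the snippet above (the layer sizes are arbitrary):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3, padding=1)
        self.conv2 = nn.Conv2d(8, 8, 3, padding=1)
        self.conv3 = nn.Conv2d(8, 8, 3, padding=1)
        self.conv4 = nn.Conv2d(8, 1, 3, padding=1)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        return self.conv4(x)

model = Net()
# Both early layers need to be frozen for backward to stop at conv3.
model.conv1.requires_grad_(False)
model.conv2.requires_grad_(False)

out = model(torch.randn(1, 3, 16, 16))
out.sum().backward()

for name, p in model.named_parameters():
    print(name, "grad computed:", p.grad is not None)
# conv1/conv2 print False, conv3/conv4 print True.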

Thanks, Is it possible to do the following:

model.module.layer2.requires_grad_(False)
model.module.layer1.requires_grad_(False)
out = model(input_var,first=1)
loss_1 = criterion(out, t)
optimizer.zero_grad()
loss_1.backward(retain_graph=True)

And then:

model.module.layer2.requires_grad_(True)
model.module.layer1.requires_grad_(True)
loss_2 = criterion(out, t)
optimizer.zero_grad()
loss_2.backward()

In other words, does the .requires_grad_(True) call have to be made before the forward pass, or can it also be applied after the forward pass (as above), before calling backward again?

You need it before the forward pass.
Technically, requires_grad is not a Module property but a Parameter/Tensor property, and you need to decide before you use the tensor whether you want to differentiate later, so that autograd can start recording.
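
For example, a sketch of the pattern above with toy stand-ins for model, criterion, input_var and t (and without the .module indirection from DataParallel): each requires_grad_ change comes before a fresh forward pass.

import torch
import torch.nn as nn

# Made-up stand-ins for the objects in your snippet.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(10, 10)
        self.layer2 = nn.Linear(10, 10)
        self.layer3 = nn.Linear(10, 1)

    def forward(self, x):
        return self.layer3(self.layer2(self.layer1(x)))

model = Net()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
input_var = torch.randn(4, 10)
t = torch.randn(4, 1)

# First pass: freeze the early layers *before* the forward pass,
# so backward stops once layer3's gradients are computed.
model.layer1.requires_grad_(False)
model.layer2.requires_grad_(False)
out = model(input_var)
loss_1 = criterion(out, t)
optimizer.zero_grad()
loss_1.backward()

# Second pass: unfreeze *before* a fresh forward pass, so autograd
# records the graph needed for layer1/layer2's gradients.
model.layer1.requires_grad_(True)
model.layer2.requires_grad_(True)
out = model(input_var)   # re-run the forward with grads enabled
loss_2 = criterion(out, t)
optimizer.zero_grad()
loss_2.backward()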

Best regards

Thomas
