I have been working on fine-tuning and freezing layers. The usual method is to set requires_grad = False on the parameters that are to be frozen. But is that actually required for anything other than computational reasons? As a toy example, if I want to freeze l1 and train only l2, the following code should also work, right? I don't need to set requires_grad = False on l1's parameters, right?
import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Separate optimizers for each layer; only optim2 is ever stepped,
# so l1's parameters are never updated even though they still get gradients.
optim1 = torch.optim.Adam(l1.parameters(), lr=0.1)
optim2 = torch.optim.Adam(l2.parameters(), lr=0.1)

# print("Initial parameter: ", l2.weight)
for i in range(10):
    optim1.zero_grad()
    optim2.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim2.step()  # l1 stays "frozen" simply because optim1.step() is never called
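For comparison, here is a minimal sketch of what I understand the requires_grad = False version would look like, using the same toy layers and a single optimizer over only l2's parameters (the names and shapes are just taken from my example above):

import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Explicitly freeze l1: autograd will not compute or store gradients
# for these parameters at all.
for p in l1.parameters():
    p.requires_grad = False

# One optimizer that only sees the trainable (l2) parameters.
optim = torch.optim.Adam(l2.parameters(), lr=0.1)

for i in range(10):
    optim.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim.step()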