I have been working on fine-tuning and freezing layers. The usual method is to set requires_grad = False on the parameters that are to be frozen. But is that actually required for anything other than computational reasons? As a toy example, if I want to freeze l1 and train only l2, the following code should also work, right? I don't need to set requires_grad = False on l1's parameters, right?
import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Separate optimizers for each layer; only optim2 is ever stepped,
# so l1's parameters are never updated even though they still get gradients.
optim1 = torch.optim.Adam(l1.parameters(), lr=0.1)
optim2 = torch.optim.Adam(l2.parameters(), lr=0.1)

# print("Initial parameter: ", l2.weight)
for i in range(10):
    optim1.zero_grad()
    optim2.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim2.step()  # l1 stays "frozen" simply because optim1.step() is never called
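For comparison, here is a minimal sketch of what I understand the requires_grad = False version would look like, using the same toy layers and a single optimizer over only l2's parameters (the names and shapes are just taken from my example above):

import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Explicitly freeze l1: autograd will not compute or store gradients
# for these parameters at all.
for p in l1.parameters():
    p.requires_grad = False

# One optimizer that only sees the trainable (l2) parameters.
optim = torch.optim.Adam(l2.parameters(), lr=0.1)

for i in range(10):
    optim.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim.step()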