I have been working on fine-tuning and freezing layers. The usual method is to set requires_grad = False on the parameters that are to be frozen. But is that actually required for anything other than computational reasons? As a toy example, if I want to freeze l1 and train only l2, the following code should also work, right? I don't need to set requires_grad = False on l1's parameters, right?
import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Separate optimizers for each layer; only optim2 is ever stepped,
# so l1's parameters are never updated even though they still get gradients.
optim1 = torch.optim.Adam(l1.parameters(), lr=0.1)
optim2 = torch.optim.Adam(l2.parameters(), lr=0.1)

# print("Initial parameter: ", l2.weight)
for i in range(10):
    optim1.zero_grad()
    optim2.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim2.step()  # l1 stays "frozen" simply because optim1.step() is never called
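For comparison, here is a minimal sketch of what I understand the requires_grad = False version would look like, using the same toy layers and a single optimizer over only l2's parameters (the names and shapes are just taken from my example above):

import torch

X = torch.rand(10, 3)

l1 = torch.nn.Linear(3, 2)
rel = torch.nn.ReLU()
l2 = torch.nn.Linear(2, 1)

# Explicitly freeze l1: autograd will not compute or store gradients
# for these parameters at all.
for p in l1.parameters():
    p.requires_grad = False

# One optimizer that only sees the trainable (l2) parameters.
optim = torch.optim.Adam(l2.parameters(), lr=0.1)

for i in range(10):
    optim.zero_grad()
    y = l2(rel(l1(X)))
    loss = torch.sum(y * y)
    loss.backward()
    optim.step()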