My model has 3 sub-task losses. I want to compute the gradient of each loss with its own loss.backward() after a single forward pass, and then call optimizer.step() for different parts of the model. The model is defined as below:
class SomeNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.encoder = resnet50(pretrained=True)
        self.cls1 = nn.Linear(2048, num_classes)
        self.cls2 = nn.Linear(2048, num_classes)

    def forward(self, x):
        feature = self.encoder(x)
        output1 = self.cls1(feature)
        output2 = self.cls2(feature)
        return output1, output2
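(One detail I glossed over: the stock torchvision resnet50 ends in a 1000-way fc layer, so for self.encoder to return 2048-dim features that head presumably has to be replaced, e.g.:

self.encoder = resnet50(pretrained=True)
self.encoder.fc = nn.Identity()  # drop the built-in classifier so the encoder outputs 2048-dim features

I've omitted that from the snippet above for brevity.)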
Currently I am calling zero_grad() on each optimizer before the next loss.backward() to ensure that only a specific part of the model is updated. Concretely, I update the encoder and cls1 with loss1, cls2 with loss2, and the encoder with loss3. See the demo code below:
model = SomeNet(num_classes)
optimizer_fea = torch.optim.SGD(model.encoder.parameters(), lr=0.01)  # lr values are placeholders
optimizer_cls1 = torch.optim.SGD(model.cls1.parameters(), lr=0.01)
optimizer_cls2 = torch.optim.SGD(model.cls2.parameters(), lr=0.01)

for data, label1, label2, label3 in data_loader:
    output1, output2 = model(data)
    loss1 = criterion1(output1, label1)
    loss2 = criterion2(output2, label2)
    loss3 = criterion2(output2, label3)

    # loss1: update encoder and cls1
    loss1.backward(retain_graph=True)
    optimizer_fea.step()
    optimizer_cls1.step()
    optimizer_fea.zero_grad()
    optimizer_cls1.zero_grad()

    # loss2: update cls2 only (the encoder grads it produces are zeroed, not applied)
    loss2.backward(retain_graph=True)
    optimizer_cls2.step()
    optimizer_fea.zero_grad()
    optimizer_cls2.zero_grad()

    # loss3: update encoder only (the cls2 grads it produces are zeroed, not applied)
    loss3.backward()
    optimizer_fea.step()
    optimizer_fea.zero_grad()
    optimizer_cls2.zero_grad()
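To illustrate why all the zero_grad() bookkeeping is needed: each backward() populates .grad on every parameter the loss flows through, not just the ones I want to update. A quick check (a sketch for illustration, not part of the training code):

# After loss2.backward(), the encoder's grads are populated too,
# even though loss2 should only ever update cls2:
loss2.backward(retain_graph=True)
encoder_has_grad = any(p.grad is not None and p.grad.abs().sum() > 0
                       for p in model.encoder.parameters())
print(encoder_has_grad)  # True -> hence the optimizer_fea.zero_grad() afterwards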
I have two questions about the above code:
- If I call loss3.backward() without retain_graph=True, most of the computational graph is freed. However, loss3 does not involve the cls1 node of the graph, so I guess the part of the graph belonging to the cls1 branch is not freed. Is there a convenient way to free the entire computational graph after I've called loss3.backward()? (The first sketch below shows the kind of thing I have in mind.)
- The above code is tediously long. Is there a convenient way to compute gradients for only a specific part of the model? (See the second sketch below.)
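For question 1, the only convenient option I can think of is to drop the remaining Python references so the cls1 branch's saved buffers can be garbage-collected; a sketch (I'm not sure this is the intended way, and the del line is my guess):

loss3.backward()
optimizer_fea.step()
optimizer_fea.zero_grad()
optimizer_cls2.zero_grad()
# Hypothetical cleanup: drop the references that keep the cls1 branch
# of the graph (and its saved activations) alive.
del loss1, loss2, loss3, output1, output2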
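For question 2, I've seen torch.autograd.grad(), which computes gradients only for the inputs it is given instead of populating .grad on every parameter. Would something along these lines be the idiomatic replacement (a sketch under that assumption)?

# Compute gradients w.r.t. cls2's parameters only; encoder/cls1 grads stay untouched.
cls2_params = list(model.cls2.parameters())
grads = torch.autograd.grad(loss2, cls2_params, retain_graph=True)
for p, g in zip(cls2_params, grads):
    p.grad = g  # fill .grad manually so optimizer_cls2.step() can consume it
optimizer_cls2.step()
optimizer_cls2.zero_grad()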