I want to build a multi-task learning model on two related datasets with different inputs and targets. The two tasks share the lower-level layers but have separate head layers. A minimal example:
import torch.nn as nn
import torch.nn.functional as F

class MultiMLP(nn.Module):
    """
    A simple dense network for MTL with hard parameter sharing.
    """
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(100, 200)  # shared by both tasks
        self.out_task0 = nn.Linear(200, 1)  # head for task 0
        self.out_task1 = nn.Linear(200, 1)  # head for task 1

    def forward(self, x):
        x = self.hidden(x)
        x = F.relu(x)
        y_task0 = self.out_task0(x)
        y_task1 = self.out_task1(x)
        return [y_task0, y_task1]
The dataloader is constructed so that batches are drawn alternately from the two datasets, i.e. batches 0, 2, 4, … come from task 0 and batches 1, 3, 5, … from task 1. I want to train the network so that batches from task 0 update only the weights of hidden and out_task0, and batches from task 1 update only hidden and out_task1.
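For reference, the alternating order described above can be sketched roughly as follows. This is only an illustration, not my exact loader: the helper name alternate_batches is hypothetical, plain lists stand in for the two per-task DataLoaders, and it assumes both loaders have the same number of batches (zip stops at the shorter one).

```python
def alternate_batches(loader0, loader1):
    """Yield batches alternately: even positions from loader0, odd from loader1.

    Stops as soon as the shorter iterable is exhausted (zip semantics).
    """
    for batch0, batch1 in zip(loader0, loader1):
        yield batch0
        yield batch1


# Plain lists stand in for the two per-task DataLoaders (assumption).
task0_batches = ["t0_b0", "t0_b1", "t0_b2"]
task1_batches = ["t1_b0", "t1_b1", "t1_b2"]

combined = list(alternate_batches(task0_batches, task1_batches))

# The position in the combined stream identifies the task: i % 2.
task_ids = [i % 2 for i, _ in enumerate(combined)]
```

With this ordering, i % 2 in the training loop is enough to tell which task a batch belongs to.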
I then toggle requires_grad on the two heads for the corresponding task during training, as shown below. However, I observe that all weights are updated on every iteration.
criterion = nn.MSELoss()
for i, data in enumerate(combined_loader):
    x, y = data[0], data[1]
    optimizer.zero_grad()
    # controller is 0 for task0, 1 for task1
    # alternate the head layer
    controller = i % 2
    task0_mode = controller == 0
    # enable gradients only for the head of the active task
    for name, param in model.named_parameters():
        if name in ['out_task0.weight', 'out_task0.bias']:
            param.requires_grad = task0_mode
        elif name in ['out_task1.weight', 'out_task1.bias']:
            param.requires_grad = not task0_mode
    outputs = model(x)[controller]
    loss = criterion(outputs, y)
    loss.backward()
    optimizer.step()
    # Monitor the parameter updates
    for name, p in model.named_parameters():
        if name in ['out_task0.weight', 'out_task1.weight']:
            print(f"Controller: {controller}")
            print(name, p)
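A more reliable check than reading the printed tensors might be to snapshot the parameters before a step and compare them afterwards. The sketch below is only an illustration under stated assumptions: the helper changed_parameters is hypothetical, the inputs are random, and the optimizer is plain SGD without momentum or weight decay, which may differ from the optimizer in my actual run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(100, 200)
        self.out_task0 = nn.Linear(200, 1)
        self.out_task1 = nn.Linear(200, 1)

    def forward(self, x):
        x = F.relu(self.hidden(x))
        return [self.out_task0(x), self.out_task1(x)]


def changed_parameters(model, step_fn):
    """Run step_fn() once and return the names of parameters whose values changed."""
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    step_fn()
    return {
        name
        for name, p in model.named_parameters()
        if not torch.equal(before[name], p.detach())
    }


torch.manual_seed(0)
model = MultiMLP()
# Assumption: stateless SGD (no momentum, no weight decay).
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, y = torch.randn(8, 100), torch.randn(8, 1)  # random stand-in batch


def task0_step():
    # Freeze the task-1 head and train the task-0 head for this step.
    for p in model.out_task1.parameters():
        p.requires_grad = False
    for p in model.out_task0.parameters():
        p.requires_grad = True
    optimizer.zero_grad()
    loss = criterion(model(x)[0], y)
    loss.backward()
    optimizer.step()


changed = changed_parameters(model, task0_step)
print(sorted(changed))
```

Running the same check over several alternating steps with the actual optimizer should make it easier to see exactly which parameters move on which batches, and whether the unexpected updates depend on the optimizer settings.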
Did I miss anything in the training procedure, or will the overall setup not work? Thanks a lot!