I have a network N1 and a network N2. Some of their layers differ, and they are trained on different data, but they have some layers in common, say l1, l2, and l5. I want to get the gradients of those common layers, add them together, and continue the training process.
I was thinking of getting the gradients from for name, parameter in model.named_parameters():, filtering by name, adding them, and then calling optimizer.step(). But model.named_parameters() seems to be read-only. How do I access the gradients and perform operations on them?
When you say “common” do you mean that they should be the same (always keep the same parameters values)? Or just that gradients from one should be added to the gradients of the other?
Note that it is also possible to just share the same module.
import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 3)

    def forward(self, x):
        return self.lin(x)

class B(nn.Module):
    def __init__(self, lin):
        super().__init__()
        self.lin = lin

    def forward(self, x):
        return self.lin(x)

net1 = A()
net2 = B(net1.lin)
optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01)
# No need for a second optimizer as all the parameters are already in net1.
for i in range(2):
    input = torch.randn(3, 3)
    optimizer1.zero_grad()
    loss = (net1(input) + net2(input)).sum()
    loss.backward()
    optimizer1.step()
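The example above keeps the parameters literally shared. If you instead meant the other option, summing the gradients of common layers across two separate networks, each parameter's .grad attribute can be read and modified in place before calling optimizer.step(). A minimal sketch; the two Sequential models and the names in common are illustrative stand-ins for N1/N2 and l1, l2, l5:

```python
import torch
import torch.nn as nn

# Two independent copies of the same architecture (stand-ins for N1/N2).
net1 = nn.Sequential(nn.Linear(3, 3), nn.ReLU(), nn.Linear(3, 1))
net2 = nn.Sequential(nn.Linear(3, 3), nn.ReLU(), nn.Linear(3, 1))

common = {"0.weight", "0.bias"}  # names of the common layers (illustrative)

opt1 = torch.optim.SGD(net1.parameters(), lr=0.01)

x1, x2 = torch.randn(4, 3), torch.randn(4, 3)
net1(x1).sum().backward()
net2(x2).sum().backward()

# .grad is an ordinary tensor attribute: readable and writable.
params2 = dict(net2.named_parameters())
with torch.no_grad():
    for name, p in net1.named_parameters():
        if name in common and params2[name].grad is not None:
            p.grad += params2[name].grad  # sum gradients of the common layers

opt1.step()
```

The filtering by name is exactly what you described; named_parameters() only yields the parameters, but the gradients hanging off them are not read-only.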
If you have only part of the model that is shared, you can use a trick like this for B:
class B(nn.Module):
    def __init__(self, lin):
        super().__init__()
        self.lin = [lin]  # Putting this in a python list hides it from .parameters()
        self.lin2 = nn.Linear(3, 3)

    def forward(self, x):
        return self.lin2(self.lin[0](x))

net2 = B(net1.lin)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01)
# Note that the shared lin won't be in net2.parameters() because it is in a list!
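For completeness, here is a self-contained sketch of that trick you can run directly; the layer sizes are just for illustration, and B's constructor takes the shared layer as an argument:

```python
import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(3, 3)

    def forward(self, x):
        return self.lin(x)

class B(nn.Module):
    def __init__(self, lin):
        super().__init__()
        self.lin = [lin]  # plain python list: hidden from .parameters()
        self.lin2 = nn.Linear(3, 3)

    def forward(self, x):
        return self.lin2(self.lin[0](x))

net1 = A()
net2 = B(net1.lin)

# Only lin2's weight and bias are registered on net2.
print(len(list(net2.parameters())))  # prints 2

# A backward pass through net2 still populates gradients on the shared lin,
# so the optimizer that owns net1's parameters can update it.
net2(torch.randn(3, 3)).sum().backward()
print(net1.lin.weight.grad is not None)  # prints True
```

One caveat with this trick: since the list hides lin from net2, things like net2.to(device) or net2.state_dict() won't touch it either, so you have to manage it through net1.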