Two optimizers for two neural networks?

muhammadirfanzafar · February 3, 2022, 10:07pm

Hi,
I am using two fully connected NNs (net_1 and net_2). The input to the second NN is based on the output of the first NN, and the loss is calculated from the outputs of both NNs. Generic code for this is:

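import torch
from torch.autograd import grad
from torch.optim import Adam

net_1 = Net_u()
net_2 = Net_nu()
optimizer_u = Adam(net_1.parameters(), lr=0.001)
optimizer_nu = Adam(net_2.parameters(), lr=0.001)

optimizer_u.zero_grad()
optimizer_nu.zero_grad()

# y is the input batch; it needs requires_grad=True for grad(u, y) to work
u = net_1(y)
gradients = torch.ones_like(u)
# create_graph=True keeps u_y differentiable w.r.t. net_1's parameters
u_y = grad(u, y, grad_outputs=gradients, create_graph=True)[0]
nu = net_2(u_y)

loss = loss_fn(u, nu)
loss.backward()
optimizer_u.step()
optimizer_nu.step()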

Can anybody kindly tell me whether this is being done correctly, and whether the two NNs are being trained without any bias or issue?
Is there a better way of doing it?

Thanks!

anantguptadbl · February 4, 2022, 10:01am

@muhammadirfanzafar

In your solution, the input to the second model is the gradient of the first model's output with respect to its input.
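
To make that concrete, here is a minimal sketch of feeding the gradient of the first network into the second. The layer sizes, the Tanh activation, and the MSE loss are assumptions for illustration only (the original Net_u, Net_nu, and loss_fn are not shown). It only checks that a single loss.backward() produces gradients for both networks.

```python
import torch
from torch import nn
from torch.autograd import grad

torch.manual_seed(0)

# Hypothetical stand-ins for Net_u and Net_nu; the layer sizes are assumptions.
net_1 = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 4))
net_2 = nn.Linear(4, 4)

# y must track gradients, otherwise grad(u, y) raises an error.
y = torch.rand(2, 4, requires_grad=True)

u = net_1(y)
# Vector-Jacobian product with a vector of ones: sums du_i/dy over the outputs i.
# create_graph=True keeps u_y connected to net_1's parameters.
u_y = grad(u, y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
nu = net_2(u_y)

# Assumed loss; any differentiable function of u and nu would do here.
loss = nn.functional.mse_loss(nu, u)
loss.backward()

net_1_has_grads = all(p.grad is not None for p in net_1.parameters())
net_2_has_grads = all(p.grad is not None for p in net_2.parameters())
print(net_1_has_grads, net_2_has_grads)
```

If y is created without requires_grad=True, the grad call fails, so that is probably the first thing to check in the original snippet.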

A simpler solution could be the following:

import torch
from torch import nn, optim

class Net_u(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 5)

    def forward(self, x):
        return self.layer(x)

class Net_nu(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(5, 1)

    def forward(self, x):
        return self.layer(x)

net_1 = Net_u()
net_2 = Net_nu()
optimizer_u = optim.Adam(net_1.parameters(), lr=0.001)
optimizer_nu = optim.Adam(net_2.parameters(), lr=0.001)

optimizer_u.zero_grad()
optimizer_nu.zero_grad()

y = torch.rand(2, 4)
loss_fn = nn.MSELoss()

# The output of the first network is fed directly into the second
u = net_1(y)
nu = net_2(u)

# u is (2, 5) and nu is (2, 1), so MSELoss broadcasts them and warns about the size mismatch
loss = loss_fn(u, nu)
loss.backward()
optimizer_u.step()
optimizer_nu.step()

muhammadirfanzafar · February 5, 2022, 10:41pm

My question was about the use of optimizers. Is it fine to define a separate optimizer for each network?
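
One way to check this empirically is to train two identical copies of the networks, one pair with a separate Adam optimizer per network and one pair with a single Adam optimizer over both parameter sets, and compare the resulting weights. The sketch below is only an illustration under assumptions: the layer sizes, the random target, and the MSE loss are made up, and both setups use the same hyperparameters. Since Adam keeps its state per parameter, the two setups would be expected to produce the same updates in that case.

```python
import copy
import itertools

import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-ins for the two networks; sizes are assumptions.
net_1 = nn.Linear(4, 5)
net_2 = nn.Linear(5, 1)
# Identical copies, to be trained with a single joint optimizer.
net_1_joint = copy.deepcopy(net_1)
net_2_joint = copy.deepcopy(net_2)

optimizer_u = optim.Adam(net_1.parameters(), lr=0.001)
optimizer_nu = optim.Adam(net_2.parameters(), lr=0.001)
optimizer_joint = optim.Adam(
    itertools.chain(net_1_joint.parameters(), net_2_joint.parameters()), lr=0.001
)

y = torch.rand(2, 4)
target = torch.rand(2, 1)  # made-up target, for illustration only

for _ in range(5):
    # Variant A: one optimizer per network.
    optimizer_u.zero_grad()
    optimizer_nu.zero_grad()
    nn.functional.mse_loss(net_2(net_1(y)), target).backward()
    optimizer_u.step()
    optimizer_nu.step()

    # Variant B: one optimizer over both parameter sets.
    optimizer_joint.zero_grad()
    nn.functional.mse_loss(net_2_joint(net_1_joint(y)), target).backward()
    optimizer_joint.step()

# Compare the weights of both variants after training.
same_result = all(
    torch.allclose(p, q, atol=1e-6)
    for p, q in zip(
        itertools.chain(net_1.parameters(), net_2.parameters()),
        itertools.chain(net_1_joint.parameters(), net_2_joint.parameters()),
    )
)
print(same_result)
```

Separate optimizers mainly become useful when the two networks should get different learning rates or update schedules. A single optimizer with two parameter groups is another way to do that.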