Hi,
I am using two fully connected NNs (net_1 and net_2). The input to the second NN is derived from the output of the first NN, and the loss is calculated from the outputs of both NNs. Generic code for this is given below:
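import torch
from torch.autograd import grad
from torch.optim import Adam

# Net_u, Net_nu, loss_fn and the input tensor y are defined elsewhere
net_1 = Net_u()
net_2 = Net_nu()
optimizer_u = Adam(net_1.parameters(), lr=0.001)
optimizer_nu = Adam(net_2.parameters(), lr=0.001)
optimizer_u.zero_grad()
optimizer_nu.zero_grad()
y.requires_grad_(True)  # needed so that grad(u, y) can be computed
u = net_1(y)
gradients = torch.ones_like(u)  # same shape, dtype and device as u
# create_graph=True keeps u_y differentiable so the loss can backpropagate through it
u_y = grad(u, y, grad_outputs=gradients, create_graph=True)[0]
nu = net_2(u_y)
loss = loss_fn(u, nu)
loss.backward()
optimizer_u.step()
optimizer_nu.step()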
Could anybody tell me whether this is done correctly, and whether both NNs are being trained without any bias or other issue?
Is there a better way of doing it?
Thanks!
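One way to check whether both networks are actually being trained is to confirm that every parameter of each network has a gradient after loss.backward(). The snippet below is only a minimal sketch: the layer sizes, the Tanh activations and the squared-error loss are assumptions standing in for Net_u, Net_nu and loss_fn, which are not shown in the post.

```python
import torch
import torch.nn as nn
from torch.autograd import grad

torch.manual_seed(0)

# Hypothetical stand-ins for Net_u and Net_nu; the layer sizes are assumptions.
net_1 = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))
net_2 = nn.Sequential(nn.Linear(1, 8), nn.Tanh(), nn.Linear(8, 1))

# y must track gradients, otherwise grad(u, y) raises an error.
y = torch.rand(16, 1, requires_grad=True)

u = net_1(y)
# create_graph=True keeps u_y differentiable, so the loss can reach net_1 through it.
u_y = grad(u, y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
nu = net_2(u_y)

# Placeholder loss that uses both outputs (the original loss_fn is not shown).
loss = ((u - nu) ** 2).mean()
loss.backward()

# Every parameter of both networks should now hold a gradient.
net_1_has_grads = all(p.grad is not None for p in net_1.parameters())
net_2_has_grads = all(p.grad is not None for p in net_2.parameters())
print(net_1_has_grads, net_2_has_grads)
```

If either flag comes out False, some parameters are disconnected from the loss and their optimizer step would leave them unchanged.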
@muhammadirfanzafar
In your solution, the input to the second model is the gradient of the first model's output with respect to y.
A simpler solution is to pass the output of the first model directly to the second:
import torch
import torch.nn as nn
import torch.optim as optim

class Net_u(nn.Module):
    def __init__(self):
        super(Net_u, self).__init__()
        self.layer = nn.Linear(4, 5)

    def forward(self, x):
        x = self.layer(x)
        return x

class Net_nu(nn.Module):
    def __init__(self):
        super(Net_nu, self).__init__()
        self.layer = nn.Linear(5, 1)

    def forward(self, x):
        x = self.layer(x)
        return x

net_1 = Net_u()
net_2 = Net_nu()
optimizer_u = optim.Adam(net_1.parameters(), lr=0.001)
optimizer_nu = optim.Adam(net_2.parameters(), lr=0.001)
optimizer_u.zero_grad()
optimizer_nu.zero_grad()
y = torch.rand(2, 4)
loss_fn = torch.nn.MSELoss()  # separate name so the loss value does not shadow it
u = net_1(y)
nu = net_2(u)
# Note: u has shape (2, 5) and nu has shape (2, 1), so MSELoss broadcasts nu and emits a shape warning
loss = loss_fn(u, nu)
loss.backward()
optimizer_u.step()
optimizer_nu.step()
My question was about the use of optimizers. Is it fine to define a separate optimizer for each network?
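On the optimizer question: Adam computes its update per parameter, so with identical hyperparameters, two separate optimizers should take the same step as one optimizer built over both parameter sets. The sketch below compares the two setups on identical copies of two small networks. The target tensor and the use of bare nn.Linear layers are assumptions made only for illustration; the layer shapes follow the reply above.

```python
import copy
import itertools
import torch
import torch.nn as nn
from torch.optim import Adam

torch.manual_seed(0)

# Hypothetical small networks with the same shapes as in the reply above.
net_1 = nn.Linear(4, 5)
net_2 = nn.Linear(5, 1)
# Identical copies, to be trained with a single shared optimizer.
net_1_b, net_2_b = copy.deepcopy(net_1), copy.deepcopy(net_2)

y = torch.rand(2, 4)
target = torch.rand(2, 1)  # assumed target, for illustration only
loss_fn = nn.MSELoss()

# Option A: one optimizer per network.
opt_u = Adam(net_1.parameters(), lr=0.001)
opt_nu = Adam(net_2.parameters(), lr=0.001)
opt_u.zero_grad()
opt_nu.zero_grad()
loss_fn(net_2(net_1(y)), target).backward()
opt_u.step()
opt_nu.step()

# Option B: a single optimizer over both parameter sets.
opt = Adam(itertools.chain(net_1_b.parameters(), net_2_b.parameters()), lr=0.001)
opt.zero_grad()
loss_fn(net_2_b(net_1_b(y)), target).backward()
opt.step()

# Compare the updated parameters of both setups.
same_update = all(
    torch.allclose(p, q)
    for p, q in zip(
        itertools.chain(net_1.parameters(), net_2.parameters()),
        itertools.chain(net_1_b.parameters(), net_2_b.parameters()),
    )
)
print(same_update)
```

Separate optimizers mainly become useful when the networks need different learning rates or update schedules. With the same settings for both, a single optimizer is just less code.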