Loss for outputs of each layer


My problem is this:

I have data X and Y, and I know their latent representation T, whose values lie in [-1, 1]. I am trying to create two parallel auto-encoders, (Q1, P1) and (Q2, P2), each with a single hidden layer, set up as follows:
(1) Q1, Q2 = encoders
(2) P1, P2 = decoders
(3) My objective is to minimize the error of the latent representation w.r.t. T and of the reconstructed data w.r.t. X and Y
(4) I would also like to minimize the difference between the first hidden layer outputs of the two auto-encoders using m.s.e.
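
Objectives (3) and (4) can be written as one combined loss. This is only a sketch: it assumes squared-error terms and unit weights throughout (the post does not say which loss functions are used for the embedding and reconstruction terms), and the symbol h for the hidden-layer activation of an encoder is introduced here for illustration.

```latex
\mathcal{L} =
  \underbrace{\lVert Q_1(X) - T \rVert^2 + \lVert Q_2(Y) - T \rVert^2}_{\text{embedding, objective (3)}}
  + \underbrace{\lVert P_1(Q_1(X)) - X \rVert^2 + \lVert P_2(Q_2(Y)) - Y \rVert^2}_{\text{reconstruction, objective (3)}}
  + \underbrace{\lVert h_{Q_1}(X) - h_{Q_2}(Y) \rVert^2}_{\text{hidden layers, objective (4)}}
```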

A few snippets of the code are given below (defining Q1 and P1). Q2 and P2 are defined similarly.

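import torch
import torch.nn as nn
import torch.nn.functional as F

# D_in1, H and D_out (input, hidden and embedding sizes) are defined elsewhere

class Q1(nn.Module):
  def __init__(self):
    super(Q1, self).__init__()
    # separate layers: input -> hidden -> embedding
    self.headX = nn.Linear(D_in1, H)
    self.tailX = nn.Linear(H, D_out)

  def forward(self, x):
    x = self.headX(x)
    x1 = F.relu(x)
    # return the embedding (squashed to [-1, 1]) and the hidden activation
    return torch.tanh(self.tailX(x1)), x1

class P1(nn.Module):
  def __init__(self):
    super(P1, self).__init__()
    # separate layers: embedding -> hidden -> reconstruction
    self.headX = nn.Linear(D_out, H)
    self.tailX = nn.Linear(H, D_in1)

  def forward(self, x):
    x = self.headX(x)
    x1 = F.relu(x)
    # return the reconstruction and the hidden activation
    return torch.tanh(self.tailX(x1)), x1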
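
For reference, an nn.Module is normally instantiated once and the resulting instance is then called on the data. A minimal sketch of that pattern follows. The class name Encoder, its constructor arguments and the sizes are hypothetical and chosen only for illustration; they are not from the post.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
D_in1, H, D_out = 20, 8, 3


class Encoder(nn.Module):
    """Single-hidden-layer encoder returning (embedding, hidden activation)."""

    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.head = nn.Linear(d_in, d_hidden)
        self.tail = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.head(x))
        # tanh keeps the embedding inside [-1, 1]
        return torch.tanh(self.tail(h)), h


q1 = Encoder(D_in1, H, D_out)   # instantiate the module once
x = torch.randn(5, D_in1)       # a batch of 5 random samples
embedding, hidden = q1(x)       # call the instance, not the class
print(embedding.shape, hidden.shape)
```

Passing the sizes to the constructor, rather than reading them from globals, also lets the same class serve for both encoders.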

To compute the loss for objective (3), I am doing the following (which is working fine):

for batch_idx, (data1, y_act1, data2, y_act2) in enumerate(train_loader):
    # Variable wrappers are not needed since PyTorch 0.4; the batch tensors are used directly
    y_pred1, tx1 =  Q1(data1)
    y_pred2, ty1 =  Q2(data2)
    x_pred1, tx2 =  P1(y_pred1)
    x_pred2, ty2 =  P2(y_pred2)
    # Compute and print loss.
    # embedding loss
    loss_em1 = Loss_embed1(y_pred1, y_act1)        
    loss_em2 = Loss_embed2(y_pred2, y_act2)
    # reconstruction loss
    loss_re1 = Loss_recons1(x_pred1, data1)
    loss_re2 = Loss_recons2(x_pred2, data2)
    #compute loss for output of common layers but employed on intermediate outputs   
    # in the encoding side
    q1 = torch.mean(torch.norm(tx1 - ty1, 2, 1))      
    # in the decoding side
    q2 = torch.mean(torch.norm(tx2 - ty2, 2, 1))
    # Total loss
    loss_a = loss_re1 + loss_re2
    loss_b = loss_em1 + loss_em2
    loss = loss_a + loss_b + q1 + q2
    print('Epoch {:d}: {:d} reconst Loss {:.6f} embed Loss {:.6f} enc Loss {:.6f} dec Loss {:.6f}\n'.format(
        epoch, batch_idx, loss_a.item(), loss_b.item(), q1.item(), q2.item()))
    # set the gradients to zero
    # compute by backpropagation
    # update the weights
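
The three commented steps at the end of the loop could look roughly like the sketch below, which uses a single optimizer over the parameters of all four networks so that one backward pass updates everything. Everything here is an assumption made for illustration: the TwoLayer class, the sizes, the random data, MSE for every term, unit weights and the Adam settings.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D_in, H, D_out = 20, 8, 3  # hypothetical sizes


class TwoLayer(nn.Module):
    """Hypothetical stand-in for Q1/Q2/P1/P2: returns (output, hidden)."""

    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.head = nn.Linear(d_in, d_hidden)
        self.tail = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        h = torch.relu(self.head(x))
        return torch.tanh(self.tail(h)), h


q1, q2 = TwoLayer(D_in, H, D_out), TwoLayer(D_in, H, D_out)  # encoders
p1, p2 = TwoLayer(D_out, H, D_in), TwoLayer(D_out, H, D_in)  # decoders

# One optimizer over every trainable parameter of the four networks.
params = [p for m in (q1, q2, p1, p2) for p in m.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-2)

# Random stand-in data scaled to [-1, 1] to match the tanh outputs.
x1 = torch.rand(16, D_in) * 2 - 1
x2 = torch.rand(16, D_in) * 2 - 1
t = torch.rand(16, D_out) * 2 - 1


def train_step():
    z1, hx_enc = q1(x1)
    z2, hy_enc = q2(x2)
    r1, hx_dec = p1(z1)
    r2, hy_dec = p2(z2)
    loss_embed = F.mse_loss(z1, t) + F.mse_loss(z2, t)
    loss_recon = F.mse_loss(r1, x1) + F.mse_loss(r2, x2)
    loss_hidden = F.mse_loss(hx_enc, hy_enc) + F.mse_loss(hx_dec, hy_dec)
    loss = loss_embed + loss_recon + loss_hidden
    optimizer.zero_grad()   # set the gradients to zero
    loss.backward()         # compute gradients by backpropagation
    optimizer.step()        # update the weights
    return loss.item()


losses = [train_step() for _ in range(100)]
print('first {:.4f} last {:.4f}'.format(losses[0], losses[-1]))
```

If one term dominates the others in magnitude, multiplying each term by its own weight before summing is a common adjustment.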

Is this the most appropriate way to do it? Any information on this will be helpful. Thanks.
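
One detail concerning objective (4): torch.mean(torch.norm(a - b, 2, 1)) is the mean Euclidean distance per sample, which is not the same quantity as the mean squared error. A small sketch with hand-picked tensors (the names tx and ty are illustrative) shows the difference:

```python
import torch
import torch.nn.functional as F

# Two batches of 2 samples with 4 features; every element differs by exactly 1.
tx = torch.zeros(2, 4)
ty = torch.ones(2, 4)

# Mean over the batch of the per-sample L2 norm: sqrt(4 * 1^2) = 2 for each row.
mean_l2 = torch.mean(torch.norm(tx - ty, 2, 1))

# Mean squared error over all elements: every squared difference is 1.
mse = F.mse_loss(tx, ty)

print(mean_l2.item(), mse.item())  # 2.0 1.0
```

Either can serve as a penalty on the hidden activations, but they scale differently with the hidden size and with the magnitude of the difference, so the relative weight of this term in the total loss changes depending on which one is used.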

[bystander comment] Not answering your question, but it seems that the constructors of P and Q take no parameters, whereas you are passing data1 etc. to them as arguments. Does this compile/run OK for you?