I have two dense linear networks (model1 and model2), each with 30 output neurons. With the following forward function the loss does not decrease, but when I change the setup to a single net with a single output neuron and a Hardtanh activation, the loss starts decreasing. Is there anything wrong with my forward function?
    def forward(self, x1, x2):
        tens1 = self.model1(x1)
        tens2 = self.model2(x2)
        distance = torch.sqrt(torch.sum((tens1 - tens2) ** 2, 1))
        return nn.Hardtanh(0.001, 1)(distance)
Does the way I have defined my forward function cause any issues with gradient backpropagation?
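(A minimal sketch of what I suspect might be the culprit: the gradient of torch.sqrt blows up as its argument approaches zero, so if tens1 and tens2 ever coincide, the backward pass produces NaNs. The toy tensor below is an assumption for illustration, not my real data:)

    import torch

    # d/dx sqrt(x) = 1 / (2 * sqrt(x)) -> inf as x -> 0,
    # and the chain rule then yields 0/0 = NaN at exactly zero
    x = torch.zeros(1, requires_grad=True)
    y = torch.sqrt(torch.sum(x ** 2))
    y.backward()
    print(x.grad)  # tensor([nan])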
P.S. I could narrow the problem down to something in the numerical calculations, because I get the same issue with
    distance = torch.sqrt(torch.sum(tens1 ** 2, 1))
    return nn.Hardtanh(0.001, 1)(distance)

and with

    distance = torch.sqrt(torch.sum(torch.abs(tens1), 1))
With the power operation I could get it to work by adjusting the learning rate, but with the abs function the loss doesn't decrease no matter what rate I choose!
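(One thing worth checking, assuming the distance values grow past the clamp range: Hardtanh has exactly zero gradient outside [min_val, max_val], so once the distance saturates, no learning rate can help. A toy value to illustrate:)

    import torch
    import torch.nn as nn

    # Hardtanh clamps its input and passes zero gradient outside
    # [min_val, max_val]; a saturated distance stops all learning
    d = torch.tensor([5.0], requires_grad=True)  # assumed saturated value
    out = nn.Hardtanh(0.001, 1)(d)
    out.backward()
    print(out.item(), d.grad)  # 1.0 tensor([0.]) -- gradient clipped to 0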
Can anyone tell me what I am doing wrong, please?
I think I figured it out myself. It comes down to the range of the output values and the learning rate; adjusting those parameters and ranges fixed the issue.
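For anyone who lands here later: besides tuning the ranges and learning rate, a common way to keep the sqrt well behaved near zero is to add a small epsilon inside the square root. This is only a sketch under my own assumptions (the helper name and eps value are made up, not part of the original model):

    import torch

    def stable_distance(tens1, tens2, eps=1e-8):
        # eps keeps the sqrt argument strictly positive, so the gradient
        # 1 / (2 * sqrt(.)) stays finite even when tens1 == tens2
        return torch.sqrt(torch.sum((tens1 - tens2) ** 2, 1) + eps)

    # identical inputs no longer produce NaN gradients
    a = torch.zeros(4, 30, requires_grad=True)
    d = stable_distance(a, torch.zeros(4, 30))
    d.sum().backward()
    print(a.grad.isnan().any())  # tensor(False)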