Using tensor operations in the forward function

I have two dense linear nets (model1 and model2), each with 30 output neurons. With the following forward function the loss does not decrease, but when I switch to a single net with a single output neuron and a hardtanh activation, the loss starts decreasing. Is there anything wrong with my forward function?

def forward(self, x1, x2):
    tens1 = self.model1(x1)
    tens2 = self.model2(x2)
    # per-sample Euclidean distance between the two 30-d embeddings
    distance = torch.sqrt(torch.sum((tens1 - tens2) ** 2, 1))
    # clamp the distance into [0.001, 1]
    return nn.Hardtanh(0.001, 1)(distance)

Does the way I have defined my forward function cause any issues with gradient backpropagation?
Thanks

P.S. I was able to narrow the problem down to the numerical calculations, because I get the same issue with

distance = torch.sqrt(torch.sum(tens1 ** 2, 1))
return nn.Hardtanh(0.001, 1)(distance)

or with distance = torch.sqrt(torch.sum(torch.abs(tens1), 1))

With the squared version I could get it to work by adjusting the learning rate, but with the abs version the loss doesn't decrease no matter what learning rate I choose!
Can anyone tell me what I am doing wrong please?


I think I figured it out myself. It is all about the range of the values and the learning rate; playing with those parameters and rescaling the values fixes the issue.
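
One concrete way the value range bites here (my understanding, with made-up numbers): Hardtanh has zero gradient outside its [min_val, max_val] window, so if the raw distances start out above 1, nothing propagates back into the nets:

import torch
import torch.nn as nn

# made-up distances; only the in-range value receives a gradient
distance = torch.tensor([0.5, 2.0, 5.0], requires_grad=True)
out = nn.Hardtanh(0.001, 1)(distance)
out.sum().backward()
print(out.detach())    # tensor([0.5000, 1.0000, 1.0000])
print(distance.grad)   # tensor([1., 0., 0.])

So rescaling the values (or the clamp range) and the learning rate brings the distances back into a regime where gradients flow.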

Also, be careful when using sqrt on numbers that can be 0: sqrt returns nan gradients at 0. You might want to add an epsilon to avoid this if your model ever converges to a point where the distance is 0.
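
A quick sketch of both the failure and the epsilon fix (the 1e-8 is an arbitrary choice):

import torch

# the derivative of sqrt at 0 is infinite; chained with the zero
# gradient of x ** 2 it produces nan
x = torch.zeros(3, requires_grad=True)
torch.sqrt((x ** 2).sum()).backward()
print(x.grad)  # tensor([nan, nan, nan])

# adding a small epsilon inside the sqrt keeps the gradient finite
y = torch.zeros(3, requires_grad=True)
torch.sqrt((y ** 2).sum() + 1e-8).backward()
print(y.grad)  # tensor([0., 0., 0.])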