[SOLVED] No training progress when using my own L1Loss function

I just implemented a simple L1 loss function as a first step toward more complex loss functions later. I checked the result of my own L1 loss function against nn.L1Loss and the results are similar. But unfortunately, when I use my own L1 loss to train a simple network, there is no training progress, while using nn.L1Loss works fine.

Here is the simple L1Loss implementation:

from torch.nn import Module
import torch

class MyLoss(Module):
    def forward(self, x, y):
        """
        x, y: both have (batch_size, input_size) dimensions
        """
        d = y - x
        d *= 1 / y.size()[1]   # divide by input_size (in place)
        d = d.abs()
        return torch.sum(d) / y.size()[0]  # average over the batch
        

Here is the test case:

from torch.autograd import Variable
import torch
from torch import nn
from loss import MyLoss
import unittest

class TestLoss(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.criterion = MyLoss()
        cls.l1loss = nn.L1Loss()

    def test_loss(self):
        y = Variable(torch.Tensor([[2, 1, 0, -1, -2], [2, 1, 0, -1, -2]]))
        x = Variable(torch.Tensor([[1, 1, 1, 1, 1], [5, 4, 1, 1, 1]]))
        # compare floats approximately rather than bit-exactly
        self.assertAlmostEqual(self.l1loss(x, y).data[0],
                               self.criterion(x, y).data[0])

if __name__ == '__main__':
    unittest.main()

Have you implemented an __init__ for your loss?

Thanks for your reply.

No, I didn’t implement the __init__ method. I just added one, but it still doesn’t work.

class MyLoss(Module):
    def __init__(self):
        super(MyLoss, self).__init__()

    def forward(self, x, y):
        d = y - x
        d *= 1 / y.size()[1]
        d = d.abs()
        return torch.sum(d) / y.size()[0]

It seems I found the problem: gradients vanish after a few training iterations when I use my own L1Loss. But when using nn.L1Loss there is no problem.

I’m not entirely sure what could be causing this. It seems that the L1 loss implemented by PyTorch uses torch.mean rather than division by size (mathematically equivalent here, since the mean over all elements equals the sum divided by batch_size * input_size). Have a look at the functions _pointwise_loss and l1_loss here:
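
For what it’s worth, a mean-based variant along those lines might look like this (a minimal sketch; MyMeanLoss is a hypothetical name, and the in-place multiply is dropped as well):

from torch.nn import Module
import torch

class MyMeanLoss(Module):
    def forward(self, x, y):
        # torch.mean averages over all elements, which equals
        # sum(|y - x|) / (batch_size * input_size)
        return torch.mean((y - x).abs())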

Thanks,

Finally I used pairwise_distance instead of the minus operator, and now the code works fine.
Here is the working code:

import torch
from torch import nn
from torch.nn.functional import pairwise_distance

class CrossOverLoss(nn.Module):
    def __init__(self):
        super(CrossOverLoss, self).__init__()

    def forward(self, x, y):
        batch_size, n = y.size()
        x = x.squeeze()
        # row-wise distance between x and y (p=2 by default)
        d = pairwise_distance(x, y)
        return torch.sum(d) / batch_size  # average over the batch
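
Note that pairwise_distance defaults to p=2 (Euclidean distance), so this is no longer a pure L1 loss; passing p=1 would keep the absolute-difference behaviour. A quick sanity check might look like this (a hypothetical snippet, reusing the tensors from the test case above):

from torch.autograd import Variable
import torch

criterion = CrossOverLoss()
y = Variable(torch.Tensor([[2, 1, 0, -1, -2], [2, 1, 0, -1, -2]]))
x = Variable(torch.Tensor([[1, 1, 1, 1, 1], [5, 4, 1, 1, 1]]))
loss = criterion(x, y)  # mean of the row-wise distances
print(loss.data[0])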