Implementing triplet loss: extremely high loss values in the middle of an epoch

Hi,

I am trying to implement triplet loss for maximising distance between anchor features and negative features while trying to minimise distance between anchor features and positive features.

import torch.nn as nn
import torch.nn.functional as F


class TripletLoss(nn.Module):
    """
    Triplet loss
    Takes embeddings of an anchor sample, a positive sample and a negative sample
    """

    def __init__(self, margin=1.0):
        super().__init__()
        self.margin = margin

    def forward(self, anchor, positive, negative, size_average=True):
        # Squared Euclidean distance per sample (the square root is left out)
        distance_positive = (anchor - positive).pow(2).sum(1)  # .pow(.5)
        distance_negative = (anchor - negative).pow(2).sum(1)  # .pow(.5)
        # Hinge: zero once the negative is at least `margin` farther than the positive
        losses = F.relu(distance_positive - distance_negative + self.margin)
        return losses.mean() if size_average else losses.sum()

My input feature embeddings in this case are the 2048-dimensional outputs of the last fully connected layer of ResNet152 (from torchvision).

The triplet loss converges as expected at the beginning of training, and its value lies between 1 and 200 for a batch size of 10 images. But sometimes during an epoch the value suddenly becomes very large, on the order of 10^32, while it was on the order of 10^0 to 10^1 for the previous batch in the same epoch.

Could this be due to an overflow issue, or is there a problem with my implementation? Any input is most welcome. Thanks.
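
On the overflow question, a rough sketch may help. Because the loss uses squared distances, it grows with the square of the embedding magnitude, so very large loss values can appear well before float32 overflows (its largest finite value is about 3.4e38). The random tensors, the scale factors and the function name `triplet_loss` below are assumptions for illustration only, not values from the actual training run.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Same formula as the TripletLoss module above, written as a function."""
    d_pos = (anchor - positive).pow(2).sum(1)
    d_neg = (anchor - negative).pow(2).sum(1)
    return F.relu(d_pos - d_neg + margin).mean()


torch.manual_seed(0)
# Random stand-ins for a batch of 10 embeddings with 2048 dimensions (assumption).
anchor = torch.randn(10, 2048)
positive = torch.randn(10, 2048)
# Put the negative exactly on the anchor so every triplet violates the margin.
negative = anchor.clone()

base = triplet_loss(anchor, positive, negative, margin=0.0)
for scale in (1.0, 1e3, 1e6, 1e12):
    # Multiplying all embeddings by `scale` multiplies the loss by scale**2.
    loss = triplet_loss(scale * anchor, scale * positive, scale * negative, margin=0.0)
    print(f"scale={scale:.0e}  loss={loss.item():.3e}  finite={torch.isfinite(loss).item()}")

# Largest finite float32 value, for comparison with the magnitudes printed above.
print("float32 max:", torch.finfo(torch.float32).max)
```

If the printed losses stay finite while growing quadratically, a sudden jump in the real run points to the embeddings (and hence the weights) growing rather than to a numeric overflow in the loss itself.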

For example, here is part of an epoch where this happens:

loss_G: 163.7856 | loss_D: 0.0684 | loss_triplet: 137.4166 | loss_identity: 26.3Epoch 001/202 [0162/3521] -- Datasets/iris/vivek/Illumination/glass_off_medium/L-1706.bmp Datasets/iris/vivek/Expression/ex3/L-1213.bmp Datasets/iris/Mattew/Illumination/glasson_bright/L-1537.bmp vivek
loss_G: 162.8020 | loss_D: 0.0680 | loss_triplet: 136.5745 | loss_identity: 26.2Epoch 001/202 [0163/3521] -- Datasets/iris/Kong/Expression/ex2/V-1260.bmp Datasets/iris/Kong/Expression/ex3/L-1398.bmp Datasets/iris/Mattew/Illumination/glasson_bright/L-1537.bmp Kong
loss_G: 161.8320 | loss_D: 0.0676 | loss_triplet: 135.7396 | loss_identity: 26.0Epoch 001/202 [0164/3521] -- Datasets/iris/Vicky/Expression/ex3/V-1270.bmp Datasets/iris/Vicky/Expression/ex2/L-1215.bmp Datasets/iris/Shafik/Illumination/Off/L-758.bmp Vicky
loss_G: 160.8689 | loss_D: 0.0672 | loss_triplet: 134.9181 | loss_identity: 25.9Epoch 001/202 [0165/3521] -- Datasets/iris/hari/Illumination/Lon/L-1141.bmp Datasets/iris/hari/Expression/ex2/L-1144.bmp Datasets/iris/Shafik/Illumination/Off/L-758.bmp hari
loss_G: 159.9193 | loss_D: 0.0668 | loss_triplet: 134.1064 | loss_identity: 25.8Epoch 001/202 [0166/3521] -- Datasets/iris/Heo/Illumination/Off/L-593.bmp Datasets/iris/Heo/Expression/ex1/L-182.bmp Datasets/iris/vivek/Illumination/Off/L-1444.bmp Heo
loss_G: 158.9802 | loss_D: 0.0664 | loss_triplet: 133.3046 | loss_identity: 25.6Epoch 001/202 [0167/3521] -- Datasets/iris/Faysal/Expression/ex1/V-115.bmp Datasets/iris/Faysal/Expression/ex1/L-94.bmp Datasets/iris/vivek/Illumination/Off/L-1444.bmp Faysal
loss_G: 158.0535 | loss_D: 0.0660 | loss_triplet: 132.5124 | loss_identity: 25.5Epoch 001/202 [0168/3521] -- Datasets/iris/Michael/Illumination/2on/L-243.bmp Datasets/iris/Michael/Illumination/Ron/L-520.bmp Datasets/iris/bernard/Expression/ex1/L-491.bmp Michael
loss_G: 157.1604 | loss_D: 0.0656 | loss_triplet: 131.7237 | loss_identity: 25.4Epoch 001/202 [0169/3521] -- Datasets/iris/hari/Expression/ex3/L-1258.bmp Datasets/iris/hari/Illumination/Ron/L-1222.bmp Datasets/iris/David/Expression/ex1/L-1058.bmp hari
loss_G: 156.2522 | loss_D: 0.0652 | loss_triplet: 130.9501 | loss_identity: 25.3Epoch 001/202 [0170/3521] -- Datasets/iris/Mattew/Expression/ex2/L-1117.bmp Datasets/iris/Mattew/Expression/ex2/L-1143.bmp Datasets/iris/Sharon/Illumination/Off/L-1500.bmp Mattew
loss_G: 155.3542 | loss_D: 0.0649 | loss_triplet: 130.1857 | loss_identity: 25.1Epoch 001/202 [0171/3521] -- Datasets/iris/Uma/Illumination/Off/V-1540.bmp Datasets/iris/Uma/Expression/ex2/L-1133.bmp Datasets/iris/Shafik/Illumination/Off/L-758.bmp Uma
loss_G: 154.4762 | loss_D: 0.0645 | loss_triplet: 129.4302 | loss_identity: 25.0Epoch 001/202 [0172/3521] -- Datasets/iris/Koschan/Illumination/Off/V-1985.bmp Datasets/iris/Koschan/Expression/ex1/L-1637.bmp Datasets/iris/Balage/Expression/ex3/L-976.bmp Koschan
loss_G: 153.6104 | loss_D: 0.0641 | loss_triplet: 128.6870 | loss_identity: 24.9Epoch 001/202 [0173/3521] -- Datasets/iris/Michael/Illumination/glasson_bright/L-1084.bmp Datasets/iris/Michael/Expression/ex1/L-75.bmp Datasets/iris/Balage/Expression/ex3/L-976.bmp Michael
loss_G: 157.4832 | loss_D: 0.0637 | loss_triplet: 132.0913 | loss_identity: 25.3Epoch 001/202 [0174/3521] -- Datasets/iris/Heo/Expression/off_ex3/L-526.bmp Datasets/iris/Heo/Illumination/gassoff_bright/L-978.bmp Datasets/iris/bernard/Expression/ex1/L-491.bmp Heo
loss_G: 156.5927 | loss_D: 0.0634 | loss_triplet: 131.3321 | loss_identity: 25.2Epoch 001/202 [0175/3521] -- Datasets/iris/bernard/Expression/ex2/L-566.bmp Datasets/iris/bernard/Illumination/Ron/L-228.bmp Datasets/iris/David/Expression/ex1/L-1058.bmp bernard
loss_G: 176.1949 | loss_D: 0.0630 | loss_triplet: 130.5817 | loss_identity: 45.6Epoch 001/202 [0176/3521] -- Datasets/iris/Sharon/Expression/ex1/L-1131.bmp Datasets/iris/Sharon/Expression/ex2/L-1058.bmp Datasets/iris/Koschan/Illumination/Off/L-1987.bmp Sharon
loss_G: 175.2179 | loss_D: 0.0627 | loss_triplet: 129.8452 | loss_identity: 45.3Epoch 001/202 [0177/3521] -- Datasets/iris/Michael/Illumination/glasson_medium/V-1245.bmp Datasets/iris/Michael/Illumination/glasson_bright/L-1091.bmp Datasets/iris/Balage/Expression/ex2/L-908.bmp Michael
loss_G: 177.1110 | loss_D: 0.0623 | loss_triplet: 129.1116 | loss_identity: 47.9Epoch 001/202 [0178/3521] -- Datasets/iris/David/Illumination/2on/V-1119.bmp Datasets/iris/David/Expression/ex1/L-1042.bmp Datasets/iris/Balage/Expression/ex3/L-976.bmp David
loss_G: 176.1414 | loss_D: 0.0620 | loss_triplet: 128.3917 | loss_identity: 47.7Epoch 001/202 [0179/3521] -- Datasets/iris/Nash/Illumination/Ron/L-360.bmp Datasets/iris/Nash/Expression/ex1/L-53.bmp Datasets/iris/Balage/Expression/ex3/L-976.bmp Nash
loss_G: 175.1878 | loss_D: 0.0616 | loss_triplet: 127.6771 | loss_identity: 47.5Epoch 001/202 [0180/3521] -- Datasets/iris/bernard/Expression/ex3/V-671.bmp Datasets/iris/bernard/Illumination/2on/L-72.bmp Datasets/iris/Mattew/Illumination/2on/L-1042.bmp bernard
loss_G: 174.4560 | loss_D: 0.0613 | loss_triplet: 126.9677 | loss_identity: 47.4Epoch 001/202 [0181/3521] -- Datasets/iris/Heo/Expression/off_ex2/V-486.bmp Datasets/iris/Heo/Illumination/gassoff_bright/L-969.bmp Datasets/iris/bernard/Illumination/2on/L-72.bmp Heo
loss_G: 173.5150 | loss_D: 0.0610 | loss_triplet: 126.2718 | loss_identity: 47.2Epoch 001/202 [0182/3521] -- Datasets/iris/brad/Expression/ex3/V-268.bmp Datasets/iris/brad/Expression/ex1/L-59.bmp Datasets/iris/Balage/Expression/ex2/L-908.bmp brad
loss_G: 172.5616 | loss_D: 0.0606 | loss_triplet: 125.5780 | loss_identity: 46.9Epoch 001/202 [0183/3521] -- Datasets/iris/brad/Expression/ex2/L-191.bmp Datasets/iris/brad/Expression/ex1/L-59.bmp Datasets/iris/Balage/Expression/ex2/L-908.bmp brad
loss_G: 76143.6405 | loss_D: 0.0603 | loss_triplet: 124.8918 | loss_identity: 76Epoch 001/202 [0184/3521] -- Datasets/iris/Mattew/Illumination/Ron/L-1200.bmp Datasets/iris/Mattew/Expression/ex1/L-1066.bmp Datasets/iris/Balage/Illumination/Lon/L-351.bmp Mattew
loss_G: 90587.5310 | loss_D: 0.0600 | loss_triplet: 124.2130 | loss_identity: 90Epoch 001/202 [0185/3521] -- Datasets/iris/Vijay/Illumination/2on/L-1068.bmp Datasets/iris/Vijay/Expression/ex2/L-1247.bmp Datasets/iris/David/Expression/ex1/L-1047.bmp Vijay
loss_G: 90097.8687 | loss_D: 0.0597 | loss_triplet: 123.5416 | loss_identity: 89Epoch 001/202 [0186/3521] -- Datasets/iris/Sharon/Expression/ex3/V-1416.bmp Datasets/iris/Sharon/Illumination/Off/L-1493.bmp Datasets/iris/bernard/Expression/ex2/L-559.bmp Sharon
loss_G: 35516960500023279616.0000 | loss_D: 0.0593 | loss_triplet: 35516960500023189504.0000 | loss_identity: 3255110397.9490 -- ETA: 2 days, 0:12:10.699602--TiEpoch 001/202 [0187/3521] -- Datasets/iris/hari/Expression/ex1/L-1039.bmp Datasets/iris/hari/Illumination/Off/L-1305.bmp Datasets/iris/David/Expression/ex2/L-1121.bmp hari
loss_G: 35327030777277411328.0000 | loss_D: 0.0590 | loss_triplet: 35327030777273597952.0000 | loss_identity: 3241421788.4199 -- ETA: 2 days, 0:08:32.830498--TiEpoch 001/202 [0188/3521] -- Datasets/iris/Balage/Expression/ex2/L-948.bmp Datasets/iris/Balage/Illumination/2on/L-105.bmp Datasets/iris/Faysal/Illumination/Ron/L-285.bmp Balage

This is because the model's weights become extremely large.
Do you use the weight_decay parameter on your optimizer?
If not, please try it.
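
A minimal sketch of what that could look like, assuming an Adam optimizer. The small linear layer is a hypothetical stand-in for the real network, the placeholder loss is not the triplet loss, and the values `lr=1e-4` and `weight_decay=1e-4` are only illustrative. The helper `total_weight_norm` is also hypothetical; logging it per batch is one way to check whether the weights really are growing.

```python
import torch
import torch.nn as nn


def total_weight_norm(model):
    """L2 norm over all parameters, a rough indicator of weight growth."""
    return torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters()))


torch.manual_seed(0)
# Hypothetical stand-in for the embedding network; the original uses ResNet152.
model = nn.Linear(2048, 128)

# weight_decay adds an L2 penalty on the weights; 1e-4 is an illustrative value.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)

features = torch.randn(10, 2048)
loss = model(features).pow(2).mean()  # placeholder loss, not the triplet loss

optimizer.zero_grad()
loss.backward()
optimizer.step()

print("weight norm after one step:", total_weight_norm(model).item())
```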