Hello,
I have written an IOU loss function, but the model always reports nan for both training and validation loss. When I switch to the cross-entropy loss function, training works fine, so maybe something is wrong with my loss function? I also print the loss value and the average loss value inside my loss function, and they show very small values. Can you please help me with this? Thanks
Loss Function Code:
import torch.nn as nn


class IntersectionOverUnion(nn.Module):
    """
    Implementation of the Soft-Dice Loss function.

    Arguments:
        num_classes (int): number of classes.
        eps (float): value of the floating point epsilon.
    """

    def __init__(self, num_classes, eps=1e-5):
        super().__init__()
        # init class fields
        self.num_classes = num_classes
        self.eps = eps

    # define the forward pass
    def forward(self, preds, targets):  # pylint: disable=unused-argument
        """
        Compute Soft-Dice Loss.

        Arguments:
            preds (torch.FloatTensor): tensor of predicted labels.
                The shape of the tensor is (B, num_classes, H, W).
            targets (torch.LongTensor): tensor of ground-truth labels.
                The shape of the tensor is (B, 1, H, W).

        Returns:
            mean_loss (float32): mean loss by class value.
        """
        loss = 0
        # iterate over all classes
        for cls in range(self.num_classes):
            # get ground truth for the current class
            target = (targets == cls).float()
            # get prediction for the current class
            pred = preds[:, cls]
            # calculate intersection
            intersection = (pred * target).sum()  # Will be zero if Truth=0 or Prediction=0
            # calculate union for the current class
            union = (pred + target).sum()  # Will be zero if both are 0
            # compute dice coefficient
            # iou = (2 * intersection + self.eps) / (pred.sum() + target.sum() + self.eps)
            iou = (intersection + self.eps) / (union + self.eps)  # We smooth our division to avoid 0/0
            print("IOU Value:", iou)
            # compute negative logarithm from the obtained
            loss = loss - iou.log()
            print("loss Value:", iou)
        # get mean loss by class value
        loss = loss / self.num_classes
        print("loss Avg Value:", iou)
        return loss
Model Training Result:
> epoch: 2, test_miou: 0.090242, train_loss: nan, test_loss: nan: 25%
> 3/12 [1:22:27<3:35:14, 1434.99s/it]
> [0/12][Train][261] Loss_avg: nan, Loss: nan, LR: 1e-05: 100%
> 262/262 [1:11:44<00:00, 16.43s/it]
> Streaming output truncated to the last 5000 lines.
> IOU Value: tensor(-0.0143487109, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(-0.0143487109, device='cuda:0', grad_fn=<DivBackward0>)
> IOU Value: tensor(0.0014381389, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(0.0014381389, device='cuda:0', grad_fn=<DivBackward0>)
> IOU Value: tensor(-0.0324460752, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(-0.0324460752, device='cuda:0', grad_fn=<DivBackward0>)
> IOU Value: tensor(0.0008592299, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(0.0008592299, device='cuda:0', grad_fn=<DivBackward0>)
> IOU Value: tensor(-3.3268113264e-10, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(-3.3268113264e-10, device='cuda:0', grad_fn=<DivBackward0>)
> IOU Value: tensor(-1.4262332426e-09, device='cuda:0', grad_fn=<DivBackward0>)
> loss Value: tensor(-1.4262332426e-09, device='cuda:0', grad_fn=<DivBackward0>)
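
One thing visible in the log above: several IOU values are negative, and the logarithm of a negative number is undefined, so `iou.log()` would return nan for those classes and the nan would carry through to the averaged loss. Below is a minimal sketch of that effect together with one possible variant of the loss. It assumes `preds` holds raw logits rather than probabilities (not confirmed anywhere above), and `soft_iou_loss` is a hypothetical helper name, not part of my training code. The variant applies softmax first, drops the channel dimension of the targets so the product does not broadcast to (B, B, H, W), and computes the union as sum minus intersection.

```python
import torch
import torch.nn.functional as F

# The log of a negative value is nan, and nan propagates through the loss.
print(torch.tensor(-0.0143).log())  # tensor(nan)


def soft_iou_loss(logits, targets, num_classes, eps=1e-5):
    """Hypothetical soft-IoU loss sketch (not the original implementation).

    logits:  (B, num_classes, H, W) raw scores (assumed, not probabilities)
    targets: (B, 1, H, W) integer class labels
    """
    # Softmax maps the scores into [0, 1], so sums below are non-negative.
    probs = F.softmax(logits, dim=1)
    # Drop the channel dimension: (B, 1, H, W) -> (B, H, W).
    labels = targets.squeeze(1)
    loss = logits.new_zeros(())
    for cls in range(num_classes):
        target = (labels == cls).float()
        pred = probs[:, cls]
        intersection = (pred * target).sum()
        # Union = |A| + |B| - |A and B|, which is never below the intersection.
        union = pred.sum() + target.sum() - intersection
        # With eps in both terms the ratio stays in (0, 1], so log is finite.
        iou = (intersection + eps) / (union + eps)
        loss = loss - iou.log()
    return loss / num_classes
```

Calling it as `soft_iou_loss(preds, targets, num_classes)` on a random batch should give a finite, non-negative scalar, which would at least separate the nan problem from anything else in the training loop.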