Hello All,
I am trying to develop a model to generate segmentation masks for the Kvasir-SEG dataset, and I am using a combination of focal loss and Dice loss, defined as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceFocalLoss(nn.Module):
    def __init__(self, weight=None, size_average=True, alpha=0.25, gamma=3, smooth=1):
        super(DiceFocalLoss, self).__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.smooth = smooth

    def forward(self, inputs, targets):
        # comment out if your model contains a sigmoid or equivalent activation layer
        # inputs = torch.sigmoid(inputs)

        # flatten label and prediction tensors
        inputs = inputs.view(-1)
        targets = targets.view(-1)

        # Dice loss on the flattened tensors
        intersection = (inputs * targets).sum()
        dice_loss = 1 - (2. * intersection + self.smooth) / (inputs.sum() + targets.sum() + self.smooth)

        # focal loss built on top of binary cross-entropy with logits
        BCE = F.binary_cross_entropy_with_logits(inputs, targets, reduction='mean')
        BCE_EXP = torch.exp(-BCE)
        focal_loss = self.alpha * (1 - BCE_EXP) ** self.gamma * BCE

        Dice_BCE = focal_loss + dice_loss
        return Dice_BCE
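For reference, the loss can be sanity-checked in isolation on random tensors (the shapes below are just an assumption for illustration, not my actual data), comparing the value it returns in full precision and under autocast:

# standalone sanity check of the loss (dummy shapes are an assumption)
loss_fn = DiceFocalLoss()
logits = torch.randn(1, 1, 256, 256, device='cuda')                    # raw model outputs
target = torch.randint(0, 2, (1, 1, 256, 256), device='cuda').float()  # binary mask

print(loss_fn(logits, target))       # value in full precision
with torch.cuda.amp.autocast():
    print(loss_fn(logits, target))   # value computed under autocast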
While debugging the model, I take a single (image, mask) pair from my dataset and train only on it. Surprisingly, when using autocast the loss does not change at all; it remains constant, which indicates that the model weights are not being updated. I am able to reproduce the issue with a simple two-layer convolutional model as well.
The toy model is defined as:
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(in_channels=3, out_channels=1, kernel_size=1, stride=1, padding=0),
)
model = model.cuda().train()
and my training loop is:

from torch.cuda.amp import autocast, GradScaler

loss = DiceFocalLoss()
scaler = GradScaler()
optim = torch.optim.SGD(model.parameters(), lr=1e-5)

for i in range(10):
    # image and mask are the single (image, mask) pair, already on the GPU
    with autocast():
        out = model(image)
        l = loss(out, mask)
    scaler.scale(l).backward()
    scaler.step(optim)
    scaler.update()
    print(l)
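In case it is relevant, here is a variant of the loop with extra checks I can use to see whether scaler.step() is silently skipping the optimizer step because of inf/NaN gradients; the unscale-then-inspect pattern is my own guess at where to look, not something I have confirmed as the cause:

for i in range(10):
    with autocast():
        out = model(image)
        l = loss(out, mask)
    scaler.scale(l).backward()
    scaler.unscale_(optim)  # unscale gradients in place so they can be inspected
    bad = [n for n, p in model.named_parameters()
           if p.grad is not None and not torch.isfinite(p.grad).all()]
    if bad:
        print('non-finite gradients in:', bad)
    print('loss scale:', scaler.get_scale())  # the scale is halved after every skipped step
    scaler.step(optim)  # skips the parameter update if non-finite gradients were found
    scaler.update()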
Loss values without autocast:
tensor(2.6734, device='cuda:0', grad_fn=<AddBackward0>)
tensor(2.0208, device='cuda:0', grad_fn=<AddBackward0>)
tensor(1.3117, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.8833, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.7073, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6678, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.6844, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.7187, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.7563, device='cuda:0', grad_fn=<AddBackward0>)
tensor(0.7941, device='cuda:0', grad_fn=<AddBackward0>)
Loss values with autocast:
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
tensor(8.9766, device='cuda:0', grad_fn=<AddBackward0>)
As you can see, the loss does not change at all, which implies the model weights are not being updated.
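To double-check that claim, this is the minimal comparison I would run, snapshotting one parameter tensor before the loop and comparing it afterwards:

before = model[0].weight.detach().clone()  # snapshot one parameter tensor

# ... run the training loop above ...

after = model[0].weight.detach()
print('weights changed:', not torch.equal(before, after))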
Could anyone please point out what is going wrong?
TIA