epoch: 125/200, subject: 1/2, batch: 26/32, avg-batch-loss: 0.2313, avg-batch-dice: 0.6728
epoch: 125/200, subject: 1/2, batch: 27/32, avg-batch-loss: 0.2289, avg-batch-dice: 0.6704
epoch: 125/200, subject: 1/2, batch: 28/32, avg-batch-loss: 0.2233, avg-batch-dice: 0.6989
epoch: 125/200, subject: 1/2, batch: 29/32, avg-batch-loss: 0.2275, avg-batch-dice: 0.6986
epoch: 125/200, subject: 1/2, batch: 30/32, avg-batch-loss: 0.2232, avg-batch-dice: 0.7028
epoch: 125/200, subject: 1/2, batch: 31/32, avg-batch-loss: 0.2241, avg-batch-dice: 0.7013
epoch: 125/200, subject: 1/2, batch: 32/32, avg-batch-loss: 0.2176, avg-batch-dice: 0.7159
Criteria at the end of epoch 125 subject 1 is 0.7159
Criteria increased from 0.6941 to 0.7159, saving model ...
epoch: 125/200, subject: 2/2, batch: 1/32, avg-batch-loss: nan, avg-batch-dice: nan
epoch: 125/200, subject: 2/2, batch: 2/32, avg-batch-loss: nan, avg-batch-dice: nan
epoch: 125/200, subject: 2/2, batch: 3/32, avg-batch-loss: nan, avg-batch-dice: nan
I am combining two losses:
def focal_dice_loss(y_pred, y_true, delta=0.7, gamma_fd=0.75, epsilon=1e-6):
    axis = identify_axis(y_pred.shape)  # spatial axes, here [2, 3, 4]
    ones = torch.ones_like(y_pred)
    p_c = y_pred        # probability that a voxel belongs to class i
    p_n = ones - y_pred
    g_t = y_true
    g_n = ones - g_t
    tp = torch.sum(torch.sum(p_c * g_t, axis), 0)
    fp = torch.sum(torch.sum(p_c * g_n, axis), 0)
    fn = torch.sum(torch.sum(p_n * g_t, axis), 0)
    tversky_dice = (tp + epsilon) / (tp + delta * fn + (1 - delta) * fp + epsilon)  # torch.Size([9])
    focal_dice_loss_fg = torch.pow(1 - tversky_dice, gamma_fd)[1:]  # drop index 0 --> background
    dice_loss = torch.sum(focal_dice_loss_fg)
    focal_dice_per_class = torch.mean(focal_dice_loss_fg)
    return dice_loss, focal_dice_loss_fg, focal_dice_per_class
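One plausible NaN source (an assumption from reading the code, not confirmed by the log) is `torch.pow(1 - tversky_dice, gamma_fd)` with `gamma_fd = 0.75`: for exponents below 1 the derivative of `x**gamma` behaves like `x**(gamma - 1)` and diverges as x → 0, i.e. exactly when a foreground class reaches a perfect Tversky index, which the rising dice (0.7159 at epoch 125) makes increasingly likely. A minimal standalone demonstration, with `focal_pow` as a hypothetical clamped replacement:

```python
import torch

def focal_pow(tversky_index, gamma=0.75, eps=1e-6):
    # For gamma < 1, d/dx x**gamma ~ x**(gamma - 1) diverges at x = 0,
    # so clamp the base of the fractional power away from zero.
    base = torch.clamp(1.0 - tversky_index, min=eps)
    return torch.pow(base, gamma)

# Unclamped: the gradient is -inf once a class reaches a perfect Tversky index.
ti = torch.tensor([1.0], requires_grad=True)
torch.pow(1.0 - ti, 0.75).sum().backward()
print(torch.isfinite(ti.grad).item())   # False

# Clamped: the gradient stays finite.
ti2 = torch.tensor([1.0], requires_grad=True)
focal_pow(ti2).sum().backward()
print(torch.isfinite(ti2.grad).item())  # True
```

A single inf gradient like this propagates back through the network and turns every parameter update into nan, which matches a failure that only appears once the dice gets high.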
def focal_loss(y_pred, y_true, clweight):
    y_pred = torch.clamp(y_pred, 1e-6, 1 - 1e-6)  # keep log() away from 0 and 1
    cross_entropy = -y_true * torch.log(y_pred)
    floss = torch.mean(                  # dim 0 --> [9]
        torch.mean(                      # dim 2 --> [2, 9]
            torch.mean(                  # dim 3 --> [2, 9, 48]
                torch.mean(              # dim 4 --> [2, 9, 48, 50]
                    cross_entropy * torch.pow(1 - y_pred, 2),  # [2, 9, 48, 50, 64]
                    4),
                3),
            2),
        0) * clweight
    return torch.sum(floss)
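As an aside, the four nested `torch.mean` calls reduce dims 4, 3, 2, then 0 one at a time; since each step averages equally sized slices, this is equivalent to a single mean over `dim=(0, 2, 3, 4)`, which is easier to read. A quick equivalence check on a random stand-in tensor with the shape from the comments:

```python
import torch

# Random stand-in with the commented shape: (batch, class, D, H, W).
x = torch.rand(2, 9, 48, 50, 64)

nested = torch.mean(torch.mean(torch.mean(torch.mean(x, 4), 3), 2), 0)
single = torch.mean(x, dim=(0, 2, 3, 4))

print(torch.allclose(nested, single))  # True
```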
I also trained with only the focal_loss above, and that run does not produce NaNs. It is only when I optimize the sum, (focal_loss + focal_dice_loss).backward(), that the loss becomes NaN after epoch 125.
Any help troubleshooting?
I have already checked the input data; it is normalized to the 0–1 range.
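One way to pinpoint which operation first produces the NaN in the combined backward pass is PyTorch's anomaly detection: with it enabled, `backward()` raises a `RuntimeError` naming the autograd node whose gradient first went NaN, with a traceback into the corresponding forward call. A small sketch, using `torch.sqrt` of a negative value as a stand-in for whichever op in the real loss is misbehaving:

```python
import torch

# With anomaly detection on, backward() raises at the first autograd node
# whose output gradient contains NaN, instead of silently propagating it.
torch.autograd.set_detect_anomaly(True)
try:
    x = torch.tensor([-1.0], requires_grad=True)
    y = torch.sqrt(x).sum()  # sqrt of a negative number: NaN stand-in
    y.backward()             # raises RuntimeError naming SqrtBackward
except RuntimeError as err:
    print("caught RuntimeError:", "nan" in str(err).lower())
finally:
    torch.autograd.set_detect_anomaly(False)
```

Anomaly detection roughly doubles the cost of backward, so it is best enabled only around the failing epochs (e.g. resume from the checkpoint saved at epoch 125) rather than for the whole training run.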