I have a total loss that is the sum of:
- A BCELoss
- A cross-entropy loss
- A custom loss function for image gradient.
The problem I am facing is that after the first batch, some weights are updated to NaN, which makes all outputs NaN. If I remove the gradient loss, training works fine.
I found that the denominators in the gradient loss (the gradient magnitudes) were becoming 0, which was causing the problem. To fix it, I replaced every denominator that was 0 with 1. But this only gives a finite loss for the first backward pass; the next one results in NaN again.

    sobel_x = torch.tensor([[+1, 0, -1], [+2, 0, -2], [+1, 0, -1]], requires_grad=False, dtype=torch.float)
    sobel_y = torch.tensor([[+1, +2, +1], [0, 0, 0], [-1, -2, -1]], requires_grad=False, dtype=torch.float)
    if cuda:
        sobel_x, sobel_y = sobel_x.cuda(), sobel_y.cuda()
        boundary_mask = boundary_mask.cuda()
    sobel_x = sobel_x.view((1, 1, 3, 3))
    sobel_y = sobel_y.view((1, 1, 3, 3))

    # gradients in the x and y direction for both predictions and the target transparencies
    G_x_pred = F.conv2d(pred, sobel_x, padding=1)
    G_y_pred = F.conv2d(pred, sobel_y, padding=1)
    G_x_target = F.conv2d(target, sobel_x, padding=1)
    G_y_target = F.conv2d(target, sobel_y, padding=1)

    # magnitudes of the gradients
    M_pred = torch.sqrt(torch.pow(G_x_pred, 2) + torch.pow(G_y_pred, 2))
    M_target = torch.sqrt(torch.pow(G_x_target, 2) + torch.pow(G_y_target, 2))

    # taking care of nans
    M_pred = (M_pred == 0.).float() + M_pred
    M_target = (M_target == 0.).float() + M_target

    # Lcos = (1-v_pred*v_target)*Magnitude_pred
    Lcos = (1 - torch.abs((G_x_pred / M_pred) * (G_x_target / M_target) + (G_y_pred / M_pred) * (G_y_target / M_target))) * M_pred

    # Lmag = max(lambda*M_target-M_pred,0)
    lambd = 1.5
    Lmag = lambd * M_target - M_pred
    Lmag[Lmag < 0] = 0

    gamma_1 = 0.5
    gamma_2 = 0.5

    # total gradient loss
    image_gradient_loss = (gamma_1 * Lcos + gamma_2 * Lmag) * boundary_mask

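One possible explanation, not confirmed against the full model, is that patching the zeros after `torch.sqrt` fixes the forward value but not the backward pass: the derivative of `sqrt` at 0 is infinite, and multiplying it by the zero derivative of the squared term gives NaN. The sketch below reproduces that on a toy input and compares it with adding a small constant inside the square root. The tensors `g_x` and `g_y`, the helper `safe_magnitude`, and the value `eps=1e-6` are illustrative assumptions, not part of the code above.

```python
import torch

# Hypothetical toy gradients: the first element has zero magnitude.
g_x = torch.tensor([0.0, 3.0], requires_grad=True)
g_y = torch.tensor([0.0, 4.0], requires_grad=True)

# Same pattern as in the post: sqrt first, then replace zeros afterwards.
m = torch.sqrt(torch.pow(g_x, 2) + torch.pow(g_y, 2))
m_patched = (m == 0.).float() + m      # forward value is now 1 where m was 0
m_patched.sum().backward()
naive_grad = g_x.grad.clone()          # the zero-magnitude entry is nan
print("patched after sqrt:", naive_grad)


def safe_magnitude(gx, gy, eps=1e-6):
    """Gradient magnitude with eps inside the sqrt (eps is an assumed value)."""
    return torch.sqrt(torch.pow(gx, 2) + torch.pow(gy, 2) + eps)


# Reset the accumulated gradients and repeat with the stabilised version.
g_x.grad = None
g_y.grad = None
safe_magnitude(g_x, g_y).sum().backward()
safe_grad = g_x.grad.clone()           # every entry is finite
print("eps inside sqrt:   ", safe_grad)
```

If this is indeed the cause, the magnitude computed this way is never exactly zero, so the `(M == 0.).float() + M` replacement would no longer be needed. Wrapping the training step in `torch.autograd.detect_anomaly()` is one way to check which operation first produces the NaN.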
Can someone please help me tackle this situation?