While training my neural network, I found that the weights of my model became NaN after some steps. Following solutions from other topics, I checked the loss and gradients of the previous step using
torch.isfinite(param.grad).all(). It turned out that the gradients of all layers in my model became inf, while the loss was still small (at least much smaller than at the beginning).
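For reference, my check loop looks roughly like this (here with a stand-in nn.Linear instead of my full transformer, just to keep the snippet self-contained):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for my real model.
model = nn.Linear(4, 2)

loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Per-parameter finiteness check after backward():
# prints tensor(False) for any parameter whose grad contains nan/inf.
for name, param in model.named_parameters():
    if param.grad is not None:
        print(name, torch.isfinite(param.grad).all())
```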
This is how I define my loss function:
class GLoss(nn.Module):
    def __init__(self, len):
        super().__init__()
        self.L1_loss = nn.SmoothL1Loss()
        self.len_pred = len
        self.double()

    def forward(self, predict, label):  # input of iou3D should be [x, y, z, w, h, l, theta]
        loss = torch.tensor(0.0)
        len_gt = len(label)
        for i in range(self.len_pred):
            # block start
            box = predict[i]
            keep = label[0]
            best_score = iou3D(box[:-1].detach(), keep.detach())
            for j in range(1, len_gt):
                score = iou3D(box[:-1].detach(), label[j].detach())
                if score > best_score:
                    best_score = score
                    keep = label[j]
            # block end
            # The above block chooses a proper training target;
            # I don't think it will affect loss computation.
            loss += self.loss_calculate(box, keep, best_score)
        return loss / 128

    def loss_calculate(self, pred, gt, score):
        loss = torch.tensor(0.0)
        position = pred[:3]
        tar_pos = gt[:3]
        L1L = self.L1_loss(position, tar_pos)
        loss += L1L
        Bbox = pred[3:6]
        Bbox_t = gt[3:6]
        BBL = torch.sum(torch.abs(1 - Bbox / Bbox_t))
        loss += BBL
        angle = pred[6]
        angle_t = gt[6]
        AGL = torch.sqrt(2 * (1 - torch.cos(angle - angle_t)))
        loss += AGL
        cls = 1 if score > 0.5 else 0
        x = torch.sigmoid(pred[-1])
        CLL = -(cls * torch.log(x) + (1 - cls) * torch.log(1 - x))
        loss += CLL
        return loss
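As a sanity check, I tried probing single terms of this loss in isolation. For example, the angle term at the point where the prediction exactly matches the target (this standalone snippet is just a sketch, not my training code):

```python
import torch

# Probe the angle term alone, with prediction == target.
theta = torch.tensor(0.0, requires_grad=True)
theta_t = torch.tensor(0.0)

agl = torch.sqrt(2 * (1 - torch.cos(theta - theta_t)))
agl.backward()

# d/dtheta sqrt(2*(1 - cos(theta))) is 0/0 at theta == 0:
# sqrt's backward gives inf, the cos chain gives 0, and inf * 0 = nan.
print(theta.grad)  # tensor(nan)
```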
My model is a little large, so I only paste part of the check output for my model.
transformer.blocks.5.ln1.weight tensor(False)
transformer.blocks.5.ln1.bias tensor(False)
transformer.blocks.5.ln2.weight tensor(False)
transformer.blocks.5.ln2.bias tensor(False)
transformer.blocks.5.attn.qkv.weight tensor(False)
transformer.blocks.5.attn.qkv.bias tensor(False)
transformer.blocks.5.attn.proj.weight tensor(False)
transformer.blocks.5.attn.proj.bias tensor(False)
transformer.blocks.5.mlp.0.weight tensor(False)
transformer.blocks.5.mlp.0.bias tensor(False)
transformer.blocks.5.mlp.2.weight tensor(False)
transformer.blocks.5.mlp.2.bias tensor(False)
transformer.norm.weight tensor(False)
transformer.norm.bias tensor(False)
transformer.head.fc.weight tensor(False)
transformer.head.fc.bias tensor(False)
The backward pass of this model starts from the last fc layer, and the gradient is already inf in that layer (tensor(False) above means torch.isfinite(param.grad).all() returned False).
And the loss value is:
From my former experience, I would expect a gradient explosion to happen in some middle layers of the model, not right at the output, and I would expect the loss to be very large when gradients explode. Neither is the case here. Therefore, I wonder: is there anything wrong with my loss function?
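One tool that might help pinpoint the source is PyTorch's anomaly detection, which makes backward() raise an error at the first op whose gradient contains nan, naming the offending backward function. The tiny reproduction below is hypothetical (it reuses the same shape of term as my AGL above), not my actual training loop:

```python
import torch

# Anomaly mode: backward() raises at the first op producing nan gradients.
torch.autograd.set_detect_anomaly(True)

x = torch.tensor(0.0, requires_grad=True)
loss = torch.sqrt(2 * (1 - torch.cos(x)))

try:
    loss.backward()
    result = "no anomaly"
except RuntimeError:
    # The error message names the backward function that returned nan.
    result = "anomaly detected"
print(result)

torch.autograd.set_detect_anomaly(False)
```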