# Why do all gradients go to inf while the loss is still small?

While training my neural network, I found that the weights of my model became NaN after some steps. Following the solutions in other topics, I checked the loss and the gradients of the previous step using `torch.isfinite(param.grad).all()`, and found that the gradients of all layers in my model had become inf, while the loss was still small (at least much smaller than at the beginning).
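
For reference, the check I ran is roughly the following minimal sketch (`model` stands for my network, and the gradients come from the previous step's `backward()` call):

```python
import torch
import torch.nn as nn

def check_grad_finite(model: nn.Module):
    # Print, for every parameter, whether its gradient is entirely finite.
    # tensor(False) means the gradient contains inf or NaN values.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(name, torch.isfinite(param.grad).all())
```

Its output is what I paste further down.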

This is how I define my loss function:

```python
import torch
import torch.nn as nn

# iou3D is my own 3D IoU function, defined elsewhere.

class GLoss(nn.Module):
    def __init__(self, len):
        super().__init__()
        self.L1_loss = nn.SmoothL1Loss()
        self.len_pred = len
        self.double()

    def forward(self, predict, label):  # input of iou3D should be [x, y, z, w, h, l, theta]
        loss = torch.tensor(0.0)
        len_gt = len(label)
        for i in range(self.len_pred):
            # block start
            box = predict[i]
            keep = label[0][0]
            best_score = iou3D(box[:-1].detach(), keep.detach())
            for j in range(1, len_gt):
                score = iou3D(box[:-1].detach(), label[0][j].detach())
                if score > best_score:
                    best_score = score
                    keep = label[0][j]
            # block end
            # The block above only chooses a proper training target;
            # I don't think it affects the loss computation.
            loss += self.loss_calculate(box, keep, best_score)
        return loss / 128

    def loss_calculate(self, pred, gt, score):
        loss = torch.tensor(0.0)
        # position term: SmoothL1 on (x, y, z)
        position = pred[:3]
        tar_pos = gt[:3]
        L1L = self.L1_loss(position, tar_pos)
        loss += L1L
        # size term: relative error on (w, h, l)
        Bbox = pred[3:6]
        Bbox_t = gt[3:6]
        BBL = torch.sum(torch.abs(1 - Bbox / Bbox_t))
        loss += BBL
        # angle term
        angle = pred[6]
        angle_t = gt[6]
        AGL = torch.sqrt(2 * (1 - torch.cos(angle - angle_t)))
        loss += AGL
        # classification term: binary cross-entropy on the last channel
        if score > 0.5:
            cls = 1
        else:
            cls = 0
        x = torch.sigmoid(pred[-1])
        CLL = -(cls * torch.log(x) + (1 - cls) * torch.log(1 - x))
        loss += CLL
        return loss
```
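
To narrow down which operation first produces the non-finite gradient, I understand `torch.autograd.set_detect_anomaly(True)` can be turned on before calling `backward()`. Below is a self-contained toy sketch (tiny made-up tensors, not my real model) of the kind of situation I suspect: the forward value stays small while the backward pass still produces a non-finite gradient.

```python
import torch

# With anomaly detection on, backward() raises an error naming the op whose
# backward pass produced a NaN (pure inf values may still need the isfinite check).
torch.autograd.set_detect_anomaly(True)

x = torch.tensor(0.0, requires_grad=True)
y = torch.sqrt(x) * 0.0   # forward value is 0.0, so the "loss" looks perfectly small
y.backward()              # d/dx sqrt(x) is inf at x = 0, and 0 * inf = NaN in backward,
                          # so anomaly mode raises here and points at the sqrt op
```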

My model is a bit large, so I only paste the `torch.isfinite(param.grad).all()` output for the top block of my model:

```
transformer.blocks.5.ln1.weight tensor(False)
transformer.blocks.5.ln1.bias tensor(False)
transformer.blocks.5.ln2.weight tensor(False)
transformer.blocks.5.ln2.bias tensor(False)
transformer.blocks.5.attn.qkv.weight tensor(False)
transformer.blocks.5.attn.qkv.bias tensor(False)
transformer.blocks.5.attn.proj.weight tensor(False)
transformer.blocks.5.attn.proj.bias tensor(False)
transformer.blocks.5.mlp.0.weight tensor(False)
transformer.blocks.5.mlp.0.bias tensor(False)
transformer.blocks.5.mlp.2.weight tensor(False)
transformer.blocks.5.mlp.2.bias tensor(False)
transformer.norm.weight tensor(False)
transformer.norm.bias tensor(False)
```

The loss at the same step is `tensor(4.0644, grad_fn=<DivBackward0>)`.