Getting NaN gradients after loss.backward()

I am trying to implement LipNet, but after a few iterations my loss becomes NaN. I checked the model's inputs and outputs and there are no NaN values in them, yet after loss.backward() the gradients contain NaNs and the assert statement below fails. Here is the code snippet; can someone help me out with this?

y is the model output with shape (batch_size, time_steps, num_classes)

criterion = nn.CTCLoss()  # nn.CTCLoss is a module and must be instantiated before being called
loss = criterion(y.transpose(0, 1).log_softmax(-1), txt, vid_len.view(-1), txt_len.view(-1))
loss.backward()
nn.utils.clip_grad_norm_(net.parameters(), 1.0)

assert not any(torch.isnan(p.grad).any() for p in net.parameters() if p.grad is not None), "NaN values found in the gradients after the backward pass."
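For reference, a minimal sketch (assuming the same net, y, txt, vid_len, and txt_len as above) of how PyTorch's anomaly detection can be used to locate the first operation whose backward produces a NaN:

import torch
import torch.nn as nn

# Enable anomaly detection only while debugging; it slows training down considerably.
with torch.autograd.detect_anomaly():
    criterion = nn.CTCLoss()
    loss = criterion(y.transpose(0, 1).log_softmax(-1), txt, vid_len.view(-1), txt_len.view(-1))
    loss.backward()  # raises a RuntimeError pointing at the forward op that caused the NaN/Inf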

Infs and NaNs in CTCLoss were discussed a few times already in other posts, so take a look at e.g. this one and check whether your use case runs into the same issue.
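As a minimal sketch (assuming vid_len holds the input lengths and txt_len the target lengths, as in your snippet): a common cause is a target sequence longer than its input sequence, which has no valid CTC alignment, so the loss becomes infinite and the gradients turn into NaNs. You can check for that, or pass zero_infinity=True so CTCLoss zeroes out such losses and their gradients:

import torch
import torch.nn as nn

# A target longer than its input has no valid CTC alignment -> inf loss -> NaN gradients.
assert (vid_len.view(-1) >= txt_len.view(-1)).all(), "target longer than input sequence"

# Alternatively, zero out infinite losses (and their gradients) instead of propagating them.
criterion = nn.CTCLoss(zero_infinity=True)
loss = criterion(y.transpose(0, 1).log_softmax(-1), txt, vid_len.view(-1), txt_len.view(-1))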
