NaN loss in training loop

Hi!
I’ve designed a network for a regression task using an LSTM. However, my loss becomes NaN when I try to train it and I don’t understand why.

This is my training loop:

def training_loop(dataloader, model, loss_fn, report_loss_fn, optimizer, batch_size, epoch, report_freq=10, writer=writer):
    size = len(dataloader.dataset)

    model.train()
    for batch_n, (X, y, x_mask, y_mask) in enumerate(dataloader):
        pred = model(X)

        # keep only the non-padded values for the loss
        valid_pred = torch.masked_select(pred, y_mask)
        valid_targets = torch.masked_select(y, y_mask)

        # calculate optim loss
        loss = loss_fn(valid_pred, valid_targets)
        loss.backward()

        # adjust params
        optimizer.step()
        optimizer.zero_grad()

        # get loss for each pass
        running_loss = loss.item()

        if batch_n % report_freq == 0:
            current_values = batch_n * batch_size + len(X)
            writer.add_scalar('Train optim loss', running_loss, epoch + 1)
            print(f"Train loss: {running_loss:>7f} [{current_values:>5d}/{size:>5d}]")
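To make the masking step concrete, here is a standalone sketch of what `torch.masked_select` does in the loop above, using small hypothetical tensors rather than my actual model outputs. It also shows one thing worth checking: if a batch’s mask is all `False`, the selection is empty and MSE over zero elements evaluates to NaN.

```python
import torch

# Hypothetical tensors standing in for pred / y / y_mask from the loop.
pred = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([1.5, 2.5, 0.0, 0.0])
y_mask = torch.tensor([True, True, False, False])  # mask must be bool

# masked_select flattens the tensor and keeps entries where the mask is True.
valid_pred = torch.masked_select(pred, y_mask)
valid_targets = torch.masked_select(y, y_mask)
print(valid_pred)  # tensor([1., 2.])

# If nothing is selected, the mean over an empty tensor is nan,
# which would poison the reported loss from that batch onward.
empty = torch.masked_select(pred, torch.zeros(4, dtype=torch.bool))
loss_empty = torch.nn.functional.mse_loss(empty, empty)
print(torch.isnan(loss_empty))  # tensor(True)
```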

The print statements output this; everything is NaN from the third report onward:

Train loss: 1.087582 [    4/ 1634]
Train loss: 0.840909 [  104/ 1634]
Train loss:     nan [  204/ 1634]
Train loss:     nan [  304/ 1634]
Train loss:     nan [  404/ 1634]

I’m using this optimizer:
optim = torch.optim.Adam(params=lstm_model.parameters(recurse=True), weight_decay=1e-05, lr=0.0001)

I’ve tried changing the learning rate: when I increased it to 0.5, the first 3–4 reported losses were still not NaN, so I think it could be something related to that.
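For debugging, here is a minimal sketch of the checks that can localize where the NaNs first appear: anomaly detection to flag the backward op that produces them, a finiteness assertion on the predictions, and gradient clipping so one oversized update cannot blow up the weights. It uses a hypothetical stand-in model, not the real `lstm_model` or data.

```python
import torch

# Hypothetical tiny model and batch, standing in for lstm_model and the loader.
model = torch.nn.Linear(4, 1)
optim = torch.optim.Adam(model.parameters(), weight_decay=1e-05, lr=0.0001)
X = torch.randn(8, 4)
y = torch.randn(8, 1)

# Anomaly mode raises an error at the backward op that first yields NaN grads.
with torch.autograd.detect_anomaly():
    pred = model(X)
    assert torch.isfinite(pred).all(), "NaN/Inf already in predictions"
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()

# Clip the global gradient norm before stepping; returns the pre-clip norm.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optim.step()
optim.zero_grad()
print(torch.isfinite(loss).item(), torch.isfinite(total_norm).item())
```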