CNN ASR getting NaN loss and core dump after epoch 1 with custom dataset

I’m trying to run the code from here on a custom dataset. The code runs fine with the LibriSpeech dataset, but when I use my own dataset I get the following:

Train Epoch: 1 [0/2875 (0%)] Loss: 10.740855

The next loss value printed is then NaN.

Any help is appreciated!

I added gradient clipping here:

loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 5)
optimizer.step()

Once I do that I get the following:

Train Epoch: 1 [0/2875 (0%)] Loss: 10.740855
Segmentation fault (core dumped)

My dataset has clips from 3 to 14 seconds.
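
One way to narrow down a loss that turns NaN after the first step is to skip the optimizer update whenever the loss or the gradient norm is not finite, so a single bad batch cannot poison the weights. The following is only a minimal sketch, not code from the linked tutorial: guarded_step is a hypothetical helper, and the tiny nn.Linear model and SGD optimizer are stand-ins for illustration. It relies on torch.nn.utils.clip_grad_norm_ returning the total gradient norm it measured.

```python
import torch
import torch.nn as nn

def guarded_step(model, optimizer, loss, max_norm=5.0):
    """Backpropagate and step only when loss and gradient norm are finite.

    Hypothetical helper. Returns True if the optimizer stepped,
    False if the batch was skipped.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        return False  # inf/NaN loss: backprop would only spread NaNs
    loss.backward()
    # clip_grad_norm_ returns the total norm computed before clipping
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    if not torch.isfinite(total_norm):
        optimizer.zero_grad()
        return False
    optimizer.step()
    return True

# Stand-in model and data, for illustration only
torch.manual_seed(0)
model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 3)

stepped_ok = guarded_step(model, optimizer, model(x).pow(2).mean())

weights_before = model.weight.detach().clone()
bad_loss = model(x).pow(2).mean() * float('inf')  # simulate a non-finite loss
stepped_bad = guarded_step(model, optimizer, bad_loss)
weights_after = model.weight.detach().clone()

print(stepped_ok, stepped_bad)
```

Skipping a batch like this hides the symptom rather than the cause, so it is mainly useful for logging which batches are skipped and then inspecting those samples.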

What loss function are you using? Can you also post the code block showing how you pass arguments to it?

Thanks for the reply! I’m using the CTC loss function:

criterion = nn.CTCLoss(blank=0).to(device)

Below is the training code, which shows the arguments I pass to the loss function:

def train(model, device, train_loader, criterion, optimizer, scheduler, epoch, iter_meter, experiment):
    model.train()
    data_len = len(train_loader.dataset)
    with experiment.train():
        for batch_idx, _data in enumerate(train_loader):
            spectrograms, labels, input_lengths, label_lengths = _data
            spectrograms, labels = spectrograms.to(device), labels.to(device)
            optimizer.zero_grad()
            output = model(spectrograms)  # (batch, time, n_class)
            output = F.log_softmax(output, dim=2)
            output = output.transpose(0, 1) # (time, batch, n_class)
            loss = criterion(output, labels, input_lengths, label_lengths)
            loss.backward()

            torch.nn.utils.clip_grad_norm_(model.parameters(), 5)

            experiment.log_metric('loss', loss.item(), step=iter_meter.get())
            experiment.log_metric('learning_rate', scheduler.get_lr(), step=iter_meter.get())

            optimizer.step()
            scheduler.step()
            iter_meter.step()
            if batch_idx % 100 == 0 or batch_idx == data_len:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(spectrograms), data_len,
                    100. * batch_idx / len(train_loader), loss.item()))
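
One thing worth ruling out with a custom dataset is a sample that CTC cannot align. nn.CTCLoss returns an infinite loss when the input sequence is too short for its target, and a non-finite loss can turn the weights into NaN on the next step. The sketch below is a guess at a diagnostic, not a confirmed fix for this thread. It assumes labels is a padded (batch, max_label_len) tensor and that input_lengths counts time steps at the model output (after any downsampling). find_bad_ctc_samples and the toy tensors are hypothetical. It also flags label indices outside the class range or equal to the blank index, since CTC targets may not contain the blank.

```python
import torch
import torch.nn as nn

def find_bad_ctc_samples(labels, input_lengths, label_lengths, n_class, blank=0):
    """Return batch indices that CTC cannot align or that hold invalid labels.

    Hypothetical helper; labels is assumed to be padded to (batch, max_label_len).
    """
    bad = []
    for i, (t_len, l_len) in enumerate(zip(input_lengths, label_lengths)):
        t_len, l_len = int(t_len), int(l_len)
        target = labels[i, :l_len]
        # CTC needs a blank frame between each pair of identical neighbours
        repeats = int((target[1:] == target[:-1]).sum())
        too_short = t_len < l_len + repeats
        invalid = bool(((target < 0) | (target >= n_class) | (target == blank)).any())
        if too_short or invalid:
            bad.append(i)
    return bad

# Toy batch: sample 0 is fine, sample 1 has 6 labels but only 4 time steps
torch.manual_seed(0)
n_class, T = 5, 4
log_probs = torch.randn(T, 2, n_class).log_softmax(dim=2)  # (time, batch, n_class)
labels = torch.tensor([[1, 2, 0, 0, 0, 0],
                       [1, 1, 2, 2, 3, 3]])
input_lengths = torch.tensor([4, 4])
label_lengths = torch.tensor([2, 6])

bad = find_bad_ctc_samples(labels, input_lengths, label_lengths, n_class)

strict = nn.CTCLoss(blank=0, reduction='none')
lenient = nn.CTCLoss(blank=0, reduction='none', zero_infinity=True)
strict_loss = strict(log_probs, labels, input_lengths, label_lengths)
lenient_loss = lenient(log_probs, labels, input_lengths, label_lengths)

print(bad, strict_loss, lenient_loss)
```

Passing zero_infinity=True zeroes infinite losses and their gradients, which keeps training alive, but filtering or fixing the flagged samples in the dataset is probably the cleaner option if any turn up.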