CTC Loss Problems

Hello, unfortunately I have to deal with the notoriously problematic CTC loss.

I am trying to implement a speech recognizer based on Deep Speech 2. My model is a custom bidirectional RNN module followed by 3 fully connected layers.
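For reference, an architecture of this kind can be sketched roughly as below. This is a minimal sketch, not the actual module from this post: the class name `SpeechModel`, the use of `nn.GRU`, and all layer sizes are assumptions. What matters for CTC is that the last layer's output goes through `log_softmax` over the class dimension and is transposed to the `(T, N, C)` layout that `torch.nn.CTCLoss` expects.

```python
import torch
import torch.nn as nn


class SpeechModel(nn.Module):
    """Hypothetical Deep Speech 2-style model: bidirectional RNN + 3 FC layers."""

    def __init__(self, n_features: int, hidden: int, n_classes: int):
        super().__init__()
        # Bidirectional GRU over time; batch_first=True means input is (N, T, F).
        self.rnn = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        # Three fully connected layers; the last outputs one score per class
        # (the characters plus the CTC blank).
        self.fc = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)                    # (N, T, 2 * hidden)
        logits = self.fc(out)                   # (N, T, C)
        log_probs = logits.log_softmax(dim=-1)  # CTCLoss needs log-probabilities
        return log_probs.transpose(0, 1)        # (T, N, C)


model = SpeechModel(n_features=40, hidden=64, n_classes=29)
x = torch.randn(16, 100, 40)  # batch of 16 utterances, 100 frames, 40 features
log_probs = model(x)
```

With real padded batches one would likely also wrap the RNN input with `nn.utils.rnn.pack_padded_sequence` so that padding frames do not feed into the recurrence.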

criterion = torch.nn.CTCLoss(reduction="sum", zero_infinity=True)
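A minimal sketch of how the arguments to this criterion are usually shaped, with random tensors standing in for real model output and transcripts (all sizes are made up for illustration). `log_probs` must already be log-probabilities of shape `(T, N, C)`. Note that with `reduction="sum"` the loss scales with batch size and transcript length, so its absolute value is not directly comparable to runs using the default `"mean"`.

```python
import torch

T, N, C = 50, 16, 29  # time steps, batch size, classes (blank = index 0)
S_min, S_max = 5, 20  # hypothetical min/max transcript length

criterion = torch.nn.CTCLoss(blank=0, reduction="sum", zero_infinity=True)

# Stand-in for model output: log-probabilities over classes, shape (T, N, C).
log_probs = torch.randn(T, N, C).log_softmax(dim=2).detach().requires_grad_()

# Padded targets of shape (N, S_max); index 0 is the blank, so labels are 1..C-1.
targets = torch.randint(1, C, (N, S_max), dtype=torch.long)
# Here every utterance uses all T frames; real data would use per-utterance lengths.
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(S_min, S_max + 1, (N,), dtype=torch.long)

loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```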

My batch size is 16.
The feature dimension of the inputs is fixed (N_features), but the sequence lengths differ between mini-batches.
I apply BatchNorm1d to my input tensors only, not between layers.
I use torch.int32 for the target-related tensors and torch.float32 for the input tensors, as required by cuDNN.
I have cuDNN enabled and have tried benchmark mode both on and off.
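According to the `torch.nn.CTCLoss` documentation, the cuDNN path is only used when several conditions hold at once: targets in concatenated (1-D) format, all `input_lengths` equal to `T`, `blank=0`, every target length at most 256, and `int32` integer arguments. Otherwise PyTorch falls back to its native implementation, so if input lengths vary within a batch, cuDNN is not used regardless of dtypes. The sketch below builds arguments that meet these conditions; the helper name `to_cudnn_ctc_args` is hypothetical.

```python
import torch


def to_cudnn_ctc_args(padded_targets, target_lengths, T):
    """Hypothetical helper: build CTCLoss integer arguments in the form the
    cuDNN path requires (concatenated int32 targets, all input lengths == T)."""
    target_lengths = target_lengths.to(torch.int32)
    # cuDNN only supports targets of at most 256 labels.
    if int(target_lengths.max()) > 256:
        raise ValueError("cuDNN CTC requires target_lengths <= 256")
    # Concatenate the unpadded part of each row into one 1-D tensor.
    concat = torch.cat([t[: int(l)] for t, l in zip(padded_targets, target_lengths)])
    input_lengths = torch.full((padded_targets.size(0),), T, dtype=torch.int32)
    return concat.to(torch.int32), input_lengths, target_lengths


padded = torch.tensor([[3, 5, 5, 0], [2, 7, 0, 0]])
lengths = torch.tensor([3, 2])
targets, input_lengths, target_lengths = to_cudnn_ctc_args(padded, lengths, T=10)
# targets -> tensor([3, 5, 5, 2, 7], dtype=torch.int32)
```

These tensors would then go to `torch.nn.CTCLoss(blank=0)` together with float32 `log_probs` on the GPU; whether cuDNN is actually picked also depends on `torch.backends.cudnn.enabled`.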

My first problem: when the model is fairly simple, the training loss plateaus at a certain value after a couple of epochs, and the model predicts only blanks or random characters from the alphabet.
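When a model seems stuck on blanks, it can help to inspect the greedy (best-path) decoding of its output: take the most likely class per frame, merge consecutive repeats, then drop blanks. A minimal sketch, assuming blank index 0 and a hypothetical alphabet string where class `i` maps to `alphabet[i - 1]`:

```python
from typing import List


def greedy_ctc_decode(frame_ids: List[int], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: collapse repeated labels, then remove blanks."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank;
        # a blank between two identical labels keeps genuine double letters.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx - 1])  # assumes class i -> alphabet[i - 1]
        prev = idx
    return "".join(chars)


alphabet = " abcdefghijklmnopqrstuvwxyz'"  # hypothetical 28-symbol alphabet
# Per-frame argmax ids, e.g. log_probs.argmax(dim=2)[:, 0].tolist() in PyTorch.
frames = [0, 9, 9, 0, 0, 10, 0, 10, 0]
print(greedy_ctc_decode(frames, alphabet))  # -> hii
```

An all-blank output decodes to an empty string, which makes the "predicts only blanks" failure easy to spot when logging a few decoded samples per epoch.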

My second problem: when I increase the model complexity, the loss becomes NaN after a few iterations. I expected the zero_infinity parameter to prevent this, but it turns out that it does not.
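One possible reason `zero_infinity` does not help: per the PyTorch documentation, it only zeroes *infinite* losses and their gradients, which mainly occur when an input is too short to be aligned with its target. NaNs arising elsewhere (for example, from exploding gradients in a deeper RNN) are not caught by it. The sketch below, with hypothetical helper names and a toy model, checks the alignment condition up front (CTC needs at least one frame per label, plus one blank between each pair of identical consecutive labels) and applies gradient-norm clipping, a common mitigation, before the optimizer step.

```python
import torch
import torch.nn as nn


def min_frames_needed(target):
    """Minimum input frames CTC needs to emit `target`: one frame per label,
    plus one blank between identical neighbouring labels."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def alignable(input_lengths, targets):
    """True for each utterance whose input is long enough for CTC."""
    return [int(T) >= min_frames_needed(t) for T, t in zip(input_lengths, targets)]


# Hypothetical tiny model and batch, only to show where clipping goes.
rnn = nn.GRU(input_size=8, hidden_size=16, bidirectional=True)
head = nn.Linear(32, 5)  # 4 labels + blank (index 0)
criterion = nn.CTCLoss(blank=0, reduction="sum", zero_infinity=True)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3)

x = torch.randn(12, 2, 8)  # (T, N, F)
targets = [[1, 2, 2, 3], [4, 4, 4]]
input_lengths = torch.tensor([12, 12])
print(alignable(input_lengths, targets))  # -> [True, True]

out, _ = rnn(x)
log_probs = head(out).log_softmax(dim=2)  # (T, N, C)
flat = torch.tensor([label for t in targets for label in t])
target_lengths = torch.tensor([len(t) for t in targets])
loss = criterion(log_probs, flat, input_lengths, target_lengths)

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global norm is at most 400 (value chosen arbitrarily).
nn.utils.clip_grad_norm_(params, max_norm=400.0)
optimizer.step()
```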

I know that these are common problems, but I couldn’t find a solution to them.

UPDATE:

I started using a BatchNorm1d before every non-linearity in my network. Now the network no longer predicts only blanks, although its predictions do not make sense linguistically. However, the loss grew about 10 times compared with using a single BatchNorm on the input layer. I am more optimistic now, as with enough epochs this model may converge.
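`nn.BatchNorm1d` normalizes over dimension 1, so it expects `(N, C)` or `(N, C, L)` input. If activations are laid out as `(N, T, F)` (as with `batch_first=True` RNNs and linear layers applied per frame), passing them in directly would treat the time axis as channels. A minimal sketch of a hypothetical wrapper that normalizes per feature, placed before the non-linearity of a fully connected block:

```python
import torch
import torch.nn as nn


class SequenceBatchNorm(nn.Module):
    """Hypothetical helper: BatchNorm1d over the feature axis of (N, T, F) input."""

    def __init__(self, n_features: int):
        super().__init__()
        self.bn = nn.BatchNorm1d(n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, T, F) -> (N, F, T) so BatchNorm1d sees features as channels,
        # then transpose back to (N, T, F).
        return self.bn(x.transpose(1, 2)).transpose(1, 2)


# Batch norm placed before the non-linearity in a fully connected block.
block = nn.Sequential(nn.Linear(40, 64), SequenceBatchNorm(64), nn.ReLU())
x = torch.randn(16, 100, 40)  # (N, T, F)
y = block(x)
```

Since padded frames are included in the batch statistics, heavily padded batches can skew them, which is one more thing worth checking when the loss behaves unexpectedly.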

UPDATE:

I ran comet.ml’s adaptation of my model, which uses the native PyTorch CTCLoss, and it gave reasonable results almost immediately. I still do not understand why my approach did not work. If I find out in the future, I will share it here; otherwise, I think CTCLoss itself works fine.