CNN-RNN with CTC loss for OCR generates NaN in training

Hi.
I am using a CNN-RNN model for an OCR task and training it with CTC loss. Training runs on the GPU, and unfortunately some of my target lengths are zero (I can't remove those samples).
I have read other posts about this problem and I know the GPU implementation of CTC loss has an issue with zero-length targets, but I can't train my model on the CPU (I have about 10M data samples).
Is there anything I can do to solve this problem?
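For reference, here is a minimal sketch of the kind of setup that hits this issue. The shapes and values are illustrative, not my real pipeline:

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 37   # time steps, batch size, number of classes (incl. blank)

# Log-probabilities as they would come out of the CNN-RNN head, shape (T, N, C).
log_probs = torch.randn(T, N, C).log_softmax(2).cuda().requires_grad_()

targets = torch.randint(1, C, (N, 10), dtype=torch.long).cuda()
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([10, 10, 0, 10], dtype=torch.long)  # one zero-length target

ctc_loss = nn.CTCLoss(blank=0)  # default zero_infinity=False
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()

print(loss.item())                               # finite
print(torch.isnan(log_probs.grad).any().item())  # True on affected GPU builds
```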

By the way, I have traced the model outputs and gradients. During training, the gradients become NaN while the loss at the same step is still finite; after the optimizer update the weights become NaN, and then the outputs and the loss become NaN as well.
The first gradients to become NaN were those of the weights and biases of a batch normalization layer, and the entire gradient tensor was NaN, not just individual entries.
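This is roughly how I traced where the NaNs first appear, in case it helps (a sketch; `model` stands in for my CNN-RNN network):

```python
import torch

def register_nan_grad_checks(model):
    # Flag any parameter whose gradient contains a NaN during backward.
    for name, param in model.named_parameters():
        def hook(grad, name=name):
            if torch.isnan(grad).any():
                print(f"NaN gradient in {name}")
        param.register_hook(hook)

# Alternatively, torch.autograd.set_detect_anomaly(True) reports the
# backward op that first produced the NaN.
```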