Training with CTC loss for OCR produces same output

dudeperf3ct · October 16, 2020, 10:56am

I am using the nn.CTCLoss(zero_infinity=True) loss function on a CRNN model. The output on the validation set after training the model for a few epochs is shown below.

The batch size is 8 and max_length for each sequence is 16. All the outputs are easier to view in this Colab notebook: Link to colab notebook

The first line contains input_lengths and target_lengths. The second line contains the shapes of log_probs and targets. From the third line onwards, the first string is the raw predicted output, the second string is the processed predicted output, and the third string is the ground truth (padded with "_" to max_length).

tensor([16, 16, 16, 16, 16, 16, 16, 16], dtype=torch.int32) tensor([5, 4, 1, 5, 5, 4, 5, 1])
torch.Size([16, 8, 41]) torch.Size([8, 16])
eeeeeeeeeeeee222 e2 22.44___________
eeeeeeeeeeeee222 e2 8.70____________
eeeeeeeeeeeee222 e2 0_______________
eeeeeeeeeeeee222 e2 12.90___________
eeeeeeeeeeeee222 e2 15.80___________
eeeeeeeeeeeee222 e2 2.80____________
eeeeeeeeeeeee222 e2 11.50___________
eeeeeeeeeeeee222 e2 4_______________
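
For context, going from the raw string to the processed string (eeeeeeeeeeeee222 to e2) matches what CTC greedy decoding does: take the argmax at each time step, collapse consecutive repeats, then drop the blank. Below is a minimal sketch of that step, not the notebook's actual code. The blank index of 0 and the four-character alphabet are assumptions made only for illustration.

```python
import torch


def ctc_greedy_decode(log_probs, alphabet, blank=0):
    """Greedy CTC decoding.

    log_probs: tensor of shape (T, N, C), as passed to nn.CTCLoss.
    alphabet:  sequence mapping class index -> character (hypothetical here).
    blank:     index of the CTC blank class (assumed to be 0).
    """
    best = log_probs.argmax(dim=2).transpose(0, 1)  # (N, T) class indices
    decoded = []
    for seq in best.tolist():
        chars = []
        prev = None
        for idx in seq:
            # Keep a symbol only when it differs from the previous time step
            # and is not the blank.
            if idx != prev and idx != blank:
                chars.append(alphabet[idx])
            prev = idx
        decoded.append("".join(chars))
    return decoded


# Toy input reproducing the raw prediction above: 13 x 'e' followed by 3 x '2'.
alphabet = "-e2."  # index 0 is the blank placeholder (assumption)
path = [1] * 13 + [2] * 3
demo = torch.full((16, 1, len(alphabet)), -10.0)
for t, k in enumerate(path):
    demo[t, 0, k] = 0.0

print(ctc_greedy_decode(demo, alphabet))  # ['e2']
```

Note that no blank appears anywhere in the raw predictions above, so the model is emitting only two non-blank classes for every time step.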

The problem could be that the arguments passed to the CTC loss function are wrong, or it could be something else entirely.
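
To make the argument question concrete, here is a minimal sketch of an nn.CTCLoss call using the shapes printed above: log_probs of shape (T=16, N=8, C=41) and padded targets of shape (8, 16). The random logits, the random target values, and the blank index of 0 are assumptions standing in for the real model and data. The points worth comparing against the notebook are that log_probs comes from log_softmax over the class dimension, that the time dimension is first, and that no blank index appears inside the first target_lengths[i] entries of each target row.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T, N, C = 16, 8, 41   # time steps, batch size, classes (including blank)
blank = 0             # assumption: class 0 is reserved for the CTC blank

# Stand-in for the CRNN output; a real model would produce these logits.
logits = torch.randn(T, N, C, requires_grad=True)
log_probs = logits.log_softmax(dim=2)  # CTCLoss expects log-probabilities

# Padded targets of shape (N, S). Entries past target_lengths[i] are ignored,
# so the padding value does not matter; real labels must avoid `blank`.
targets = torch.randint(low=1, high=C, size=(N, 16), dtype=torch.long)
target_lengths = torch.tensor([5, 4, 1, 5, 5, 4, 5, 1], dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)

criterion = nn.CTCLoss(blank=blank, zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()

print(loss.item())
```

If the real call already matches this layout, the arguments are probably not the cause and the issue lies elsewhere, for example in the label encoding or the optimisation.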

mtt · September 2, 2024, 8:01am

Hello. It looks like it’s been a while since this was posted, but I was wondering whether you were able to find the issue. I am working on a plate-reading task using a similar architecture and I have been having the same issue.

The architecture I am using consists of convolutional layers followed by bidirectional LSTM layers and then a dense layer, and I am training it with the CTC loss function.

For every input it outputs 2 characters, while the average length should be about 8 characters, and all the outputs are made of the same characters.
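
One check that may be worth running in this situation: CTC can only emit a target if the number of time steps is at least the target length plus the number of adjacent repeated characters, since a blank has to separate each repeat. Samples that violate this give an infinite loss, which zero_infinity=True silently turns into zero. The helper names below are hypothetical and the example labels are only illustrative.

```python
def min_ctc_input_length(target):
    """Smallest number of time steps CTC needs to emit `target`.

    Each adjacent repeated symbol needs one extra step for a separating blank.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def infeasible_samples(targets, input_lengths):
    """Indices of samples whose input is too short for their target."""
    return [
        i
        for i, (target, steps) in enumerate(zip(targets, input_lengths))
        if min_ctc_input_length(target) > steps
    ]


# "22.44" has 5 characters and 2 adjacent repeats, so it needs 7 time steps.
print(min_ctc_input_length("22.44"))                 # 7
print(infeasible_samples(["22.44", "8.70"], [6, 16]))  # [0]
```

If this list is empty for the whole dataset, the sequence lengths are not the problem.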