After training converged to a very low CTC loss, the OCR model still performed very poorly. The following code shows how I computed the loss and how I structured the input:
```python
for data in test_dataloader:
    loss_criterion = nn.CTCLoss()
    target_lengths = data['lengths']
    N = data['pixel_values'].size(0)   # batch size
    data.pop("lengths")

    pred = model(**data)
    pred = pred['logits'].permute(1, 0, 2)        # (N, T, C) -> (T, N, C)
    pred = nn.functional.log_softmax(pred, dim=2)

    # every input sequence is padded to 128 time steps
    input_lengths = torch.full(size=(N,), fill_value=128, dtype=torch.long)
    custom_loss = loss_criterion(pred, data['labels'].long(), input_lengths, target_lengths)

    pred = pred.max(1)
    print("Loss", custom_loss.item())
    print(input_lengths, target_lengths)
    print(pred[:10])
    print(data['labels'][:10])
    break
```
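For reference, here is a minimal, self-contained sketch of the same `nn.CTCLoss` call on dummy data. The shapes (`T=128` time steps, `N=4` batch items, `C=100` classes, target width 20) are assumptions chosen to mirror my setup, not values from the real model:

```python
import torch
import torch.nn as nn

T, N, C = 128, 4, 100          # assumed shapes for illustration
loss_criterion = nn.CTCLoss()  # blank index defaults to 0

# Fake logits shaped (T, N, C), the layout CTCLoss expects after the permute
logits = torch.randn(T, N, C)
log_probs = nn.functional.log_softmax(logits, dim=2)

# Padded targets of shape (N, S); target_lengths give the true lengths
labels = torch.randint(low=1, high=C, size=(N, 20))
target_lengths = torch.full((N,), 20, dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)

loss = loss_criterion(log_probs, labels, input_lengths, target_lengths)
print(loss.item())
```

With random logits this produces a finite positive loss, which is the sanity check I would expect before comparing against the real model's output.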
The output of the above code was:
```
Loss 2.682149897736963e-05
tensor() tensor()
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
tensor([    2, 16869, 11603,     3,     0,     0,     0,     0,     0,     0], device='cuda:0')
```
You can see that the expected output and the predicted output are completely different, yet the loss is extremely low. As shown above, every output sequence is padded to length 128. The loss is computed the same way as in the documentation. What could be the reason for this?
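For completeness, this is how I would turn the log-probabilities into a prediction with standard greedy CTC decoding (argmax over the class dimension, collapse repeats, drop blanks). The function name `greedy_ctc_decode` is my own; it assumes log-probs shaped `(T, N, C)` and blank index 0, the `nn.CTCLoss` default:

```python
import torch

def greedy_ctc_decode(log_probs, blank=0):
    """Greedy CTC decode: argmax per step, collapse repeats, remove blanks."""
    best = log_probs.argmax(dim=2)  # most likely class per step: (T, N)
    decoded = []
    for seq in best.t():            # iterate over the batch
        out, prev = [], blank
        for idx in seq.tolist():
            # emit only on a change of symbol, and never emit the blank
            if idx != prev and idx != blank:
                out.append(idx)
            prev = idx
        decoded.append(out)
    return decoded

# Toy example: T=5, N=1, C=4
lp = torch.log_softmax(torch.randn(5, 1, 4), dim=2)
print(greedy_ctc_decode(lp))
```

Comparing the decoded sequences (rather than raw argmax indices) against the unpadded labels is what I use to judge whether the model really learned anything.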