Zero CTC Loss Unchanged by Parameter Tweaks

I’ve hit a wall with an issue that’s as intriguing as it is frustrating, and I’m hoping to tap into the collective wisdom of this community to find a resolution.

I’m working on a CRNN model for an OCR task, and no matter what parameters I adjust, my CTC loss remains stubbornly at zero. This has turned into a real head-scratcher, and I’m looking for fresh perspectives on what might be going awry.

  • My model’s architecture seems solid at a glance, but the loss is zero from the very first batch and never changes, so training shows no sign of learning.
  • I’ve tried adjusting learning rates, experimenting with different optimizers, and even tweaking the loss function’s parameters – all to no avail.

Here’s a glimpse of my setup:

  • Model: Custom CRNN designed for OCR.
  • Data: Image tensors with corresponding text labels, verified to be correctly formatted.
  • Loss Function: nn.CTCLoss, with blank=0 and zero_infinity=True to avoid NaN issues.
  • Optimizer: optim.Adam with a learning rate that I’ve varied from 0.001 down to 0.0001.
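For reference, here is a minimal standalone nn.CTCLoss call with the shapes PyTorch expects — (T, N, C) log-probabilities, integer class targets, and per-sample lengths. The dimensions here are made up purely for illustration, not taken from my model:

```python
import torch
import torch.nn as nn

T, N, C = 50, 4, 37  # time steps, batch size, classes (36 chars + 1 blank)

# Log-probabilities over classes at each time step, shape (T, N, C)
log_probs = torch.randn(T, N, C).log_softmax(2)

# Integer class indices in [1, C), reserving 0 for the blank
targets = torch.randint(1, C, (N, 10), dtype=torch.long)

input_lengths = torch.full((N,), T, dtype=torch.long)   # all T frames used
target_lengths = torch.full((N,), 10, dtype=torch.long)  # 10 labels each

criterion = nn.CTCLoss(blank=0, zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
print(loss)  # finite, nonzero scalar
```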

What I’ve attempted so far:

  • Ensuring the data loader is correctly shuffling and batching images along with their respective labels.
  • Double-checking the image preprocessing steps and label encoding.
  • Monitoring gradients and weights for any signs of life.
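The gradient check I'm running looks roughly like this (with a toy nn.Linear standing in for the CRNN, since the check itself is model-agnostic):

```python
import torch
import torch.nn as nn

# Toy stand-in model; swap in the real CRNN in practice.
model = nn.Linear(8, 3)

out = model(torch.randn(2, 8))
loss = out.sum()
loss.backward()

# After backward(), every trainable parameter should have a nonzero grad.
for name, p in model.named_parameters():
    grad_norm = p.grad.norm().item() if p.grad is not None else float("nan")
    print(f"{name}: grad norm = {grad_norm:.6f}")
```

If the grad norms are all zero (or None), the loss is not propagating back at all, which matches what I'm seeing.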

Has anyone encountered a similar issue? Could there be an overlooked aspect of the CRNN architecture or the CTC loss function that’s causing this? Any insights, suggestions, or debugging tips would be immensely appreciated.

I’ve attached snippets of my code below for reference and am open to any questions that might help clarify the issue.

Thank you for your time, and I look forward to any advice you can share!


import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Dummy batch: 4 single-channel 300x192 images
images = torch.randn(4, 1, 300, 192).to(device)
output = crnn_model(images)

# Dummy targets, one row of length 17932 per image
labels = torch.randn(4, 17932)

input_lengths = torch.full(size=(images.size(0),), fill_value=1, dtype=torch.long).to(device)
target_lengths = torch.tensor([17932, 17932, 17932, 17932], dtype=torch.long).to(device)

criterion = nn.CTCLoss(blank=0, zero_infinity=True).to(device)
loss = criterion(output, labels, input_lengths, target_lengths)
print(loss)

Even this dummy data gives 0 loss.
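For comparison, here is a minimal self-contained CTCLoss check (toy dimensions, no model involved) contrasting a feasible case with one where the input length is shorter than the target length — in the latter, the raw loss is inf and zero_infinity=True clamps it to exactly 0:

```python
import torch
import torch.nn as nn

criterion = nn.CTCLoss(blank=0, zero_infinity=True)

# 5 time steps, batch of 1, 10 classes (blank = 0)
log_probs = torch.randn(5, 1, 10).log_softmax(2)
targets = torch.randint(1, 10, (1, 3), dtype=torch.long)

# Feasible: input length (5) >= target length (3) -> finite, nonzero loss
feasible = criterion(log_probs, targets,
                     torch.tensor([5]), torch.tensor([3]))
print(feasible)

# Infeasible: input length (1) < target length (3) -> raw loss is inf,
# which zero_infinity=True silently turns into exactly 0
infeasible = criterion(log_probs, targets,
                       torch.tensor([1]), torch.tensor([3]))
print(infeasible)  # tensor(0.)
```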

For reference, 17932 is the padded (buffered) size of the label bounding boxes in my dataset, which is why every target length is set to that value.

I also have transcription data (the text content of each label) that I noticed isn’t being used anywhere; substituting it as the target instead likewise produces a loss of 0.