Hello,
I’m trying to port a CTC network over from Keras. I’ve based the model on https://github.com/meijieru/crnn.pytorch/blob/master/models/crnn.py (essentially replacing warp_ctc.CTCLoss with the built-in PyTorch CTCLoss, because warp_ctc won’t compile). This is with PyTorch 1.0.1 and CUDA 9.0.
Training code is:

```python
net = CRNN(32, 3, len(labels), nh=256)
net.to(device)

ctc_loss = nn.CTCLoss(blank=blank_ind)
ctc_loss.to(device)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.09)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs and move them to the GPU
        inputs, data_labels = data
        for k, v in inputs.items():
            inputs[k] = v[0].to(device)
        for k, v in data_labels.items():
            data_labels[k] = v[0].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        predicteds = net(inputs)
        log_probs = predicteds.log_softmax(2)
        input_lengths = torch.full((inputs['image'].shape[0],),
                                   log_probs.shape[0], dtype=torch.int)
        loss = ctc_loss(log_probs, data_labels['the_labels'],
                        input_lengths, data_labels['label_length'])
        loss.backward()
        optimizer.step()
```
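To rule out problems in my data pipeline, here's a minimal standalone repro of the exact CTCLoss call with random data in my shapes (T=63 time steps, N=100 batch, C=250 classes, blank index 249). The class values and lengths below are placeholders, not my real data:

```python
import torch
import torch.nn as nn

# Dimensions matching my setup; data here is random placeholders.
T, N, C, S = 63, 100, 250, 12
blank_ind = 249

ctc_loss = nn.CTCLoss(blank=blank_ind)

log_probs = torch.randn(T, N, C).log_softmax(2)              # (T, N, C)
targets = torch.randint(0, C - 1, (N, S), dtype=torch.long)  # (N, S), padded; excludes blank
input_lengths = torch.full((N,), T, dtype=torch.int)         # every sample uses all T steps
target_lengths = torch.randint(3, S, (N,), dtype=torch.int)  # real label lengths

loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```

This runs and produces a finite positive loss, so the call signature itself seems fine.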
Dimensions of the variables (100 is the batch size):
- input images = (3, 32, 248)
- log_probs = (63, 100, 250)
- data_labels['the_labels'] = (100, 12)
- input_lengths = (100,)
- data_labels['label_length'] = (100,)
Sample format of the labels (where 249 is the blank index passed to CTCLoss):
```
labels tensor([[ 2, 98, 101, 83, 0, 249, 249, 249, 249, 249, 249, 249],
[ 2, 124, 13, 41, 0, 249, 249, 249, 249, 249, 249, 249],
[ 2, 24, 113, 13, 109, 0, 249, 249, 249, 249, 249, 249],
[ 2, 112, 114, 124, 2, 249, 249, 249, 249, 249, 249, 249],
[ 2, 28, 30, 76, 0, 249, 249, 249, 249, 249, 249, 249],
[ 0, 41, 14, 98, 2, 249, 249, 249, 249, 249, 249, 249],
[ 2, 41, 125, 13, 41, 0, 249, 249, 249, 249, 249, 249],
[ 0, 76, 13, 124, 2, 249, 249, 249, 249, 249, 249, 249],
[ 2, 24, 125, 13, 83, 0, 249, 249, 249, 249, 249, 249],
[ 0, 41, 43, 35, 2, 249, 249, 249, 249, 249, 249, 249],
[ 2, 112, 114, 35, 2, 249, 249, 249, 249, 249, 249, 249]],
device='cuda:0')
labels_length tensor([5, 5, 6, 5, 5, 5, 6, 5, 6, 5, 5, 6, 5, 6, 6, 5, 5, 5, 5, 5, 6, 6, 6, 5,
5, 5, 5, 5, 5, 5, 5, 6, 6, 5, 6, 6, 5, 5, 5, 5, 6, 6, 5, 5, 5, 5, 3, 5,
6], device='cuda:0')
```
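One thing I did double-check: with 2-D padded targets, CTCLoss is only supposed to read the first label_length entries of each row, so padding with the blank index (as above) shouldn't itself be the problem. A sketch with made-up logits confirming this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
blank_ind = 249
T, N, C, S = 63, 2, 250, 12  # my shapes, tiny batch
ctc = nn.CTCLoss(blank=blank_ind)

log_probs = torch.randn(T, N, C).log_softmax(2)
input_lengths = torch.full((N,), T, dtype=torch.int)
target_lengths = torch.tensor([5, 5], dtype=torch.int)

# Targets padded with the blank index, like my data_labels['the_labels']
padded = torch.full((N, S), blank_ind, dtype=torch.long)
padded[:, :5] = torch.tensor([[2, 98, 101, 83, 0],
                              [2, 124, 13, 41, 0]])

# Re-pad with a different value past label_length; the loss is unchanged,
# so the padding value past the target length is ignored by CTCLoss.
other = padded.clone()
other[:, 5:] = 0
loss_a = ctc(log_probs, padded, input_lengths, target_lengths)
loss_b = ctc(log_probs, other, input_lengths, target_lengths)
print(torch.allclose(loss_a, loss_b))  # True
```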
What I’ve tried:
- Setting the blank index to different values (either the length of the labels or 0)
- Using a different optimizer and smaller learning rates (as suggested in the “CTCLoss predicts all blank characters” thread, though that one uses warp_ctc)
- Training only on input images that actually contain a sequence (rather than including images with nothing in them)
In all cases the network produces random labels for the first couple of batches, then predicts only blanks for every batch after that.
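For reference, this is roughly how I'm decoding the output to inspect the predictions (a standard CTC greedy decode, reconstructed here rather than my exact code):

```python
import torch

def greedy_decode(log_probs, blank_ind):
    """Greedy CTC decode: take the argmax class at each time step,
    collapse consecutive repeats, then drop blanks. log_probs is (T, N, C)."""
    best = log_probs.argmax(2).t()  # (N, T): best class per time step
    decoded = []
    for seq in best:
        out, prev = [], None
        for s in seq.tolist():
            if s != prev and s != blank_ind:
                out.append(s)
            prev = s
        decoded.append(out)
    return decoded
```

Once the network collapses, the argmax at every time step is blank_ind, so this returns an empty list for every image in the batch.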
Is there anything that I’m missing here?