I am training a character recognizer on the LibriSpeech dataset.
The input is a Mel-spectrogram, shape --> tensor of size [10, 1, 512, 2000]
The labels shape --> tensor of size [10, 300]
The output shape --> tensor of size [1000, 10, 29]
input_lengths = [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
target_lengths = [300, 300, 300, 300, 300, 300, 300, 300, 300, 300]
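To make the shapes above concrete, here is a minimal, self-contained sketch of the same CTC setup with random data (T=1000 time steps, N=10 batch, C=29 classes, S=300 target length; the variable names mirror my code, the tensors themselves are placeholders):

```python
import torch
import torch.nn.functional as F

T, N, C, S = 1000, 10, 29, 300

# Placeholder model output [batch, time, classes] and integer labels
output = torch.randn(N, T, C)
label = torch.randint(0, 28, (N, S))  # class 28 is reserved for blank

input_lengths = [T] * N    # one entry per batch element
target_lengths = [S] * N

criterion_back = torch.nn.CTCLoss(blank=28)
log_probs = F.log_softmax(output, dim=2).transpose(0, 1)  # -> [T, N, C]
back_loss = criterion_back(log_probs, label, input_lengths, target_lengths)
```

With these shapes the loss computes without error, since both length lists have exactly N = 10 entries.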
Here is a snapshot of the code:
criterion_back = torch.nn.CTCLoss(blank=28)
output = back_model(enhanced)
output = F.log_softmax(output, dim=2)
output = output.transpose(0, 1)
back_loss = criterion_back(output, label, input_lengths, target_lengths)
The problem is that I get this error at the beginning of the second epoch. Note that the first epoch completes smoothly without any error; the error only appears when the second epoch starts.
File "train.py", line 95, in <module>
back_loss = criterion_back(output, label, input_length, target_length)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1714, in forward
return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction,
File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 2559, in ctc_loss
return torch.ctc_loss(
RuntimeError: input_lengths must be of size batch_size
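For reference, this RuntimeError is raised whenever len(input_lengths) differs from the batch dimension of the log-probability tensor. A minimal sketch reproducing the check, using the shapes from above (the mismatched size of 8 is just an illustrative assumption, e.g. a length list left over from a smaller batch):

```python
import torch
import torch.nn.functional as F

T, N, C, S = 1000, 10, 29, 300
log_probs = F.log_softmax(torch.randn(T, N, C), dim=2)  # [T, N, C]
targets = torch.randint(0, 28, (N, S))
criterion = torch.nn.CTCLoss(blank=28)

# Length lists matching the batch size of 10 work fine:
loss = criterion(log_probs, targets, [T] * N, [S] * N)

# Length lists of the wrong size trigger the RuntimeError from the traceback:
err = ""
try:
    criterion(log_probs, targets, [T] * 8, [S] * 8)
except RuntimeError as e:
    err = str(e)
```

So the thing to check is whether, at the start of the second epoch, the batch fed to the loss still has 10 elements while input_lengths/target_lengths have a different length (or vice versa).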