Hi,
thank you.
So if you look at the loss with reduction='none', you see that element number 4 of the batch has infinite loss. This is because its input length is smaller than its target length, i.e. no alignment between input and target is possible. The actual condition is a bit stricter when the target has repeated labels: the network then needs to emit a blank between the repeats and so needs a longer input. That gives the necessary condition input length >= target length + repetitions for the loss to be finite, and when the inputs come from a softmax it is also sufficient, barring numerical overflow to infinity.
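To make the length condition concrete, here is a minimal sketch in plain Python that counts adjacent repeated labels and checks whether a sample can be aligned at all. The helper names `min_input_length` and `is_alignable` are made up for illustration and are not part of PyTorch.

```python
def min_input_length(target):
    """Smallest number of input frames for which `target` can be aligned."""
    # Every label needs one frame, and every pair of adjacent equal labels
    # needs one extra frame for the blank that separates them.
    repetitions = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repetitions


def is_alignable(input_length, target):
    """Necessary condition for a finite CTC loss on this sample."""
    return input_length >= min_input_length(target)


print(min_input_length([3, 5, 7]))  # no repeats
print(min_input_length([3, 3, 5]))  # one repeat, so one extra frame
print(is_alignable(3, [3, 3, 5]))
```

Running a check like this over the batch before computing the loss should point at the same samples that show up as infinite with reduction='none'.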
I’m not entirely sure what warpctc does, but from my recollection it may just report 0 in that case. Note that the gradient will be NaN for the inputs in question; maybe it would be good to optionally clip that to zero (which you can do now with a backward hook on the inputs).
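A rough sketch of the backward-hook idea, assuming a toy batch where the second sample has fewer input frames than target labels. The shapes, the random inputs, and the choice to hook the log-probability tensor directly are assumptions made for the example, not your actual setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, N, C = 5, 2, 4  # frames, batch size, classes (class 0 is the blank)

# Leaf tensor of log-probabilities, the input that ctc_loss expects.
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Backward hook: replace NaN gradient entries with zero before they
# are stored in log_probs.grad.
log_probs.register_hook(
    lambda g: torch.where(torch.isnan(g), torch.zeros_like(g), g)
)

targets = torch.tensor([[1, 2, 3], [1, 2, 3]])
input_lengths = torch.tensor([5, 2])  # sample 1 is shorter than its target
target_lengths = torch.tensor([3, 3])

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  reduction="none")
loss.sum().backward()

print(loss)                               # per-sample losses
print(torch.isnan(log_probs.grad).any())  # whether any NaN survived the hook
```

The hook only touches NaN entries, so gradients of the alignable samples pass through unchanged. Newer PyTorch releases also have a `zero_infinity` argument on `ctc_loss` that zeroes infinite losses and their gradients, which may make the hook unnecessary.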
Best regards
Thomas
(Edited “<” vs. “>” based on @jinserk’s correction below. Thanks!)