Hi, I found that the gradients of nn.CTCLoss are the derivatives of the objective function with respect to the unnormalised outputs u(t, k). Why are they not the derivatives with respect to the input (the log-probs obtained with log_softmax)? Does it compute the gradients again when backpropagating through the softmax layer?
Thanks for the link. This should be OK then, no?
All these functions' gradients are checked with finite differences, so it is very unlikely that they are not computing the backward corresponding to their forward.
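As an illustration (this is a toy sketch, not the actual PyTorch test code), here is what such a finite-difference check looks like for a plain softmax + NLL pair. It also shows why the gradient with respect to the unnormalised scores has a simple closed form: chaining the NLL gradient through log_softmax collapses to `softmax(u) - onehot(k)`, so nothing is computed twice; the "unnormalised" gradient is just the chained one. All names here (`nll`, `analytic_grad`, `numeric_grad`) are made up for this sketch.

```python
import math

def log_softmax(u):
    # numerically stable log-softmax over a list of unnormalised scores
    m = max(u)
    lse = m + math.log(sum(math.exp(x - m) for x in u))
    return [x - lse for x in u]

def nll(u, k):
    # negative log-likelihood of class k given unnormalised scores u
    return -log_softmax(u)[k]

def analytic_grad(u, k):
    # gradient of nll w.r.t. the unnormalised u, chained through
    # log_softmax: softmax(u) - onehot(k)
    p = [math.exp(x) for x in log_softmax(u)]
    return [p[j] - (1.0 if j == k else 0.0) for j in range(len(u))]

def numeric_grad(u, k, eps=1e-6):
    # central finite differences, the same idea used to verify
    # that a backward matches its forward
    g = []
    for j in range(len(u)):
        up, um = u[:], u[:]
        up[j] += eps
        um[j] -= eps
        g.append((nll(up, k) - nll(um, k)) / (2 * eps))
    return g

u = [0.5, -1.2, 2.0, 0.1]
ga = analytic_grad(u, 2)
gn = numeric_grad(u, 2)
assert all(abs(a - b) < 1e-4 for a, b in zip(ga, gn))
```

If the analytic and numerical gradients agree like this, the backward is consistent with the forward, which is the guarantee being described above.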