My vague understanding from the source and discussions I’ve read is that it wraps external C++ modules (linking to cuDNN) and implements its own backward() rather than relying on PyTorch’s autograd.
I therefore assume higher-order gradients (e.g. HVPs, Hessian-vector products) through CTCLoss won’t work.
Indeed. I’d recommend starting from the original paper (Graves et al., 2006): you could differentiate equation 15 and the derivative below it to get the second derivative. I’ve tried to comment the source to make it easy to follow with the paper in mind (but remember that alpha and beta are computed in log space).
I’m not sure whether it will work well numerically - even for the first derivative of ctc_loss, the numerical precision can be touchy.
I’d be interested to see your results.
Hi @tom ! I’m trying to do a double backward for CTC loss. Is there a way to approximate the CTC gradient with finite differences? And is there a way to compute the CTC backward in double precision so that there are no precision problems?
I’m not aware of a double backward for CTC loss being available.
By their very nature, finite differences compute approximate directional derivatives (i.e. Jacobian-vector products, what you would get with forward mode) rather than gradients (i.e. vector-Jacobian products). As such, it is generally compute-intensive to approximate gradients with finite differences, because you would need as many function evaluations as you have inputs.
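To make the distinction concrete, here is a toy example (not CTC, just a simple scalar function): central differences give you one directional derivative, i.e. one Jacobian-vector product, per pair of evaluations, so recovering the full gradient of an n-dimensional input takes on the order of 2n evaluations.

```python
# Central differences approximate a directional derivative (a JVP):
#   D_v f(x) = grad f(x) . v  ~  (f(x + eps*v) - f(x - eps*v)) / (2*eps)
def f(x):
    # toy function: f(x) = sum(x_i^2), so grad f(x) = 2*x
    return sum(xi * xi for xi in x)

def directional_derivative(f, x, v, eps=1e-6):
    xp = [xi + eps * vi for xi, vi in zip(x, v)]
    xm = [xi - eps * vi for xi, vi in zip(x, v)]
    return (f(xp) - f(xm)) / (2 * eps)

x = [1.0, 2.0, 3.0]
# One unit direction gives one gradient component; the full gradient
# needs one such probe per input dimension.
v = [1.0, 0.0, 0.0]
print(directional_derivative(f, x, v))  # analytic value: 2*x[0] = 2.0
```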
You can use CTC loss with doubles simply by passing double tensors; enabling this was one of my goals in having an integrated open-source implementation.
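For instance (sizes here are arbitrary), passing float64 inputs keeps the whole forward and first backward in double precision:

```python
# ctc_loss runs in float64 when given double tensors; no extra flag needed.
import torch
import torch.nn.functional as F

T, N, C = 10, 2, 5  # time steps, batch size, classes (blank = 0)
log_probs = (
    torch.randn(T, N, C, dtype=torch.double).log_softmax(2).requires_grad_()
)
targets = torch.randint(1, C, (N, 3), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 3, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # first derivative, also computed in float64
print(loss.dtype, log_probs.grad.dtype)  # torch.float64 torch.float64
```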