There are many loss functions available in torch.nn. I am currently working on an end-to-end speech recognizer, and I need a connectionist temporal classification (CTC) layer as the outermost layer. Is there a neat way to do this?
In short, I want a bidirectional LSTM architecture trained to minimize the CTC loss.
Obviously, I don’t want to compute the error gradients by hand.
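For reference, here is a minimal sketch of the kind of setup I mean, using torch.nn.CTCLoss (the model, dimensions, and label layout below are just placeholders I made up, not a finished recognizer):

```python
import torch
import torch.nn as nn

class BiLSTMCTC(nn.Module):
    """Hypothetical model: features -> bidirectional LSTM -> linear -> log-probs."""
    def __init__(self, num_features, hidden_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(num_features, hidden_size,
                            bidirectional=True, batch_first=True)
        # num_classes includes the CTC blank symbol
        self.proj = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)            # (batch, time, 2*hidden)
        logits = self.proj(out)          # (batch, time, num_classes)
        # nn.CTCLoss expects (time, batch, classes) log-probabilities
        return logits.permute(1, 0, 2).log_softmax(2)

# Toy dimensions, just for illustration (e.g. 27 labels + blank at index 0)
batch, time_steps, num_features, num_classes = 2, 50, 13, 28
model = BiLSTMCTC(num_features, hidden_size=64, num_classes=num_classes)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(batch, time_steps, num_features)
targets = torch.randint(1, num_classes, (batch, 10))   # label ids, avoiding blank
input_lengths = torch.full((batch,), time_steps, dtype=torch.long)
target_lengths = torch.full((batch,), 10, dtype=torch.long)

log_probs = model(x)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()   # autograd computes the CTC gradients, no hand-derivation needed
```

If this is roughly the right shape, the remaining question is whether anything beyond plugging nn.CTCLoss on top of the LSTM outputs is needed.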