Training Conv Seq to Seq Models

Hello, I am trying to train a convolutional sequence-to-sequence model based on Gehring et al. for handwritten text recognition. I am using the IAM database for the implementation. The architecture is almost the same as the one defined by the authors in their paper. For the decoder I am using character embeddings, whereas the encoder is fed with the output of a series of convolution layers.

The problem is that the model doesn’t seem to be learning at all. The loss function is NLLLoss with ignore_index set to token. The loss decreases during the first epoch and then stays within a fixed range from the second epoch onward. The accuracy is also very poor. I have also tried a small sample dataset, but even the training accuracy is very poor there.
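To make the loss setup concrete, here is a minimal sketch in plain Python (so it runs without PyTorch) of what a negative log-likelihood loss with an ignored index computes. It is not my actual training code: `PAD_ID`, the toy logits, and the helper names `log_softmax` and `nll_loss` are all hypothetical. The point it illustrates is that this kind of loss expects log-probabilities (for example the output of a log-softmax) as input, and that the mean is taken only over targets that are not ignored.

```python
import math

PAD_ID = 0  # hypothetical padding index, stands in for the ignored token


def log_softmax(logits):
    """Numerically stable log-softmax over one list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def nll_loss(log_probs, targets, ignore_index=PAD_ID):
    """Mean negative log-likelihood over targets that are not ignore_index."""
    kept = [-lp[t] for lp, t in zip(log_probs, targets) if t != ignore_index]
    # With no kept targets the mean is undefined; this sketch returns 0.0.
    return sum(kept) / len(kept) if kept else 0.0


# Three decoder steps over a 4-symbol vocabulary; the last step is padding.
logits = [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 3.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
targets = [1, 2, PAD_ID]

log_probs = [log_softmax(row) for row in logits]
loss = nll_loss(log_probs, targets)
print(f"loss over non-padding steps: {loss:.4f}")
```

If the decoder output were passed to the loss as raw scores or plain softmax probabilities instead of log-probabilities, the loss value would not be a proper likelihood, which is one thing I still need to rule out in my setup.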

Please help me out here. Any suggestions would be appreciated.