I tried the seq2seq PyTorch implementation available here: pytorch-seq2seq. After profiling the evaluation code (evaluate.py), the piece taking the most time was the decode_minibatch method (github.com/MaximumEntropy/Seq2Seq-PyTorch/blob/master/evaluate.py#L74).
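For reference, this is roughly how I profiled it with cProfile; the `run_evaluation` function below is just a placeholder for whatever drives the decode loop in evaluate.py in your setup:

```python
import cProfile
import pstats

def run_evaluation():
    # Placeholder: in my case this is the loop in evaluate.py that calls
    # decode_minibatch for each minibatch of the test set.
    pass

profiler = cProfile.Profile()
profiler.enable()
run_evaluation()
profiler.disable()

# Sort by cumulative time; decode_minibatch dominated the output for me.
pstats.Stats(profiler).sort_stats("cumtime").print_stats(20)
```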
I trained the model on a GPU and loaded it in CPU mode for inference. Unfortunately, every sentence takes ~10 seconds to predict. Is such slow prediction expected with PyTorch?
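In case it matters, this is roughly how I load the GPU-trained checkpoint on the CPU; the checkpoint path is a placeholder from my setup, and `model` is the Seq2Seq instance constructed the same way as during training:

```python
import torch

# "model.pt" stands in for the checkpoint produced by the GPU training run.
# map_location remaps CUDA storages to CPU so the load works without a GPU.
state = torch.load("model.pt", map_location=lambda storage, loc: storage)

model.load_state_dict(state)  # `model` is the Seq2Seq built as in training
model.eval()                  # inference mode: disables dropout, etc.
```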
Any fixes or suggestions to speed this up would be much appreciated. Thanks.