High GPU memory demand for Seq2Seq (compared to TF)