I suppose I should define a cell (e.g. GRUCell) in my decoder and use a for loop in the decoder's forward method. Something like this:
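A minimal sketch of that idea, assuming PyTorch. This is not the code from the original post; the class name StepDecoder, the layer sizes, and the feedback layer that turns each output into the next input are all hypothetical, and attention is left out to keep the loop visible:

```python
import torch
import torch.nn as nn


class StepDecoder(nn.Module):
    """Hypothetical decoder: unrolls a GRUCell for a fixed number of steps."""

    def __init__(self, input_size, hidden_size, output_size, num_steps=5):
        super().__init__()
        self.num_steps = num_steps
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        # Assumption: each output is projected back to the cell's input size
        # and fed in as the next step's input.
        self.feedback = nn.Linear(output_size, input_size)

    def forward(self, first_input, hidden):
        # first_input: (batch, input_size); hidden: (batch, hidden_size)
        step_input = first_input
        outputs = []
        for _ in range(self.num_steps):
            hidden = self.cell(step_input, hidden)  # one time step
            y = self.out(hidden)
            outputs.append(y)
            step_input = self.feedback(y)
        # Stack per-step outputs into (batch, num_steps, output_size)
        return torch.stack(outputs, dim=1), hidden


decoder = StepDecoder(input_size=8, hidden_size=16, output_size=4)
outputs, last_hidden = decoder(torch.zeros(2, 8), torch.zeros(2, 16))
print(outputs.shape)  # torch.Size([2, 5, 4])
```

Here the number of output steps is fixed by num_steps inside forward, so the caller gets all 5 outputs from a single call.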
But when I looked at the seq2seq translation tutorial, I got confused, because the tutorial uses nn.GRU as opposed to GRUCell, and the for loop is in the training loop rather than in the decoder… Is it related to attention?
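For comparison, a small sketch of the two patterns side by side, assuming PyTorch. It is not taken from the tutorial; the sizes and variable names are made up for illustration. It feeds nn.GRU a length-1 sequence inside an external loop and steps a GRUCell with the same weights, then checks whether the two end up with the same hidden state:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch, num_steps = 8, 16, 2, 5

gru = nn.GRU(input_size, hidden_size, batch_first=True)
cell = nn.GRUCell(input_size, hidden_size)

# Copy the single-layer GRU's parameters into the cell so both
# modules compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(batch, num_steps, input_size)
h_gru = torch.zeros(1, batch, hidden_size)  # nn.GRU: (num_layers, batch, hidden)
h_cell = torch.zeros(batch, hidden_size)    # GRUCell: (batch, hidden)

with torch.no_grad():
    for t in range(num_steps):
        # External loop: nn.GRU sees one time step per call.
        _, h_gru = gru(x[:, t:t + 1, :], h_gru)
        # Same step with the cell.
        h_cell = cell(x[:, t, :], h_cell)

same = torch.allclose(h_gru[0], h_cell, atol=1e-5)
print(same)
```

If the check passes, the choice between the two is about where the loop lives and what happens between steps (for example, computing attention from the current hidden state), not about a different recurrence.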
If I want to implement a many-to-many RNN with attention and control the output to be 5 steps, should I use