Why use nn.GRU AND loop over sequence in PyTorch tutorial?

rschaefer · July 3, 2020, 9:00am

Hi there,

I am referring to the following tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

In this tutorial, a machine translation model is built using nn.GRU. If I am right, nn.GRU can be used on the whole sequence at once, whereas nn.GRUCell is just one cell. In order to use nn.GRUCell you would need to loop over the sequence.
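
To make that distinction concrete, here is a minimal sketch (not taken from the tutorial) that runs a single-layer nn.GRU over a whole sequence and compares it with an explicit loop over an nn.GRUCell that shares the same weights. The tensor sizes and variable names are arbitrary choices made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only; nothing here comes from the tutorial.
input_size, hidden_size, seq_len, batch = 4, 3, 5, 2

gru = nn.GRU(input_size, hidden_size)        # input: (seq_len, batch, input_size)
cell = nn.GRUCell(input_size, hidden_size)   # input: (batch, input_size)

# Copy the GRU's layer-0 weights into the cell so both compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(gru.weight_ih_l0)
    cell.weight_hh.copy_(gru.weight_hh_l0)
    cell.bias_ih.copy_(gru.bias_ih_l0)
    cell.bias_hh.copy_(gru.bias_hh_l0)

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)

# nn.GRU: one call processes every time step.
out_full, h_full = gru(x, h0)

# nn.GRUCell: the loop over time steps is written by hand.
h = h0[0]
cell_outputs = []
for t in range(seq_len):
    h = cell(x[t], h)
    cell_outputs.append(h)
out_cell = torch.stack(cell_outputs)

outputs_match = torch.allclose(out_full, out_cell, atol=1e-5)
print("nn.GRU and looped nn.GRUCell agree:", outputs_match)
```

With shared weights the two paths should agree up to floating-point tolerance, which is the sense in which nn.GRU "contains" the loop that nn.GRUCell leaves to the caller.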

What confuses me is that the tutorial loops over the sequence AND uses nn.GRU.

Any help to solve this confusion is appreciated.

Best,
Robin

vdw · July 3, 2020, 9:20am

I think that got asked before. I also don’t see why the encoding is done via a loop rather than by giving the GRU layer the full sentence.

I assume one could in principle use a GRUCell. The only difference when it comes to feeding it step by step is that GRU supports multiple layers, but I don’t even know if this is meaningful without the full sequence.
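
On the multi-layer point, a quick check suggests that a stacked nn.GRU fed one step at a time still behaves like the full-sequence call, as long as the hidden state of every layer is carried from step to step. This is a sketch under assumptions: a unidirectional GRU with no dropout, and sizes and names picked only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only.
seq_len, batch, input_size, hidden_size, num_layers = 6, 2, 4, 3, 2
gru = nn.GRU(input_size, hidden_size, num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(num_layers, batch, hidden_size)

with torch.no_grad():
    # Full sequence in a single call.
    out_full, h_full = gru(x, h0)

    # Same module, one time step per call; h holds the state of both layers.
    h = h0
    step_outputs = []
    for t in range(seq_len):
        out_t, h = gru(x[t:t + 1], h)   # slice keeps the seq_len dimension (size 1)
        step_outputs.append(out_t)
    out_steps = torch.cat(step_outputs, dim=0)

stacked_match = torch.allclose(out_full, out_steps, atol=1e-5)
hidden_match = torch.allclose(h_full, h, atol=1e-5)
print("outputs match:", stacked_match, "| final hidden states match:", hidden_match)
```

The returned hidden state has shape (num_layers, batch, hidden_size), so passing it back in on the next call hands each layer its own state.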

I’m also pretty sure that the first version of the tutorial didn’t use a loop for the encoding since I also used the tutorial as a template back then.

rschaefer · July 3, 2020, 9:54am

Yes, I saw a topic wondering about the same thing, but it was not really answered, only confirmed, and as it was some time ago I thought I should ask again.

I implemented a model based on the tutorial, too, and it converged pretty well. So from that perspective, as long as it is working, I may stick with that implementation. I was wondering if it’s just about processing time in the end: if you feed only one step of the sequence into nn.GRU, its internal loop over time steps is redundant and the step-by-step calls may cost some extra time.
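
One way to check the processing-time guess is to time both variants directly. The following is a rough sketch, not a proper benchmark: the sizes, the repeat count, and the helper names (run_full, run_stepwise, time_it) are all made up for illustration, and it only measures the forward pass on CPU.

```python
import time
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes only.
gru = nn.GRU(input_size=16, hidden_size=32)
x = torch.randn(50, 8, 16)       # (seq_len, batch, input_size)
h0 = torch.zeros(1, 8, 32)       # (num_layers, batch, hidden_size)

def run_full(x, h0):
    # One call over the whole sequence.
    return gru(x, h0)

def run_stepwise(x, h0):
    # One call per time step, carrying the hidden state along.
    h = h0
    outs = []
    for t in range(x.size(0)):
        out_t, h = gru(x[t:t + 1], h)
        outs.append(out_t)
    return torch.cat(outs, dim=0), h

def time_it(fn, repeats=20):
    # Average wall-clock seconds per forward pass, after one warm-up call.
    with torch.no_grad():
        fn(x, h0)
        start = time.perf_counter()
        for _ in range(repeats):
            fn(x, h0)
        return (time.perf_counter() - start) / repeats

t_full = time_it(run_full)
t_step = time_it(run_stepwise)
print(f"full sequence: {t_full * 1e3:.3f} ms | step by step: {t_step * 1e3:.3f} ms")
```

The numbers depend on hardware, sequence length, and batch size, so it is worth running this with the shapes of the actual model before drawing conclusions.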