Yes, I saw a topic wondering about the same thing, but it was not really answered, only confirmed, and as it was some time ago I thought I should ask again.
I implemented a model based on the tutorial, too, and it converged pretty well. So from that perspective, as long as it is working I may stick to that implementation. I was wondering if it’s just about processing time in the end: if you feed only one step of the sequence into nn.GRU the internal loop is redundant and may need some extra time.