PyTorch Translation with a Sequence to Sequence Network and Attention


I was going through the (amazing!) tutorial here and I was a little nonplussed by an implementation detail in the EncoderRNN class. After the input's embedding is retrieved from the look-up table, the variable output is bound to it, so operations that mutate output would mutate the embedded result as well. That is to say, the embedding table parameters end up being updated using the gradients flowing back from the GRU outputs.
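For reference, the forward pass I mean looks roughly like this (a sketch from memory of the tutorial's EncoderRNN, so the exact details may differ from the current version of the tutorial):

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        # Look up the embedding and reshape to (seq_len=1, batch=1, hidden_size)
        embedded = self.embedding(input).view(1, 1, -1)
        # This is the line I'm asking about: output is bound to embedded
        output = embedded
        # output is then rebound to the GRU's result
        output, hidden = self.gru(output, hidden)
        return output, hidden
```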

Can anyone explain the mechanics there in a bit more detail, or comment on whether my statements above are accurate?
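To make the question concrete, here is the distinction I'm trying to pin down: assigning one name to another tensor shares the same object, so in-place operations are visible through both names, whereas an out-of-place operation rebinds the name to a fresh tensor (this is a minimal sketch with made-up variable names, not code from the tutorial):

```python
import torch

a = torch.ones(3)
b = a          # b and a now refer to the same tensor object
b.add_(1)      # in-place op: the change is visible through a too
b = b * 2      # out-of-place op: b is rebound to a new tensor; a is unchanged
```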

Insights welcome and cherished :slight_smile: