Different gradient updates when using PackedSequence

Given a batch of sequences:

[[1, 2, 3, 4],
 [1, 2]]

I am outputting the hidden state at each step of a custom LSTM in two ways:

1. A list of per-step hidden tensors:

   [2 x hidden tensor,
    2 x hidden tensor,
    1 x hidden tensor,
    1 x hidden tensor]

2. A single 6 x hidden PackedSequence
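To make the layout concrete, here is a minimal pure-Python sketch (no torch, hypothetical `pack` helper) of how a PackedSequence flattens this batch into time-major order, which is where the 6 entries and the 2/2/1/1 step sizes come from:

```python
def pack(sequences):
    # Sort by length descending, as pack_padded_sequence expects.
    seqs = sorted(sequences, key=len, reverse=True)
    max_len = len(seqs[0])
    data, batch_sizes = [], []
    for t in range(max_len):
        # Keep only sequences that are still "alive" at step t.
        step = [s[t] for s in seqs if len(s) > t]
        batch_sizes.append(len(step))
        data.extend(step)
    return data, batch_sizes

data, batch_sizes = pack([[1, 2, 3, 4], [1, 2]])
print(data)         # [1, 1, 2, 2, 3, 4]
print(batch_sizes)  # [2, 2, 1, 1]
```

So the packed data is one flat run of 6 elements, and `batch_sizes` records how many sequences participate at each time step, matching the per-step tensor sizes above.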

I set the seed, and all calculations match exactly on the first batch. But after calling .backward() on the first batch's loss, the hidden states in the second batch have slightly different values between the two methods.

Has anyone run into issues with packed sequences when updating gradients? Is there anything I should know when using them?
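For reference, here is a minimal sketch of the kind of comparison I mean, using a built-in nn.LSTM as a stand-in for my custom one (the random input and sizes are hypothetical). The forward outputs agree at the valid time steps, but the padded positions of the short sequence diverge, so an unmasked loss would backpropagate different gradients:

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)

# Padded batch: two sequences of lengths 4 and 2 (hypothetical random data).
x = torch.randn(2, 4, 3)
lengths = torch.tensor([4, 2])
x[1, 2:] = 0.0  # zero out the padding of the short sequence

# Method 1: run on the padded tensor directly.
out_padded, _ = lstm(x)

# Method 2: run on a PackedSequence, then unpack.
packed = pack_padded_sequence(x, lengths, batch_first=True)
out_packed, _ = lstm(packed)
out_unpacked, _ = pad_packed_sequence(out_packed, batch_first=True)

# Forward outputs should agree at the valid (non-padded) time steps...
print(torch.allclose(out_padded[0], out_unpacked[0], atol=1e-6))
print(torch.allclose(out_padded[1, :2], out_unpacked[1, :2], atol=1e-6))
# ...but the padded method keeps computing on padding after step 2,
# so a loss taken over all positions gives different weight gradients,
# and the two models drift apart from the second batch onward.
```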