Padded sequence leading to different results

Hi all,

I am coding a seq2seq model with inputs of different lengths, so I am using pack_padded_sequence and pad_packed_sequence in the encoder. However, at test time I noticed that the results change slightly depending on how the data is shuffled. While tracking down the problem, I found small numerical differences between the packed and the unpacked computations. Here is a toy example that reproduces it. As initialization:

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

seq = 4
batch = 3
in_size = 5
hidden_size = 7

input = Variable(torch.rand(seq, batch, in_size))
hidden = Variable(torch.zeros(1, batch, hidden_size))
init_hidden = Variable(torch.zeros(1, 1, hidden_size))

rnn = nn.GRU(in_size, hidden_size, 1)

# zero out the padded positions of the two shorter sequences
batch_len = [4, 3, 2]
for i in range(1, batch):
    input[batch_len[i]:, i, :] = 0

# each sequence on its own (length_i x 1 x in_size)...
input0 = input[:, 0].unsqueeze(1)
input1 = input[:batch_len[1], 1].unsqueeze(1)
input2 = input[:batch_len[2], 2].unsqueeze(1)
# ...and the whole padded batch as a PackedSequence
input_pad = pack_padded_sequence(input, batch_len)
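In case it helps to see what the packing does: a PackedSequence stores the non-padded timesteps flattened, together with the number of sequences still active at each step. With batch_len = [4, 3, 2] that gives (depending on the version, batch_sizes prints as a list or a tensor):

print(input_pad.batch_sizes)
# -> [3, 3, 2, 1]: 3 sequences active at t=0 and t=1, then 2, then 1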

I assume that the output and the hidden state should be the same whether I use input_pad or run each of the three input* sequences separately:

# the packed batch in a single call...
output_pad, hidden = rnn(input_pad, hidden)
# ...vs. each sequence separately, starting from the same zero state
output0, hidden0 = rnn(input0, init_hidden)
output1, hidden1 = rnn(input1, init_hidden)
output2, hidden2 = rnn(input2, init_hidden)

output, out_len = pad_packed_sequence(output_pad)

However, there are small differences:

>>> print((torch.cat([hidden0,hidden1, hidden2],1) - hidden).abs().sum())
Variable containing:
1.00000e-07 *
  1.6391
[torch.FloatTensor of size 1]

>>> pad1 = Variable(torch.zeros(output.size(0) - output1.size(0), output1.size(1), output1.size(2)))
>>> pad2 = Variable(torch.zeros(output.size(0) - output2.size(0), output2.size(1), output2.size(2)))
>>> output_cat = torch.cat([output0,
...                         torch.cat([output1, pad1], 0),
...                         torch.cat([output2, pad2], 0)], 1)
>>> print((output_cat - output).abs().sum())
Variable containing:
1.00000e-07 *
  3.4925
[torch.FloatTensor of size 1]
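In absolute terms both gaps are around 1e-7 summed over all elements, so per element they look like float32 rounding. This is the kind of tolerance check I would expect to pass if that is all it is (a sketch; np.allclose uses rtol=1e-5 and atol=1e-8 by default):

import numpy as np

hidden_cat = torch.cat([hidden0, hidden1, hidden2], 1)
# compare with a tolerance instead of exact equality; should pass if the
# mismatch is only accumulated rounding error
print(np.allclose(hidden_cat.data.numpy(), hidden.data.numpy()))
print(np.allclose(output_cat.data.numpy(), output.data.numpy()))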

Am I using the packing and padding functions correctly? Is there a bug in these functions? I have the impression that this could lead to different models in real scenarios.
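If the explanation is just float32 rounding (the packed path presumably batches the matrix multiplications differently, so bit-exact agreement would not be expected), then repeating the experiment in double precision should shrink the gap by several orders of magnitude. A sketch of that check (the _d names are mine; the cast is in place, so this has to run after the float32 comparison):

rnn.double()  # converts the GRU parameters in place
init_hidden_d = Variable(torch.zeros(1, 1, hidden_size).double())

input_pad_d = pack_padded_sequence(input.double(), batch_len)
output_pad_d, hidden_d = rnn(input_pad_d, Variable(torch.zeros(1, batch, hidden_size).double()))
_, hidden0_d = rnn(input0.double(), init_hidden_d)
_, hidden1_d = rnn(input1.double(), init_hidden_d)
_, hidden2_d = rnn(input2.double(), init_hidden_d)

# if the float32 gap (~1e-7) is rounding noise, this should be around 1e-15
print((torch.cat([hidden0_d, hidden1_d, hidden2_d], 1) - hidden_d).abs().sum())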

Thank you very much and sorry for the long post.