Different output when feeding an RNN one element at a time vs. the entire sequence

Hi! I’ve been experimenting with ways to speed up my RNN training, and from the official LSTM tutorial I realized that I can feed the entire sequence at once rather than submitting the elements one by one. But when I compared the outputs of these two approaches for the same input, after setting all the seeds and making sure the LSTMs start with the same weights, they came out different, which makes me suspicious. Is there something I’m missing (an extra seed I should set), or should the two approaches be guaranteed to give the same results?
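
A quick sanity check on the “same weights” part, as a minimal sketch (my own, assuming both models are constructed immediately after the same torch.manual_seed call):

import torch
import torch.nn as nn

torch.manual_seed(1)
lstm_a = nn.LSTM(3, 3)

torch.manual_seed(1)
lstm_b = nn.LSTM(3, 3)

# Re-seeding resets the global generator, so both constructions
# should draw exactly the same initial weights.
for (name_a, p_a), (name_b, p_b) in zip(lstm_a.state_dict().items(),
                                        lstm_b.state_dict().items()):
    assert name_a == name_b and torch.equal(p_a, p_b), name_a
print("weights identical")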

Here are the two snippets and their outputs:

First, feeding the elements of the sequence one by one:

import numpy as np
import torch
import torch.nn as nn

np.random.seed(111)
torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(2)]  # make a sequence of length 2
print("inputs: ", inputs)
# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))
for i in inputs:
    # Step through the sequence one element at a time.
    # after each step, hidden contains the hidden state.
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    print("Out: ",out)
    print("Hidden: ",hidden)
    
print("PARAMETERS")
for p in lstm.parameters():
    if p.requires_grad:
        print(p.name, p.data)  # p.name is None for a plain nn.Parameter; lstm.named_parameters() gives the actual names

Outputs:

inputs:  [tensor([[-0.5525,  0.6355, -0.3968]]), tensor([[-0.6571, -1.6428,  0.9803]])]
Out:  tensor([[[-0.2705,  0.0552, -0.1255]]])
Hidden:  (tensor([[[-0.2705,  0.0552, -0.1255]]]), tensor([[[-1.2459,  0.1427, -0.3710]]]))
Out:  tensor([[[-0.5532,  0.0449, -0.1183]]])
Hidden:  (tensor([[[-0.5532,  0.0449, -0.1183]]]), tensor([[[-1.2481,  0.1589, -0.1761]]]))
PARAMETERS
None tensor([[ 0.2975, -0.2548, -0.1119],
        [ 0.2710, -0.5435,  0.3462],
        [-0.1188,  0.2937,  0.0803],
        [-0.0707,  0.1601,  0.0285],
        [ 0.2109, -0.2250, -0.0421],
        [-0.0520,  0.0837, -0.0023],
        [ 0.5047,  0.1797, -0.2150],
        [-0.3487, -0.0968, -0.2490],
        [-0.1850,  0.0276,  0.3442],
        [ 0.3138, -0.5644,  0.3579],
        [ 0.1613,  0.5476,  0.3811],
        [-0.5260, -0.5489, -0.2785]])
None tensor([[ 0.5070, -0.0962,  0.2471],
        [-0.2683,  0.5665, -0.2443],
        [ 0.4330,  0.0068, -0.3042],
        [ 0.2968, -0.3065,  0.1698],
        [-0.1667, -0.0633, -0.5551],
        [-0.2753,  0.3133, -0.1403],
        [ 0.5751,  0.4628, -0.0270],
        [-0.3854,  0.3516,  0.1792],
        [-0.3732,  0.3750,  0.3505],
        [ 0.5120, -0.3236, -0.0950],
        [-0.0112,  0.0843, -0.4382],
        [-0.4097,  0.3141, -0.1354]])
None tensor([ 0.2820,  0.0329,  0.1896,  0.1270,  0.2099,  0.2862, -0.5347,
         0.2906, -0.4059, -0.4356,  0.0351, -0.0984])
None tensor([ 0.3391, -0.3344, -0.5133,  0.4202, -0.0856,  0.3247,  0.1856,
        -0.4329,  0.1160,  0.1387, -0.3866, -0.2739])
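
(An aside on seeding that may matter here: every torch.randn call advances the global generator, so two runs only produce the same tensors if the same number of draws has happened since manual_seed. A tiny illustration of that assumption:)

import torch

torch.manual_seed(1)
a = torch.randn(1, 1, 3)   # first draw after seeding

torch.manual_seed(1)
_ = torch.randn(1, 1, 3)   # an extra draw in between...
b = torch.randn(1, 1, 3)   # ...shifts the stream, so b differs from a

print(torch.equal(a, b))   # False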

Second, feeding the entire sequence at once:

# alternatively, we can do the entire sequence all at once.
# the first value returned by LSTM is all of the hidden states throughout
# the sequence. the second is just the most recent hidden state
# (compare the last slice of "out" with "hidden" below, they are the same)
# The reason for this is that:
# "out" will give you access to all hidden states in the sequence
# "hidden" will allow you to continue the sequence and backpropagate,
# by passing it as an argument to the lstm at a later time
np.random.seed(111)
torch.manual_seed(1)
lstm = nn.LSTM(3, 3)  # Input dim is 3, output dim is 3
inputs = [torch.randn(1, 3) for _ in range(2)]  # make a sequence of length 2
print("inputs: ", inputs)
# initialize the hidden state.
hidden = (torch.randn(1, 1, 3),
          torch.randn(1, 1, 3))

inputs = torch.cat(inputs).view(len(inputs), 1, -1)  # add the extra 2nd dimension
hidden = (torch.randn(1, 1, 3), torch.randn(1, 1, 3))  # clean out the hidden state (this draws fresh random tensors, advancing the RNG again)
out, hidden = lstm(inputs, hidden)
print("Out: ", out)
print("Hidden: ",hidden)

print("PARAMETERS")
for p in lstm.parameters():
    if p.requires_grad:
        print(p.name, p.data)  # prints None for the names again, as above

Outputs:

inputs:  [tensor([[-0.5525,  0.6355, -0.3968]]), tensor([[-0.6571, -1.6428,  0.9803]])]
Out:  tensor([[[ 0.0004, -0.0286,  0.3850]],

        [[-0.3554, -0.0004,  0.3394]]])
Hidden:  (tensor([[[-0.3554, -0.0004,  0.3394]]]), tensor([[[-0.5815, -0.0017,  0.5999]]]))
PARAMETERS
None tensor([[ 0.2975, -0.2548, -0.1119],
        [ 0.2710, -0.5435,  0.3462],
        [-0.1188,  0.2937,  0.0803],
        [-0.0707,  0.1601,  0.0285],
        [ 0.2109, -0.2250, -0.0421],
        [-0.0520,  0.0837, -0.0023],
        [ 0.5047,  0.1797, -0.2150],
        [-0.3487, -0.0968, -0.2490],
        [-0.1850,  0.0276,  0.3442],
        [ 0.3138, -0.5644,  0.3579],
        [ 0.1613,  0.5476,  0.3811],
        [-0.5260, -0.5489, -0.2785]])
None tensor([[ 0.5070, -0.0962,  0.2471],
        [-0.2683,  0.5665, -0.2443],
        [ 0.4330,  0.0068, -0.3042],
        [ 0.2968, -0.3065,  0.1698],
        [-0.1667, -0.0633, -0.5551],
        [-0.2753,  0.3133, -0.1403],
        [ 0.5751,  0.4628, -0.0270],
        [-0.3854,  0.3516,  0.1792],
        [-0.3732,  0.3750,  0.3505],
        [ 0.5120, -0.3236, -0.0950],
        [-0.0112,  0.0843, -0.4382],
        [-0.4097,  0.3141, -0.1354]])
None tensor([ 0.2820,  0.0329,  0.1896,  0.1270,  0.2099,  0.2862, -0.5347,
         0.2906, -0.4059, -0.4356,  0.0351, -0.0984])
None tensor([ 0.3391, -0.3344, -0.5133,  0.4202, -0.0856,  0.3247,  0.1856,
        -0.4329,  0.1160,  0.1387, -0.3866, -0.2739])

As can be seen above, the outputs and the hidden states do not match, even though the inputs and the initial parameters of the LSTMs are the same. What am I supposed to make of this?
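
For comparison, here is a minimal sketch with the random initial state removed entirely (zeros instead of torch.randn, so no stray RNG draws can differ between the two versions); under that assumption I would expect the step-by-step and whole-sequence outputs to agree:

import torch
import torch.nn as nn

torch.manual_seed(1)
lstm = nn.LSTM(3, 3)
inputs = [torch.randn(1, 3) for _ in range(2)]

# Deterministic initial state: the weights are now the only random part.
h0 = torch.zeros(1, 1, 3)
c0 = torch.zeros(1, 1, 3)

# One element at a time.
hidden = (h0, c0)
step_outs = []
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)
    step_outs.append(out)
step_outs = torch.cat(step_outs)

# Entire sequence at once, from the same initial state.
seq = torch.cat(inputs).view(len(inputs), 1, -1)
full_out, _ = lstm(seq, (h0, c0))

print(torch.allclose(step_outs, full_out))  # expected: True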