RNN with and without loop output mismatch

For some tasks I have seen two different ways of using RNNs:

  • In the first method, the whole input of shape [batch_size, sequence_len, input_size] is passed in a single call, and the outputs, hidden are returned by it.
  • In the second method, each timestamp of the input, of shape [sequence_len, input_size], is passed separately and the outputs, hidden are received on the other end; this happens in a loop (for i in range(batch_size)), as sketched below.

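Roughly, the two call patterns I have in mind look like this (just a sketch, not meant to be run on its own; rnn and x stand for the model and input defined in the snippets below, with batch_first=True):

# First method: one call on the whole input.
out, ht = rnn(x)                             # x: [batch_size, seq_len, input_size]

# Second method: one slice of the input per call, carrying the hidden state forward.
ht = None
for i in range(x.size(0)):                   # batch_size iterations
    out_i, ht = rnn(x[i].unsqueeze(0), ht)   # x[i]: [seq_len, input_size]
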
My doubt is: what is really the difference between these two methods?

To illustrate this, I have written a small piece of code, which might help you see where I went wrong.

This is my basic RNN class:

import torch
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes, device):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.device = device

        # nn.RNN and nn.GRU work the same way here
        self.rnn = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        # expected input: [batch_size, sequence_len, input_size]
        self.fc = nn.Linear(hidden_size, num_classes)  # classification head (not used in forward below)

    def forward(self, x, h0=None):
        if h0 is None:
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(self.device)
        out, ht = self.rnn(x, h0)
        return out, ht

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
rnn = RNN(28, 128, 1, 10, device).to(device)

Below I try to get the outputs and hidden state directly, without looping through each timestamp:

# Without Loop
inps = torch.randn((32, 100, 28)).to(device)
out_wl,ht_wl = rnn(inps)
print("Output shape : ", out_wl.shape) #Output shape :  torch.Size([32, 100, 128])
print("Hidden shape : ", ht_wl.shape) #Hidden shape :  torch.Size([1, 32, 128])

Here I am trying to send one timestamp (row) at a time and record all of the outputs and hidden states:

# With Loop
outs = []
hns = []
hn_temp = torch.zeros(1, 1, 128).to(device)    # initial hidden state: [num_layers, batch=1, hidden_size]
for inp in inps:
    inp = inp.unsqueeze(0)                     # [100, 28] -> [1, 100, 28]
    out_temp, hn_temp = rnn(inp, hn_temp)
    outs.append(out_temp)
    hns.append(hn_temp)

outs = torch.stack(outs)
hns = torch.stack(hns)

print(outs.shape) #torch.Size([32, 1, 100, 128])
print(hns.shape) #torch.Size([32, 1, 1, 128])

outs = outs.squeeze(1)
hns = hns.squeeze(1)

print(outs.shape) #torch.Size([32, 100, 128])
print(hns.shape) #torch.Size([32, 1, 128])

The issue here is that when I compare

out_wl == outs

the resulting boolean tensor is not True everywhere.
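
To quantify the mismatch, a check along these lines can be run (torch.equal, torch.allclose, and the maximum absolute difference; variable names follow the snippets above):

print(torch.equal(out_wl, outs))                 # exact element-wise equality
print(torch.allclose(out_wl, outs, atol=1e-6))   # equality up to floating-point tolerance
print((out_wl - outs).abs().max())               # largest element-wise difference
print((out_wl == outs).float().mean())           # fraction of elements that match exactly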

This all boils down to a few questions:

  • Are the two methods I described above the same or not?
  • If they are the same, why does the resulting output not match? Keep in mind that the model has not done any backward pass, and the same randn input is used in both the with-loop and without-loop snippets.
  • Is the last value of outs the same as hns, and does that mean outs is just a collection of the hn values for every step? (See the small check after this list.)
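
For the last question, a small check along these lines compares the final timestep of outs with hns (names as in the snippets above; hns has already been squeezed to shape [32, 1, 128]):

last_steps = outs[:, -1, :]     # [32, 128]: output at the final timestep of each slice
final_hidden = hns[:, 0, :]     # [32, 128]: hidden state returned after each slice
print(torch.allclose(last_steps, final_hidden))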