IndexError: index 1 is out of bounds for dimension 0 with size 1 when trying to initialize LSTM hidden state

Hello,

I am new to PyTorch and am trying to implement a custom neural network that includes an LSTM for a robot navigation task. I need to initialize the LSTM's hidden state with the output of an upstream model. When I try to set the initial hidden state I get the error IndexError: index 1 is out of bounds for dimension 0 with size 1 and can't figure out how to fix it. My model code and test code are below. Note that in this test I am using an initial hidden state of torch.zeros(1,1,batch_size,128) just to make sure the input itself is correct and that the issue is not the upstream network producing an incorrectly dimensioned tensor. I am also not sure why the hidden state seems to need the extra dimension (two ones in front instead of just one), or whether that is part of the cause of my problems. When I omit the additional dimension, I get the error RuntimeError: Expected hidden[0] size (1, 2, 128), got (2, 128) when the batch size is 2. Any help is greatly appreciated.

My Model:

import torch
import torch.nn as nn

class NavNet(nn.Module):
    def __init__(self, input_shape=(1,3,180,320)):
        super(NavNet, self).__init__()
        
        self.input_conv = nn.Sequential(
            nn.Conv2d(3,32, kernel_size=5,stride=2, padding=0),
            nn.ReLU(),
            nn.Conv2d(32,32,5,2),
            nn.ReLU(),
            nn.Conv2d(32,64,3,2),
            nn.ReLU(),
            nn.Flatten()
        )
        
        fc_dims = self.input_conv(torch.zeros(input_shape)).shape[1]
        
        self.input_fc = nn.Sequential(
            nn.Linear(fc_dims,256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, 128)
        )
                
        self.action_fc = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 16)
        )
        
        self.output_fc = nn.Sequential(
            nn.Linear(128, 32),
            nn.ReLU(),
            nn.Linear(32, 4) # collision, bumpiness, x, y
        )
        
        self.lstm = nn.LSTM(16, 128, batch_first=True) # 16 input size, 128 hidden size

    # seq is the action sequence to be evaluated [[action 1],[action 2],[action 3]]
    def forward(self,x,seq):
        
        x = self.input_conv(x)
        print("Conv output : ", x.shape)
        x = self.input_fc(x).unsqueeze(0)
        print("FC output : ", x.shape)
        
        actions = self.action_fc(seq)
        print("Action FC Output : ", actions.shape)
        
        # actions : (batch, seq_len, input_size) <- if batch_first=True
        # hidden state / input : (num_layers * num_directions, batch, hidden_size) -> num_layers == num_directions == 1
        x, hidden = self.lstm(actions, torch.zeros(1, 1, 2, 128)) # the 128-dim image features should go in as the initial hidden state (zeros used here for testing)
        
        output = self.output_fc(x)
        
        return output

Test Code:

navnet = NavNet()
test_batch_size = 2

test_input = torch.zeros((test_batch_size, 3,180,320))
print("Input Image : ", test_input.shape)

test_actions = torch.zeros((test_batch_size, 6,2))
print("Input Actions : ", test_actions.shape)

output = navnet(test_input, test_actions)
print("Output : ", output.shape)
print(output)

Output:

Input Image :  torch.Size([2, 3, 180, 320])
Input Actions :  torch.Size([2, 6, 2])
Conv output :  torch.Size([2, 48640])
FC output :  torch.Size([1, 2, 128])
Action FC Output :  torch.Size([2, 6, 16])
ERROR HAPPENS HERE

Thanks

nn.LSTM expects a tuple containing the hidden_state and cell_state as the second argument, so this code should work:

x, hidden = self.lstm(actions, (torch.zeros(1, 2, 128), torch.zeros(1, 2, 128)))

Unrelated to this problem, but note that you are reassigning x, so the output of self.input_fc will be overwritten.
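
Since the goal is to seed the LSTM with the output of the upstream model, here is a minimal sketch (assuming the image features from input_fc are used as h_0, with a zero initial cell state) of how the forward pass could look:

x = self.input_fc(x).unsqueeze(0)         # (1, batch, 128) == (num_layers * num_directions, batch, hidden_size)
h_0 = x                                   # image features as the initial hidden state
c_0 = torch.zeros_like(x)                 # no prior cell state, so start it at zero
lstm_out, (h_n, c_n) = self.lstm(actions, (h_0, c_0))
output = self.output_fc(lstm_out)

This keeps the image features in play (they are consumed as h_0 rather than being overwritten by the LSTM output) and matches the torch.Size([1, 2, 128]) shape printed for the FC output above.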