RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 4 and 3 in dimension 2. My inputs are numpy files that I converted to tensors

I am getting this error when I try a batch size greater than 1:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 5 and 4 in dimension 2 at C:\w\1\s\tmp_conda_3.6_095855\conda\conda-bld\pytorch_1579082406639\work\aten\src\TH/generic/THTensor.cpp:612

All the solutions I found here and on StackOverflow talk about images as inputs, but my inputs are numpy arrays (.npy) that I convert to tensors separately. Here is my code for the conversion:

 # load the .npy file and convert it to a float tensor
 tensor1 = torch.from_numpy(np.load(NPY))
 tensor1 = tensor1.type(torch.FloatTensor)
 # rearrange from (H, W, C) to (C, H, W)
 tensor1 = torch.einsum('h w c -> c h w', tensor1)
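
Since the error says the sizes disagree, I suspect the individual .npy files do not all have the same shape. Something like this quick check (just a sanity-check sketch; NPY_FILES is a placeholder for my list of file paths) should show where they differ:

    import numpy as np

    # NPY_FILES is a placeholder for the list of .npy paths I feed to the dataset
    for path in NPY_FILES:
        arr = np.load(path)
        print(path, arr.shape)  # the axis where the sizes disagree should show up here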

I understand that by default numpy data is laid out as

[batch_size, depth, height, width, channels]

whereas torch tensors use the dimension order

[batch_size, channels, depth, height, width].

From this, I am guessing that dimension 2 in the error refers to depth.
Is there any way I can make the tensors the same size in dimension 2? If so, how can I do this with numpy arrays?
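
One idea I had is to zero-pad every array to a fixed size along the mismatched axis before converting it to a tensor. Here is a minimal sketch of what I mean (TARGET_SIZE and the axis index are guesses on my part, and I am not sure zero-padding is even appropriate for my data):

    import numpy as np
    import torch

    TARGET_SIZE = 16  # guessed common size for the mismatched dimension

    def pad_to_size(arr, target=TARGET_SIZE, axis=0):
        # zero-pad `arr` along `axis` so that its size there becomes `target`
        missing = target - arr.shape[axis]
        if missing <= 0:
            return arr  # already large enough (cropping would be needed instead)
        widths = [(0, 0)] * arr.ndim
        widths[axis] = (0, missing)
        return np.pad(arr, widths, mode='constant')

    arr = pad_to_size(np.load(NPY))
    tensor1 = torch.from_numpy(arr).type(torch.FloatTensor)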

For reference, here is the code of the model I am experimenting with:

        elif arch.startswith('resnet50'):
            # keep everything except resnet50's final fc layer as a frozen feature extractor
            self.features = nn.Sequential(*list(original_model.children())[:-1])
            for param in self.features.parameters():
                param.requires_grad = False
            self.fc_pre = nn.Sequential(nn.Linear(2048, fc_size), nn.Dropout())
            self.rnn = nn.LSTM(input_size = fc_size,
                        hidden_size = hidden_size,
                        num_layers = lstm_layers,
                        batch_first = True)
            self.fc = nn.Linear(hidden_size, num_classes)
            self.modelName = 'resnet50_lstm'

        else:
            raise Exception("This architecture has not been supported yet")

    def init_hidden(self, num_layers, batch_size):
        return (torch.zeros(num_layers, batch_size, self.hidden_size).cuda(),
                torch.zeros(num_layers, batch_size, self.hidden_size).cuda())

    def forward(self, inputs, hidden=None, steps=0):
        length = len(inputs)
        # per-frame features collected into a (batch, time, feature) tensor for the LSTM
        fs = torch.zeros(inputs[0].size(0), length, self.rnn.input_size).cuda()

        for i in range(length):
            # extract frame features, flatten, and project to the LSTM input size
            f = self.features(inputs[i])
            f = f.view(f.size(0), -1)
            f = self.fc_pre(f)
            fs[:, i, :] = f

        outputs, hidden = self.rnn(fs, hidden)
        outputs = self.fc(outputs)
        return outputs
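
Alternatively, I was wondering whether the problem is just in how the DataLoader batches samples of different sizes, and whether a custom collate_fn that pads each batch up to its largest sample would help. This is only a rough sketch of what I mean (it assumes my dataset returns (tensor, label) pairs, that only the last dimension differs, and that the labels are plain numbers):

    import torch
    from torch.nn import functional as F
    from torch.utils.data import DataLoader

    def pad_collate(batch):
        # batch is a list of (tensor, label) pairs; zero-pad every tensor up to the
        # largest size in this batch along its last dimension, then stack them
        tensors, labels = zip(*batch)
        max_len = max(t.size(-1) for t in tensors)
        padded = [F.pad(t, (0, max_len - t.size(-1))) for t in tensors]
        return torch.stack(padded), torch.tensor(labels)

    # loader = DataLoader(my_dataset, batch_size=4, collate_fn=pad_collate)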

Thank you.