CNN LSTM dimension error

I need to build a CNN-LSTM model in PyTorch for video classification. However, before moving to the actual video data, I am supposed to build a test model on the FashionMNIST dataset.

I first tested both models separately and they were working. But when I try to combine them, I can't get the output shape of the CNN right. I have tried many different techniques, but I just can't make it work. Below you can find my model. I get the error when calling outputs = model(outputs), which feeds the outputs of the CNN model into the LSTM model.

My initial shape of the images is [12, 1, 28, 28].

I then reshape it to [12, 1, 28, 28], where the first dimension is batch_size * seq_dim (the shape stays the same because I set seq_dim = 1 for now).

The output shape of the CNN is [12, 32, 7, 7], which I now need to reshape to [batch_size, seq_dim, input_dim] for the LSTM.
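
For reference, the view fails because the element counts don't match: the CNN produces 32 * 7 * 7 = 1568 features per image, while [batch_size, seq_dim, input_dim] with seq_dim = 1 and input_dim = 28 only has room for 28 per image. A short illustration with dummy data, using the shapes described above:

import torch

cnn_out = torch.randn(12, 32, 7, 7)        # CNN output shape described above
flat = cnn_out.view(cnn_out.size(0), -1)   # [12, 1568] after the flatten in CNNModel

# 12 * 1568 = 18816 elements cannot be rearranged into [12, 1, 28],
# so a view to [batch_size, seq_dim, input_dim] with input_dim = 28 raises a RuntimeError:
# flat.view(12, 1, 28)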

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()
        
        # Convolution 1
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
        self.relu1 = nn.ReLU()
        
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)
     
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2)
        self.relu2 = nn.ReLU()
        
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)
        
        # Fully connected 1 (readout)
        self.fc1 = nn.Linear(32 * 7 * 7, 10) 
    
    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        
        # Max pool 1
        out = self.maxpool1(out)
        
        # Convolution 2 
        out = self.cnn2(out)
        out = self.relu2(out)
        
        # Max pool 2 
        out = self.maxpool2(out)
        
        # Flatten: (batch_size, 32, 7, 7) -> (batch_size, 32*7*7)
        out = out.view(out.size(0), -1)
        
        return out

class LSTMModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim):
        super(LSTMModel, self).__init__()
        self.cnn = CNNModel()  # note: defined here but not used in forward() below
        # LSTM

        # Hidden dimensions
        self.hidden_dim = hidden_dim
        
        # Number of hidden layers
        self.layer_dim = layer_dim
        
        # Building your LSTM
        # batch_first=True causes input/output tensors to be of shape
        # (batch_dim, seq_dim, feature_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, layer_dim, batch_first=True)
        
        # Readout layer
        self.fc = nn.Linear(hidden_dim, output_dim)
    
    def forward(self, x):
        # Initialize hidden state with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        
        # Initialize cell state
        c0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()
        
        out, (hn, cn) = self.lstm(x, (h0.detach(), c0.detach()))
        
        out = self.fc(out[:, -1, :]) 
        return out


model = LSTMModel(input_dim, hidden_dim, layer_dim, output_dim)
model1 = CNNModel()

# Number of steps to unroll
seq_dim = 1

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as a torch tensor with gradient accumulation abilities
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits

        # images come in as (batch_size, in_channels, height, width);
        # with seq_dim = 1 this is already (batch_size * seq_dim, C, H, W)
        batch_size, C, H, W = images.size()
        images = images.view(batch_size * seq_dim, C, H, W)

        outputs = model1(images)
        print(outputs.shape)
        outputs = outputs.view(batch_size, seq_dim, input_dim)

        outputs = model(outputs)

        


You can do

x = self.cnn(x)
x = x.view(x.shape[0], x.shape[1], -1)

Hi,
Thanks for your reply! Where exactly do I need to put this?
If I replace

out = out.view(out.size(0), -1)

by

out = out.view(out.shape[0], out.shape[1], -1)

I still get a similar error:

shape '[12, 1, 28]' is invalid for input of size 27648

Yes. What is the shape of out before you try and reshape it?

Before reshaping inside the CNN model it’s [12, 64, 6, 6]. Then I reshape using your formula and get [12, 64, 36].

And the input for the LSTM should have shape [12, 1, 28] (batch_size, seq_dim, input_dim).

Ok, then you will have to change the dimensions of the LSTM to fit the CNN: set input_dim to 36 and seq_dim to 64.
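
In other words, treat each of the 64 channel maps as one timestep of 6 * 6 = 36 features. A minimal sketch of that reshape, using a dummy tensor with the [12, 64, 6, 6] shape reported above (the hidden size of 100 is just a placeholder):

import torch
import torch.nn as nn

cnn_out = torch.randn(12, 64, 6, 6)         # dummy CNN feature map from this exchange

batch_size, channels, h, w = cnn_out.size()
seq_dim = channels                          # 64: one "timestep" per channel map
input_dim = h * w                           # 36: features per timestep

lstm_input = cnn_out.view(batch_size, seq_dim, input_dim)   # [12, 64, 36]

lstm = nn.LSTM(input_size=input_dim, hidden_size=100, num_layers=1, batch_first=True)
out, (hn, cn) = lstm(lstm_input)
print(out.shape)                            # torch.Size([12, 64, 100])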

But the dimensions I am using in the LSTM are the sequence length I specified, and input_dim should correspond to the dimension of the images, no?
I tried adjusting the sizes for the LSTM manually as you stated and it works for the training set, but e.g. for the test set the output size of the CNN is [4, 1, 28, 28] and then it fails again.

Wait, that doesn't make sense. Why is the output of your model different for the test set? Are your images different sizes? Or do you change the model? That makes no sense.

Sorry, you are right. I messed something else up with the test data. It works now. Thank you so much!!
Just one question. Now the input_dim for the LSTM is 36 (and doesn’t correspond to the dimension of the images anymore, which was 28). Is this okay?

I'm confused. Do you mean that the original dimension of the image does not correspond to the input dim? Or is there an error because the input shape to the LSTM is wrong? If it's the first, then it should not be a problem as long as it learns.

Sorry for not explaining myself correctly. But yes, I was referring to the first case. Thanks!!

Ok then yes it should not be a problem. If it trains then you are ok.

Perfect, thank you for your help!
