CNN-LSTM model implementation issues

Hi, I’m trying to implement a CNN-LSTM model: I have a sequence of images, and I need to extract spatial features from them with a CNN and pass those features to my LSTM layer. Here is the code for the models I have written. Is this the right implementation of the idea? I’m asking because I’m not able to train the model properly.

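import torch
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self, layers, c):
        super(CNN, self).__init__()
        # layers lists the channel sizes; each conv roughly halves the resolution (stride=2).
        self.layers = nn.ModuleList([
            nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
            for i in range(len(layers) - 1)])
        self.drop = nn.Dropout(p=0.1)
        self.pool = nn.AdaptiveMaxPool2d(1)  # global max pool down to 1x1
        self.out = nn.Linear(layers[-1], c)

    def forward(self, x):
        for l in self.layers:
            x = F.relu(l(x))
        x = self.drop(x)
        x = self.pool(x)
        x = x.view(x.size(0), -1)  # (batch, layers[-1])
        # Pass dim explicitly; relying on the implicit dim is deprecated.
        return F.log_softmax(self.out(x), dim=1)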

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        
        self.hidden_size = hidden_size
        self.conv = CNN([3, 20, 40, 80], hidden_size)
        self.i2h = nn.LSTM(hidden_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        input = self.conv(input)
        output, hidden = self.i2h(torch.unsqueeze(input, 0), hidden)
        output = torch.squeeze(output, 0)
        output = self.i2o(output)
        output = self.softmax(output)
        return output, hidden

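    def initHidden(self):
        # Zero (h_0, c_0) for one LSTM layer and batch size 1, created on the
        # model's own device instead of a hard-coded .cuda() call.
        device = next(self.parameters()).device
        return (torch.zeros(1, 1, self.hidden_size, device=device),
                torch.zeros(1, 1, self.hidden_size, device=device))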

My data consists of ellipses (a single type of ellipse) with parameters such as colour, width and height, and with two different movement patterns described by parameters such as the change in x and y. The parameters are randomly generated from Gaussian (normal) distributions, and each movement pattern has its own distribution parameters. I have 40,000 64x64 images (20,000 for each pattern). Still, I’m not able to train the model. Maybe there is too much white space in the images, and that is why I get overfitting?
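
To make the data description concrete, here is a rough sketch of how one such sequence could be generated with NumPy. Everything specific in it is an assumption for illustration, not the actual generator: the function name make_sequence, the 10-frame length, the size and colour ranges, and the step distributions are all hypothetical.

```python
import numpy as np


def make_sequence(rng, dx_mean, dy_mean, steps=10, size=64):
    """Render one moving ellipse as an array of shape (steps, 3, size, size)."""
    colour = rng.uniform(0.0, 1.0, size=3)      # RGB in [0, 1) (assumed range)
    a, b = rng.uniform(4.0, 10.0, size=2)       # semi-axes in pixels (assumed range)
    cx, cy = size / 2.0, size / 2.0             # start in the image centre
    yy, xx = np.mgrid[0:size, 0:size]
    frames = np.ones((steps, 3, size, size), dtype=np.float32)  # white background
    for t in range(steps):
        # Boolean mask of the pixels inside the ellipse at its current position.
        inside = ((xx - cx) / a) ** 2 + ((yy - cy) / b) ** 2 <= 1.0
        frames[t][:, inside] = colour[:, None]
        # Gaussian step; each movement pattern uses its own mean.
        cx += rng.normal(dx_mean, 1.0)
        cy += rng.normal(dy_mean, 1.0)
    return frames


rng = np.random.default_rng(0)
pattern_a = make_sequence(rng, dx_mean=2.0, dy_mean=0.0)   # drifts mostly along x
pattern_b = make_sequence(rng, dx_mean=0.0, dy_mean=2.0)   # drifts mostly along y
```

With a generator like this, most of each frame is white background, which makes it easy to check how much of the image the ellipse actually covers.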

Hi Rasul.

I’m starting on the same task (feeding a sequence of images into a CNN-LSTM model).
Did you solve your problems and get a working model?