Hi, I’m trying to implement a CNN-LSTM model: I have a sequence of images, and I want to extract spatial features from each image with a CNN and feed them to an LSTM layer. Here is the code for the models I have written. Is this the right implementation of the idea? I’m asking because I’m not able to train the model properly.
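import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self, layers, c):
        super(CNN, self).__init__()
        # One stride-2 3x3 convolution per consecutive pair of channel sizes.
        self.layers = nn.ModuleList([
            nn.Conv2d(layers[i], layers[i + 1], kernel_size=3, stride=2)
            for i in range(len(layers) - 1)])
        self.drop = nn.Dropout(p=0.1)
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.out = nn.Linear(layers[-1], c)

    def forward(self, x):
        for l in self.layers:
            x = F.relu(l(x))
        x = self.drop(x)
        x = self.pool(x)            # (N, layers[-1], 1, 1)
        x = x.view(x.size(0), -1)   # (N, layers[-1])
        # Log-probabilities over the c output features.
        return F.log_softmax(self.out(x), dim=1)
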
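class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        # The CNN maps each frame to a hidden_size-dimensional vector.
        self.conv = CNN([3, 20, 40, 80], hidden_size)
        self.i2h = nn.LSTM(hidden_size, hidden_size)
        self.i2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        # input: one frame per call, shape (batch, 3, H, W);
        # batch must match the batch size of the hidden state.
        input = self.conv(input)
        # Add a sequence dimension of length 1: (1, batch, hidden_size).
        output, hidden = self.i2h(torch.unsqueeze(input, 0), hidden)
        output = torch.squeeze(output, 0)
        output = self.i2o(output)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        # Zero (h_0, c_0) for one layer and batch size 1, on the model's device.
        device = next(self.parameters()).device
        return (torch.zeros(1, 1, self.hidden_size, device=device),
                torch.zeros(1, 1, self.hidden_size, device=device))
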
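For comparison, here is a minimal sketch of another way the same CNN-to-LSTM idea could be wired. It runs the CNN over every frame of a clip in one pass, hands raw (non-softmax) features to the LSTM, and classifies from the last hidden state. The names `FrameEncoder` and `SequenceClassifier`, the feature size of 64, the 10-frame clips and the random inputs are all assumptions for illustration. This is not the code above, and it is not necessarily the fix for the training problem.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEncoder(nn.Module):
    """Hypothetical per-frame CNN that returns raw features (no softmax)."""

    def __init__(self, channels=(3, 20, 40, 80), feat_dim=64):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, stride=2)
            for i in range(len(channels) - 1)])
        self.pool = nn.AdaptiveMaxPool2d(1)
        self.proj = nn.Linear(channels[-1], feat_dim)

    def forward(self, x):                       # x: (N, 3, H, W)
        for conv in self.convs:
            x = F.relu(conv(x))
        x = self.pool(x).flatten(1)             # (N, channels[-1])
        return self.proj(x)                     # (N, feat_dim)


class SequenceClassifier(nn.Module):
    """Hypothetical wrapper: CNN on each frame, LSTM over time, linear head."""

    def __init__(self, feat_dim=64, hidden_size=64, num_classes=2):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim=feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                   # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        # Fold time into the batch so the CNN sees all frames at once.
        feats = self.encoder(clips.flatten(0, 1))    # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                 # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)               # h_n: (1, B, hidden_size)
        return self.head(h_n[-1])                    # (B, num_classes) logits


model = SequenceClassifier()
clips = torch.randn(4, 10, 3, 64, 64)           # random stand-in for real clips
labels = torch.randint(0, 2, (4,))
logits = model(clips)
loss = F.cross_entropy(logits, labels)          # expects raw logits
loss.backward()
```

In this sketch the log-softmax is applied only once, inside `F.cross_entropy`, so the LSTM receives unnormalised features rather than log-probabilities.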
My data consists of ellipses (a single type of ellipse) with parameters such as colour, width and height, moving according to one of two movement patterns, each described by the change in x and y. The parameters are randomly generated from Gaussian (normal) distributions, and each movement pattern has its own distribution parameters. I have 40,000 64x64 images (20,000 per movement pattern). Still, I’m not able to train the model. Maybe there is too much white space in the images, and that is why I get overfitting?
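To make the data description concrete, here is a rough sketch of how sequences like this could be generated, and how the share of white background could be measured. The function names, value ranges, sequence length and step parameters are assumptions for illustration, not the actual generator.

```python
import numpy as np


def render_ellipse(cx, cy, width, height, colour, size=64):
    """Draw one filled axis-aligned ellipse on a white size x size RGB image."""
    ys, xs = np.mgrid[0:size, 0:size]
    inside = (((xs - cx) / (width / 2)) ** 2
              + ((ys - cy) / (height / 2)) ** 2) <= 1.0
    img = np.ones((size, size, 3), dtype=np.float32)    # white background
    img[inside] = colour
    return img


def make_sequence(rng, step_mean, step_std, n_frames=10, size=64):
    """Move one ellipse with Gaussian-distributed per-frame (x, y) steps."""
    width, height = rng.uniform(8, 20, size=2)           # assumed size range
    colour = rng.uniform(0.0, 0.8, size=3)               # never pure white
    pos = np.array([size / 2, size / 2], dtype=np.float64)
    frames = []
    for _ in range(n_frames):
        frames.append(render_ellipse(pos[0], pos[1], width, height, colour, size))
        step = rng.normal(step_mean, step_std, size=2)   # per-pattern Gaussian
        pos = np.clip(pos + step, 0, size - 1)
    return np.stack(frames)                              # (n_frames, size, size, 3)


rng = np.random.default_rng(0)
seq_a = make_sequence(rng, step_mean=(2.0, 0.0), step_std=1.0)   # pattern 1
seq_b = make_sequence(rng, step_mean=(0.0, 2.0), step_std=1.0)   # pattern 2

# Fraction of pixels that are still pure white in pattern-1 frames.
background = np.all(seq_a == 1.0, axis=-1).mean()
print(f"background fraction: {background:.2f}")
```

The frames here are height x width x channel, so they would need to be permuted to channel-first before being passed to `nn.Conv2d`.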