Implementing Keras CNN + LSTM in PyTorch: model not improving

Hi,

I have been trying for ages to port some Keras code to PyTorch. The Keras model starts with an embedding layer, then:

model.add(Dropout(0.2))
model.add(Convolution1D(100, 10, activation='relu'))
model.add(MaxPooling1D(4, 4))
model.add(Dropout(0.2))
model.add(Convolution1D(100, 8, activation='relu'))
model.add(MaxPooling1D(2, 2))
model.add(Dropout(0.2))
model.add(Convolution1D(80, 8, activation='relu'))
model.add(MaxPooling1D(2, 2))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(80, consume_less='gpu'), merge_mode='concat'))
model.add(Dropout(0.2))
model.add(Dense(20, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
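
One thing worth flagging for the port: Keras's Convolution1D consumes the Embedding output channels-last as (batch, steps, embed_dim), while PyTorch's nn.Conv1d expects channels-first (batch, embed_dim, steps). A minimal sketch of the permute this implies (batch size and sequence length are assumed placeholders):

import torch

emb_out = torch.randn(32, 512, 100)   # (batch, steps, embed_dim) -- assumed sizes
conv_in = emb_out.permute(0, 2, 1)    # -> (32, 100, 512), ready for nn.Conv1d(100, ...)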

My PyTorch code is as follows:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.dropout_one = nn.Dropout(p=0.2)

        # nn.Conv1d(in_channels, out_channels, kernel_size)
        self.conv1 = nn.Conv1d(100, 100, 10)
        self.pool1 = nn.MaxPool1d(4, 4)

        self.conv2 = nn.Conv1d(100, 100, 8)
        self.pool2 = nn.MaxPool1d(2, 2)

        self.conv3 = nn.Conv1d(100, 80, 8)

        self.bdlstm = nn.LSTM(input_size=80, hidden_size=80, num_layers=1,
                              bidirectional=True, batch_first=True)

        self.dropout_two = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(160, 20)   # 160 = 2 * hidden_size from the bidirectional concat
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        # x: (batch, channels=100, seq_len)
        x = self.dropout_one(x)
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self.dropout_one(x)

        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool2(x)
        x = self.dropout_one(x)

        x = self.conv3(x)
        x = F.relu(x)
        x = self.pool2(x)
        x = self.dropout_one(x)

        # nn.LSTM with batch_first=True expects (batch, seq_len, features)
        x, hidden = self.bdlstm(x.permute(0, 2, 1))
        x = x[:, -1, :]        # last time step of the LSTM output
        x = x.view(-1, 160)
        x = self.dropout_one(x)

        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout_two(x)

        x = self.fc2(x)
        x = torch.sigmoid(x)   # torch.sigmoid (F.sigmoid is deprecated)

        return x
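
To sanity-check the shapes, a quick smoke test (the batch size 4 and sequence length 512 are assumed placeholders; the length just needs to survive the three conv/pool stages):

net = Net()
dummy = torch.randn(4, 100, 512)   # (batch, embed_dim, seq_len), placeholder sizes
out = net(dummy)
print(out.shape)                   # expected: torch.Size([4, 1])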

with the following loss and optimiser:
criterion = nn.BCELoss()
optimiser = optim.RMSprop(net.parameters(), lr=0.001, momentum=0.9)
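
The training step itself is the standard pattern; sketching it here for completeness ('loader' is an assumed DataLoader name, and BCELoss wants float targets shaped like the output):

net.train()
for inputs, labels in loader:
    optimiser.zero_grad()
    outputs = net(inputs)                                   # (batch, 1) probabilities
    loss = criterion(outputs, labels.float().view(-1, 1))   # match output shape
    loss.backward()
    optimiser.step()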

The weights update in the PyTorch model, but the accuracy just hops around 0.5, whereas in Keras it does improve. Can anyone see what I'm doing wrong? (I suspect the bidirectional LSTM.)
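
For context on why I suspect the bidirectional LSTM: my understanding is that with batch_first=True the output has shape (batch, seq, 2*hidden), and x[:, -1, :] takes the forward direction's last step concatenated with a backward-direction step that has only seen the final token, whereas Keras's merge_mode='concat' joins each direction's own final state. If that's right, the equivalent extraction would be something like (hidden sizes as in my model):

# forward direction's final step sits at t = -1, the backward direction's at t = 0
x, hidden = self.bdlstm(x.permute(0, 2, 1))
x = torch.cat((x[:, -1, :80], x[:, 0, 80:]), dim=1)   # (batch, 160)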

Thank you!

Update: I tried changing the optimiser (not the criterion) to SGD (no weights changed) and then to Adam (the weights change and it's actually learning).

Is there an issue with either of these optimisers, or am I missing something?
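
For reference, the Adam swap was this single line (lr=0.001 is an assumed value, which is also the library default):

optimiser = optim.Adam(net.parameters(), lr=0.001)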

I am doing something similar with Conv1d; I would love to see the input data you used and its dimensions. My code is here:

https://github.com/QuantScientist/Deep-Learning-Boot-Camp/blob/master/day%2002%20PyTORCH%20and%20PyCUDA/PyTorch/31-PyTorch-using-CONV1D-on-one-dimensional-data.ipynb