Implementing a Keras CNN + LSTM in PyTorch: accuracy not improving


I have been trying for ages to reimplement some Keras code in PyTorch. The Keras model starts with an embedding layer, followed by:

```python
model.add(Convolution1D(100, 10, activation='relu'))
model.add(MaxPooling1D(4, 4))
model.add(Convolution1D(100, 8, activation='relu'))
model.add(MaxPooling1D(2, 2))
model.add(Convolution1D(80, 8, activation='relu'))
model.add(MaxPooling1D(2, 2))
model.add(Bidirectional(LSTM(80, consume_less='gpu'), merge_mode='concat'))
model.add(Dense(20, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
```
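Keras layers (and PyTorch's `nn.Embedding`) produce `(batch, steps, features)`, but `nn.Conv1d` is channels-first and expects `(batch, channels, steps)`, so a port needs a transpose after the embedding and another one before a `batch_first` LSTM. The sketch below traces the shapes through PyTorch equivalents of the Keras conv/pool stack. The vocabulary size, sequence length and batch size are hypothetical, and the embedding dimension of 100 is assumed from `in_channels=100` in the PyTorch code.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only; the post does not state them.
VOCAB_SIZE = 5000   # assumed vocabulary size
EMBED_DIM = 100     # assumed to match in_channels=100 of conv1
SEQ_LEN = 1000      # assumed input sequence length
BATCH = 4

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
features = nn.Sequential(
    nn.Conv1d(EMBED_DIM, 100, 10), nn.ReLU(), nn.MaxPool1d(4, 4),
    nn.Conv1d(100, 100, 8), nn.ReLU(), nn.MaxPool1d(2, 2),
    nn.Conv1d(100, 80, 8), nn.ReLU(), nn.MaxPool1d(2, 2),
)

tokens = torch.randint(0, VOCAB_SIZE, (BATCH, SEQ_LEN))
x = embedding(tokens)   # (batch, seq_len, embed_dim), same layout as Keras
x = x.transpose(1, 2)   # (batch, embed_dim, seq_len) for nn.Conv1d
x = features(x)         # (batch, 80, reduced_len)
x = x.transpose(1, 2)   # (batch, reduced_len, 80) for a batch_first LSTM
print(x.shape)          # torch.Size([4, 56, 80])
```

With no padding (the Keras default `padding='valid'`), each convolution shortens the sequence by `kernel_size - 1` and each pooling layer divides it roughly by its stride, so a 1000-step input reaches the LSTM as 56 steps of 80 features.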

My PyTorch code is as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.dropout_one = nn.Dropout(p=0.2)

        self.conv1 = nn.Conv1d(100, 100, 10)
        self.pool1 = nn.MaxPool1d(4, 4)
        self.conv2 = nn.Conv1d(100, 100, 8)
        self.pool2 = nn.MaxPool1d(2, 2)
        self.conv3 = nn.Conv1d(100, 80, 8)
        self.bdlstm = nn.LSTM(input_size=80, hidden_size=80, num_layers=1, bidirectional=True, batch_first=True)
        self.dropout_two = nn.Dropout(p=0.5)
        self.fc1 = nn.Linear(160, 20)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        # x: (batch, 100, seq_len), channels-first as nn.Conv1d expects
        x = self.dropout_one(x)
        x = self.conv1(x)
        x = F.relu(x)
        x = self.pool1(x)
        x = self.dropout_one(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = self.pool2(x)
        x = self.dropout_one(x)
        x = self.conv3(x)
        x = F.relu(x)
        x = self.pool2(x)  # pool2 is reused for the third pooling layer
        x = self.dropout_one(x)

        # Note: self.bdlstm is never called. x is (batch, 80, length) here,
        # so this slice takes the last of the 80 channels, not the last time step.
        x = x[:, -1, :]
        x = self.dropout_one(x)

        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout_two(x)
        x = self.fc2(x)
        x = torch.sigmoid(x)  # F.sigmoid is deprecated in favour of torch.sigmoid
        return x
```
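In the forward pass above, `self.bdlstm` is never called, and `x[:, -1, :]` picks the last of the 80 conv channels rather than the last time step, so `fc1` only works if the pooled sequence happens to be exactly 160 steps long. Below is a minimal sketch of a forward pass that does run the LSTM, assuming the input is the embedding output already transposed to `(batch, 100, seq_len)`. `FixedNet` is a hypothetical name, and the dropout layers are left out only to mirror the Keras code as posted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixedNet(nn.Module):
    """Sketch of the Keras model in PyTorch (embedding dim of 100 assumed)."""

    def __init__(self, embed_dim=100):
        super().__init__()
        self.conv1 = nn.Conv1d(embed_dim, 100, 10)
        self.pool1 = nn.MaxPool1d(4, 4)
        self.conv2 = nn.Conv1d(100, 100, 8)
        self.conv3 = nn.Conv1d(100, 80, 8)
        self.pool2 = nn.MaxPool1d(2, 2)
        self.bdlstm = nn.LSTM(input_size=80, hidden_size=80, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.fc1 = nn.Linear(160, 20)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        # x: (batch, embed_dim, seq_len)
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = self.pool2(F.relu(self.conv3(x)))
        x = x.permute(0, 2, 1)         # (batch, steps, 80) for the batch_first LSTM
        _, (h_n, _) = self.bdlstm(x)   # h_n: (num_layers * 2, batch, 80)
        # Final forward and backward states of the last layer, concatenated
        # like Keras' merge_mode='concat'.
        x = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 160)
        x = F.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))         # (batch, 1) probabilities for nn.BCELoss


net = FixedNet()
out = net(torch.randn(4, 100, 1000))
print(out.shape)  # torch.Size([4, 1])
```

Using `h_n` instead of slicing `output` gives the same pair of final states that Keras' `Bidirectional(LSTM(...), merge_mode='concat')` returns when `return_sequences` is left at `False`, and `fc1` then no longer depends on the input length.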

```python
import torch.optim as optim

net = Net()
criterion = nn.BCELoss()
optimiser = optim.RMSprop(net.parameters(), lr=0.001, momentum=0.9)
```
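On the optimiser: Keras' `'rmsprop'` uses `RMSprop` with `lr=0.001`, `rho=0.9` and no momentum by default, whereas PyTorch calls the decay factor `alpha` (default 0.99). So `momentum=0.9` adds a momentum term the Keras model never had, and the squared-gradient average decays more slowly than in Keras. A sketch of settings closer to the Keras defaults (the `nn.Linear` model is only a placeholder for the real network):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(160, 1)  # placeholder module; pass your own network's parameters

# Keras RMSprop defaults: lr=0.001, rho=0.9, no momentum.
# PyTorch's equivalent of rho is `alpha`; momentum already defaults to 0.
optimiser = optim.RMSprop(model.parameters(), lr=0.001, alpha=0.9)
criterion = nn.BCELoss()
```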

The weights do update in the PyTorch model, but the accuracy just hovers around 0.5, whereas the Keras model's accuracy does improve. Can anyone see what I'm doing wrong? (I suspect the BDLSTM.)
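On the BDLSTM suspicion: even once the LSTM is called, taking `output[:, -1, :]` from a bidirectional `nn.LSTM` is not the same as the Keras layer's output. The second half of that slice is the backward direction's state after it has seen only the last time step; the backward direction's final state sits at the first time step instead. A small check with random data (sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
H = 80
lstm = nn.LSTM(input_size=80, hidden_size=H, num_layers=1,
               bidirectional=True, batch_first=True)
x = torch.randn(4, 56, 80)     # (batch, steps, features)
output, (h_n, c_n) = lstm(x)   # output: (4, 56, 160), h_n: (2, 4, 80)

# Forward direction: its final state is at the LAST time step of `output`.
print(torch.allclose(output[:, -1, :H], h_n[0]))  # True
# Backward direction: its final state is at the FIRST time step of `output`.
print(torch.allclose(output[:, 0, H:], h_n[1]))   # True
```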

Thank you!

Update: I tried changing the optimiser to SGD (the weights did not change) and then to Adam (the weights change and it's actually learning).

Is there an issue with either of these optimisers, or am I missing something?
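To confirm whether a given optimiser actually moves the weights, one option is to snapshot the parameters around a single training step and compare. `step_changes` is a hypothetical helper, and the tiny model and random batch are placeholders for the real network and data:

```python
import torch
import torch.nn as nn
import torch.optim as optim


def step_changes(model, optimiser, criterion, inputs, targets):
    """Run one training step; return the largest absolute change per parameter."""
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    optimiser.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimiser.step()
    return {name: (p.detach() - before[name]).abs().max().item()
            for name, p in model.named_parameters()}


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())  # placeholder model
inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8, 1)).float()
optimiser = optim.SGD(model.parameters(), lr=0.01)     # lr set explicitly

changes = step_changes(model, optimiser, nn.BCELoss(), inputs, targets)
for name, delta in changes.items():
    print(f"{name}: max |change| = {delta:.2e}")
```

If every value comes out as 0.0, it is worth checking that the optimiser was built from the same model's `parameters()` and that `p.grad` is non-zero after `loss.backward()`.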

I am doing something similar with Conv1d. I would love to see the input data you used and its dimensions. My code is here: