I’m training an LSTM model with the Adam optimizer. The input has shape (10000, 48) and the output has shape (10000, 16). Here is the original Keras model:
keras_model = Sequential()
keras_model.add(Embedding(16, 10, input_length=48))
keras_model.add(CuDNNLSTM(50))
keras_model.add(Dropout(0.1))
keras_model.add(Dense(16, activation='sigmoid'))
keras_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
train_history = keras_model.fit(X_train,
                                y_train,
                                epochs=20,
                                verbose=1,
                                shuffle=False,
                                batch_size=256)
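For context, X_train contains integer tokens in [0, 16) (hence Embedding(16, 10)) and y_train contains 16 binary labels per sample. Roughly like this (a sketch; the values here are random placeholders for the real data):

import numpy as np
X_train = np.random.randint(0, 16, size=(10000, 48))                   # token ids
y_train = np.random.randint(0, 2, size=(10000, 16)).astype("float32")  # multi-hot labels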
The following is the PyTorch model:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

def hard_sigmoid(x):
    """
    Computes the element-wise hard sigmoid of x, i.e. clamp(0.2 * x + 0.5, 0, 1).
    """
    x = (0.2 * x) + 0.5
    x = F.threshold(-x, -1, -1)  # now -min(x, 1)
    x = F.threshold(-x, 0, 0)    # now clamp(x, 0, 1)
    return x
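As a quick sanity check (my own test snippet, not part of the training code), this matches a plain clamp:

x = torch.randn(5)
print(hard_sigmoid(x))
print(torch.clamp(0.2 * x + 0.5, 0, 1))  # prints the same values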
class LSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(16, 10)  # 16 tokens, 10-dimensional embeddings
        self.lstm = nn.LSTM(10, 50)            # input size 10, hidden size 50
        self.dropout = nn.Dropout(0.1)
        self.fc = nn.Linear(50, 16)

    def forward(self, x):
        x = self.embedding(x)
        x, _ = self.lstm(x, None)  # None -> zero-initialized hidden and cell states
        x = x[:, -1, :]            # intended to keep only the last time step's output
        x = self.dropout(x)
        x = self.fc(x)
        x = hard_sigmoid(x)
        return x
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = LSTM().to(device)
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters())
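For completeness, the tensors that go into the TensorDataset below are built roughly like this (a sketch; X_train_np / y_train_np are placeholder names for the same NumPy arrays fed to Keras). The embedding layer needs torch.long inputs and BCELoss needs float targets:

X_train = torch.tensor(X_train_np, dtype=torch.long)     # integer tokens in [0, 16)
y_train = torch.tensor(y_train_np, dtype=torch.float32)  # 16 binary labels per sample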
training_set = TensorDataset(X_train, y_train)
train_loader = DataLoader(training_set, batch_size=256, shuffle=False)
test_set = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False)
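As a quick shape check (my own debugging snippet, not part of training), a dummy batch goes through the model cleanly:

dummy = torch.zeros(256, 48, dtype=torch.long, device=device)
print(model(dummy).shape)  # torch.Size([256, 16])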
def Train_LSTM(num_epochs):
    hist = np.zeros(num_epochs)
    for t in range(num_epochs):
        running_loss = 0.0
        model.train()
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            y_pred = model(inputs)
            loss = criterion(y_pred, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print("Epoch", t + 1, "Training loss:", running_loss / len(train_loader))
        hist[t] = running_loss / len(train_loader)  # store the per-batch average loss
    return hist
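I train for 20 epochs to match Keras; evaluation on the held-out set follows the same pattern (a sketch, thresholding the 16 sigmoid outputs at 0.5 for binary accuracy):

hist = Train_LSTM(num_epochs=20)

model.eval()
correct, total = 0, 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        preds = (model(inputs) > 0.5).float()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print("Test accuracy:", correct / total)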
However, the training losses for the two models differ. Here is the result for the Keras model:
Epoch 1/20 loss: 0.3668 - acc: 0.8220
Epoch 2/20 loss: 0.3186 - acc: 0.8366
Epoch 3/20 loss: 0.2893 - acc: 0.8529
Epoch 4/20 loss: 0.2680 - acc: 0.8673
Epoch 5/20 loss: 0.2461 - acc: 0.8818
Epoch 6/20 loss: 0.2292 - acc: 0.8918
Epoch 7/20 loss: 0.2153 - acc: 0.8994
Epoch 8/20 loss: 0.2037 - acc: 0.9053
Epoch 9/20 loss: 0.1955 - acc: 0.9098
Epoch 10/20 loss: 0.1884 - acc: 0.9134
Epoch 11/20 loss: 0.1804 - acc: 0.9173
Epoch 12/20 loss: 0.1749 - acc: 0.9198
Epoch 13/20 loss: 0.1705 - acc: 0.9218
Epoch 14/20 loss: 0.1655 - acc: 0.9242
Epoch 15/20 loss: 0.1600 - acc: 0.9267
Epoch 16/20 loss: 0.1580 - acc: 0.9275
Epoch 17/20 loss: 0.1537 - acc: 0.9292
Epoch 18/20 loss: 0.1506 - acc: 0.9306
Epoch 19/20 loss: 0.1485 - acc: 0.9315
Epoch 20/20 loss: 0.1442 - acc: 0.9332
Here is the result for the PyTorch model:
Epoch 1 Training loss: 0.3904338072785331
Epoch 2 Training loss: 0.3710213891228142
Epoch 3 Training loss: 0.3628670514163459
Epoch 4 Training loss: 0.3596793907453947
Epoch 5 Training loss: 0.35954090678478445
Epoch 6 Training loss: 0.35725244792068706
Epoch 7 Training loss: 0.3549379930852929
Epoch 8 Training loss: 0.35340563144982623
Epoch 9 Training loss: 0.35231793117340265
Epoch 10 Training loss: 0.349751780214517
Epoch 11 Training loss: 0.3518899157452766
Epoch 12 Training loss: 0.3493136685065296
Epoch 13 Training loss: 0.3415819657656848
Epoch 14 Training loss: 0.33805217672034604
Epoch 15 Training loss: 0.32990270841609487
Epoch 16 Training loss: 0.3201482021595206
Epoch 17 Training loss: 0.3118191622483456
Epoch 18 Training loss: 0.305637656117949
Epoch 19 Training loss: 0.29569833011121094
Epoch 20 Training loss: 0.3000877904884346
I’m not sure how to fix this. Any suggestions would be helpful.