Hey guys. I've checked similar posts on this matter and tried to simplify the problem as much as possible. I've spent a few days on it but still can't figure this out, so I'd appreciate your help.
I was porting a simple Sequential neural net from TensorFlow to PyTorch for binary text sentiment classification.
I stripped it down to 5 samples of encoded and padded text.
Init variables shared by both versions:
batch_size = 5
num_epochs = 20
n_embeddings = 3000
embedding_dim = 16
X = [[2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 8, 20, 21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [22, 23, 24, 8, 25, 26, 27, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [29, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 8, 42, 43, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
y = [[1], [1], [1], [1], [1]]
TensorFlow code
import numpy as np
import tensorflow as tf
#----------------MODEL----------------
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(n_embeddings, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
#----------------MODEL----------------
#----------------OPTIMIZER & LOSS----------------
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam())
#----------------OPTIMIZER & LOSS----------------
#----------------DATA----------------
# Prepare the data
train_x_prepared = np.array(X)
train_y_prepared = np.array(y)
print('The data is prepared for training!\n')
#----------------DATA----------------
#----------------TRAINING----------------
print('Training:')
history = model.fit(train_x_prepared, train_y_prepared, batch_size=batch_size, epochs=num_epochs)
#----------------TRAINING----------------
PyTorch code
import torch
from torch import nn, optim
#----------------MODEL----------------
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(n_embeddings, embedding_dim)
        # AdaptiveAvgPool1d(1) over the sequence axis mirrors Keras' GlobalAveragePooling1D
        self.pooling = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(embedding_dim, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x):
        x = self.embedding(x)   # (batch, seq_len, embedding_dim)
        x = x.permute(0, 2, 1)  # (batch, embedding_dim, seq_len) so pooling runs over seq_len
        x = self.pooling(x)     # (batch, embedding_dim, 1)
        x = x.squeeze(2)        # (batch, embedding_dim)
        x = self.fc(x)
        x = self.activation(x)
        return x
torch_model = Net()
#----------------MODEL----------------
#----------------OPTIMIZER & LOSS----------------
criterion = nn.BCELoss()
optimizer = optim.Adam(torch_model.parameters(), eps=1e-07)  # eps set to 1e-07 to match the Keras Adam default
#----------------OPTIMIZER & LOSS----------------
#----------------DATA----------------
torch_train_x_prepared = torch.tensor(X).long()
torch_train_y_prepared = torch.tensor(y).float()
print('The data is prepared for training!\n')
#----------------DATA----------------
#----------------TRAINING----------------
print('Training:')
for epoch in range(num_epochs):
    running_loss = 0.0
    for i in range(0, len(torch_train_x_prepared), batch_size):
        batch_x = torch_train_x_prepared[i:i+batch_size]
        batch_y = torch_train_y_prepared[i:i+batch_size]

        optimizer.zero_grad()
        outputs = torch_model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(f"Epoch: {epoch+1}/{num_epochs}, loss: {running_loss / (len(torch_train_x_prepared) / batch_size)}")
print("Training is finished")
#----------------TRAINING----------------
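To rule out a forward-pass mismatch between the two models, one quick check is to copy the trained Keras weights into the PyTorch net and compare predictions. A minimal sketch, assuming the Keras model above has already been built (e.g. by fit); note that Keras stores the Dense kernel as (in, out) while PyTorch's Linear weight is (out, in):

# Sanity check (sketch): copy Keras weights into the PyTorch model, then
# compare predictions. The weight order for this Sequential model is
# [embedding weights, dense kernel, dense bias]; the pooling layer has none.
emb_w, dense_w, dense_b = model.get_weights()
with torch.no_grad():
    torch_model.embedding.weight.copy_(torch.tensor(emb_w))
    torch_model.fc.weight.copy_(torch.tensor(dense_w.T))  # (in, out) -> (out, in)
    torch_model.fc.bias.copy_(torch.tensor(dense_b))

print(model.predict(train_x_prepared))
print(torch_model(torch_train_x_prepared).detach().numpy())

If the two printouts agree, the architectures are equivalent and the gap below must come from training, not the model.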
TensorFlow results
Training:
Epoch 1/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 787ms/step - loss: 0.6921
Epoch 2/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - loss: 0.6723
Epoch 3/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.6530
Epoch 4/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 28ms/step - loss: 0.6340
Epoch 5/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.6153
Epoch 6/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.5970
Epoch 7/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.5789
Epoch 8/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.5612
Epoch 9/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - loss: 0.5437
Epoch 10/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 23ms/step - loss: 0.5266
Epoch 11/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.5097
Epoch 12/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 26ms/step - loss: 0.4932
Epoch 13/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 27ms/step - loss: 0.4770
Epoch 14/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.4612
Epoch 15/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.4457
Epoch 16/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.4305
Epoch 17/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.4157
Epoch 18/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 21ms/step - loss: 0.4013
Epoch 19/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.3872
Epoch 20/20
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 22ms/step - loss: 0.3735
PyTorch results
Training:
Epoch: 1/20, loss: 0.7581332325935364
Epoch: 2/20, loss: 0.7515153884887695
Epoch: 3/20, loss: 0.7449377775192261
Epoch: 4/20, loss: 0.738400936126709
Epoch: 5/20, loss: 0.7319058179855347
Epoch: 6/20, loss: 0.7254530191421509
Epoch: 7/20, loss: 0.719042956829071
Epoch: 8/20, loss: 0.712676465511322
Epoch: 9/20, loss: 0.706354022026062
Epoch: 10/20, loss: 0.7000761032104492
Epoch: 11/20, loss: 0.6938431859016418
Epoch: 12/20, loss: 0.6876559257507324
Epoch: 13/20, loss: 0.6815144419670105
Epoch: 14/20, loss: 0.6754195690155029
Epoch: 15/20, loss: 0.6693712472915649
Epoch: 16/20, loss: 0.6633699536323547
Epoch: 17/20, loss: 0.6574161648750305
Epoch: 18/20, loss: 0.6515097618103027
Epoch: 19/20, loss: 0.6456514596939087
Epoch: 20/20, loss: 0.6398409605026245
P.S. I understand that the epoch-1 loss can differ due to random weight initialization, but look at the convergence: TensorFlow's Adam converges much faster for some reason. And this is only a test on 5 samples; on a realistic dataset the difference is enormous.
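One thing I suspect but haven't fully verified: the default initializers differ between the frameworks. Keras' Embedding defaults to uniform(-0.05, 0.05) and Dense to Glorot uniform with zero bias, while PyTorch's nn.Embedding draws from N(0, 1) and nn.Linear uses a Kaiming-style uniform. A minimal sketch that re-initializes the PyTorch layers to the documented Keras defaults, so both runs start from similarly scaled weights:

# Sketch: re-initialize the PyTorch layers with the documented Keras defaults,
# so the convergence comparison starts from comparable weights.
def init_like_keras(net: nn.Module) -> None:
    for m in net.modules():
        if isinstance(m, nn.Embedding):
            nn.init.uniform_(m.weight, -0.05, 0.05)  # Keras Embedding default
        elif isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)        # Keras 'glorot_uniform'
            if m.bias is not None:
                nn.init.zeros_(m.bias)               # Keras 'zeros' bias default

init_like_keras(torch_model)

With the initialization matched (and eps already matched above), any remaining gap should come from the optimizer or loss setup rather than the starting point.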