I have a model with 3 LSTM layers and one fully connected layer, and I run it on MPS on a MacBook Air with an M2 processor.
The batch size of the train dataloader is 64. The model evaluation results differ depending on whether I set the batch size of the test dataloader to 64 or 256, even though the model is in eval() mode.
Training model code:
for epoch in range(epochs):
    model.train()
    print(f'Epoch {epoch}')
    __train_loop(model, train_dataloader, loss_function, optimizer, scheduler, verbose, device=device)
    if test_per_epoch:
        model.eval()
        train_loss, train_accuracy, train_f_score = test_model(model, train_dataloader, loss_function, device=device)
Evaluation function:
import numpy as np
import progressbar
import torch
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader

def test_model(model, test_dataloader: DataLoader, loss_function, device='cpu'):
    loss = 0
    y_pred_all = []
    y_all = []
    with progressbar.ProgressBar(max_value=len(test_dataloader)) as bar:
        with torch.no_grad():
            for batch_id, (X, y) in enumerate(test_dataloader):
                X, y = X.to(device), y.to(device)
                y_pred = model(X)
                # Accumulate the per-batch loss
                loss += loss_function(y_pred, y).item()
                # Predicted class = index of the largest logit
                y_pred = torch.argmax(y_pred, dim=1)
                y_pred_all.append(y_pred.cpu().numpy())
                y_all.append(y.cpu().numpy())
                bar.update(batch_id)
    # Mean loss per batch
    loss /= len(test_dataloader)
    y_pred_all = np.concatenate(y_pred_all)
    y_all = np.concatenate(y_all)
    cr = classification_report(y_all, y_pred_all, output_dict=True)
    f_score = cr['macro avg']['f1-score']
    accuracy = cr['accuracy']
    return loss, accuracy, f_score
Model:
(lstm): LSTM(3, 25, num_layers=3, batch_first=True, dropout=0.7)
(dense): Linear(in_features=25, out_features=2, bias=True)
The loss function is torch.nn.CrossEntropyLoss() and the optimizer is torch.optim.Adam().
The learning rate is 0.001 with an ExponentialLR(gamma=0.95) scheduler.
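For reference, a minimal sketch of the learning rate this schedule would produce, assuming the scheduler is stepped once per epoch (the post does not show where `scheduler.step()` is called inside `__train_loop`, so this is an assumption). The helper name `exponential_lr` is hypothetical; it only reproduces the exponential decay formula lr = initial_lr * gamma^epoch.

```python
def exponential_lr(initial_lr, gamma, epoch):
    """Learning rate after `epoch` scheduler steps under exponential decay."""
    return initial_lr * gamma ** epoch


# Assumption: one scheduler step per epoch, starting from lr = 0.001
schedule = [exponential_lr(0.001, 0.95, epoch) for epoch in range(7)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

Under that assumption, the learning rate after six steps is roughly 0.000735. If the scheduler were stepped once per batch instead, the decay would be much faster.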
Model performance after the 6th epoch with a test dataloader batch size of 64:
Train accuracy: 0.9087542087542088
Train F-Score: 0.908342717859852
Test accuracy: 0.8906356801093643
Test F-Score: 0.8901527949844201
Model performance after the 6th epoch with a test dataloader batch size of 256:
Train accuracy: 0.8942760942760942
Train F-Score: 0.8939527432555182
Test accuracy: 0.5974025974025974
Test F-Score: 0.5557444197838286
This effect does not occur on Nvidia CUDA graphics cards, which is why I created this topic in the MPS category. Why does this happen? Thanks a lot in advance for any answers!
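To help narrow this down, here is a minimal sketch of a check that separates the device from the rest of the pipeline: it feeds the same inputs through the model in eval mode with batch sizes 64 and 256 and reports the largest difference between the resulting logits. The `LSTMClassifier` class, its forward pass (classifying from the last time step), the helper names, and the synthetic data are all assumptions for illustration; only the layer sizes come from the model printout above.

```python
import torch
from torch import nn


class LSTMClassifier(nn.Module):
    """Stand-in with the layer sizes from the post; the forward pass is an assumption."""

    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(3, 25, num_layers=3, batch_first=True, dropout=0.7)
        self.dense = nn.Linear(25, 2)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, 25)
        return self.dense(out[:, -1])  # classify from the last time step


@torch.no_grad()
def logits_in_batches(model, X, batch_size, device):
    """Run X through the model in chunks of batch_size; return all logits on the CPU."""
    model.eval()
    chunks = [model(X[i:i + batch_size].to(device)).cpu()
              for i in range(0, len(X), batch_size)]
    return torch.cat(chunks)


def max_batch_size_discrepancy(model, X, device, sizes=(64, 256)):
    """Largest absolute difference between logits computed with the two batch sizes."""
    model = model.to(device)
    a = logits_in_batches(model, X, sizes[0], device)
    b = logits_in_batches(model, X, sizes[1], device)
    return (a - b).abs().max().item()


torch.manual_seed(0)
X = torch.randn(512, 20, 3)  # synthetic data: 512 sequences of length 20
model = LSTMClassifier()

cpu_diff = max_batch_size_discrepancy(model, X, torch.device("cpu"))
print(f"cpu: max |logits(64) - logits(256)| = {cpu_diff:.2e}")

if torch.backends.mps.is_available():
    mps_diff = max_batch_size_discrepancy(model, X, torch.device("mps"))
    print(f"mps: max |logits(64) - logits(256)| = {mps_diff:.2e}")
```

In eval mode the two batch sizes should agree up to floating-point rounding. A large difference on MPS but not on the CPU would suggest the problem lies in the backend rather than in the evaluation code.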