I have a model with 3 LSTM layers and one fully connected layer, and I run it on MPS on a MacBook Air with an M2 processor.

I set the batch size of the train dataloader to 64. When I evaluate the model, I get different results depending on whether the batch size of the test dataloader is 64 or 256, even though the model is in `eval()` mode.

Training model code:

```
for epoch in range(epochs):
    model.train()
    print(f'Epoch {epoch}')
    __train_loop(model, train_dataloader, loss_function, optimizer, scheduler, verbose, device=device)
    if test_per_epoch:
        model.eval()
        train_loss, train_accuracy, train_f_score = test_model(model, train_dataloader, loss_function, device=device)
```

```
import numpy as np
import progressbar
import torch
from sklearn.metrics import classification_report
from torch.utils.data import DataLoader


def test_model(model, test_dataloader: DataLoader, loss_function, device='cpu'):
    loss = 0  # running sum of the per-batch losses
    y_pred_all = []
    y_all = []
    with progressbar.ProgressBar(max_value=len(test_dataloader)) as bar:
        with torch.no_grad():
            for batch_id, (X, y) in enumerate(test_dataloader):
                X, y = X.to(device), y.to(device)
                y_pred = model(X)
                loss += loss_function(y_pred, y).item()
                # Predicted class = index of the largest logit
                y_pred = torch.argmax(y_pred, dim=1)
                y_pred_all.append(y_pred.cpu().numpy())
                y_all.append(y.cpu().numpy())
                bar.update(batch_id)
    # Concatenate the per-batch arrays into one flat array each
    y_pred_all = np.hstack(y_pred_all).flatten()
    y_all = np.hstack(y_all).flatten()
    cr = classification_report(y_all, y_pred_all, output_dict=True)
    f_score = cr['macro avg']['f1-score']
    accuracy = cr['accuracy']
    return loss, accuracy, f_score
```

Model:

```
(lstm): LSTM(3, 25, num_layers=3, batch_first=True, dropout=0.7)
(dense): Linear(in_features=25, out_features=2, bias=True)
```
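For readers who want to reproduce the setup, here is a minimal sketch of a module that would print the layer summary above. The class name `LSTMClassifier` and the `forward` logic (classifying from the last time step) are assumptions for illustration, not the actual model code.

```python
import torch
from torch import nn


class LSTMClassifier(nn.Module):
    """Hypothetical module matching the printed layer summary."""

    def __init__(self, input_size=3, hidden_size=25, num_layers=3,
                 dropout=0.7, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.dense = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x has shape (batch, seq_len, input_size) because batch_first=True
        output, _ = self.lstm(x)
        # Assumption: the classifier reads only the last time step
        return self.dense(output[:, -1, :])


model = LSTMClassifier()
logits = model(torch.randn(4, 10, 3))  # 4 synthetic sequences of length 10
print(logits.shape)
```

The sequence length of 10 is arbitrary; only the feature size of 3 comes from the layer summary.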

For the loss function I use `torch.nn.CrossEntropyLoss()`. For the optimizer I use `torch.optim.Adam()` with a learning rate of 0.001 and an `ExponentialLR(gamma=0.95)` scheduler.
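The setup described above could be wired together roughly as follows. This is only a sketch: the `nn.Linear` module is a placeholder for the real model, and stepping the scheduler once per epoch is an assumption, since the body of `__train_loop` is not shown.

```python
import torch
from torch import nn

model = nn.Linear(3, 2)  # placeholder module standing in for the LSTM model
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(6):
    # ... forward pass, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # assumption: one scheduler step per epoch

# With per-epoch stepping, the learning rate is multiplied by 0.95 each epoch
current_lr = scheduler.get_last_lr()[0]
print(current_lr)
```

If the scheduler is instead stepped once per batch, the learning rate decays much faster than this sketch suggests.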

Model performance after the 6th epoch when the batch size of the test dataloader is 64:

Train accuracy: 0.9087542087542088

Train F-Score: 0.908342717859852

Test accuracy: 0.8906356801093643

Test F-Score: 0.8901527949844201

Model performance after the 6th epoch when the batch size of the test dataloader is 256:

Train accuracy: 0.8942760942760942

Train F-Score: 0.8939527432555182

Test accuracy: 0.5974025974025974

Test F-Score: 0.5557444197838286

There is no such effect on Nvidia CUDA GPUs, so I created this topic in the MPS category. Why does it behave like this? Thanks a lot in advance for any answers!
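One way to isolate the effect from the rest of the pipeline might be to feed the same inputs through the LSTM in eval mode in chunks of 64 and 256 and compare the outputs per device. The sketch below uses synthetic data; the helper name `batched_outputs` and the sequence length of 20 are assumptions for illustration.

```python
import torch
from torch import nn


def batched_outputs(model, X, batch_size, device):
    """Run X through the model in chunks of batch_size; return results on CPU."""
    model = model.to(device).eval()
    outputs = []
    with torch.no_grad():
        for start in range(0, X.shape[0], batch_size):
            chunk = X[start:start + batch_size].to(device)
            out, _ = model(chunk)
            outputs.append(out[:, -1, :].cpu())  # keep the last time step
    return torch.cat(outputs)


torch.manual_seed(0)
lstm = nn.LSTM(3, 25, num_layers=3, batch_first=True, dropout=0.7)
X = torch.randn(512, 20, 3)  # synthetic inputs, not the real dataset

# Compare on CPU, and on MPS only when it is available
devices = ["cpu"] + (["mps"] if torch.backends.mps.is_available() else [])
max_diff = {}
for device in devices:
    small = batched_outputs(lstm, X, 64, device)
    large = batched_outputs(lstm, X, 256, device)
    max_diff[device] = (small - large).abs().max().item()
    print(device, max_diff[device])
```

In eval mode the outputs for a given sample should not depend on the batch size beyond floating-point noise, so a large difference on `mps` but not on `cpu` would point at the backend rather than at the evaluation code.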