I train my model with batch size 128; however, if I don't batch the inputs during the evaluation phase, the network output is wrong.

If the network’s input is in batches:

crit = nn.MSELoss(reduction='mean')
target = []
netout = []
model.eval()  # To handle dropout layers and batch norm
for A, M, label in dataloaders['val']:
    with tr.set_grad_enabled(False):  # We don't need gradient computation in eval mode (speedup)
        out = model(A, M)
        target.extend(label.data.cpu().numpy())
        netout.extend(out.data.cpu().numpy())
print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

If I instead iterate over the single elements, I obtain a different result:

crit = nn.MSELoss(reduction='mean')
target = []
netout = []
model.eval()  # To handle dropout layers and batch norm
for A, M, label in testset:
    with tr.set_grad_enabled(False):  # We don't need gradient computation in eval mode (speedup)
        out = model(tr.unsqueeze(A, 0), tr.unsqueeze(M, 0))
        target.append(label.item())
        netout.append(out.item())
print(tr.Tensor(target).shape, tr.Tensor(netout).shape)
print(f'Loss: {crit(tr.Tensor(target), tr.Tensor(netout))}')

But the results should be identical, because the output is deterministic!
What could be the problem? I can overfit the model at a specific batch size, but if I then evaluate on single elements, the result is wrong.

Could you print the shapes of tr.Tensor(target) and tr.Tensor(netout) before passing them to the criterion?
Also, which criterion are you currently using?

It seems you might be accidentally broadcasting the inputs to your criterion.
In the latest PyTorch version (1.2.0) you should get a warning.
Make sure to pass the input and target as [batch_size, 1] or [batch_size] (not mixed).
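To make the broadcasting pitfall concrete, here is a minimal standalone sketch (not tied to your model; the tensor values are made up) showing how passing [batch_size, 1] against [batch_size] silently distorts MSELoss:

```python
import torch

crit = torch.nn.MSELoss(reduction='mean')

target = torch.tensor([1.0, 2.0, 3.0])          # shape [3]
netout = torch.tensor([[1.0], [2.0], [3.0]])    # shape [3, 1]

# The prediction matches the target exactly, so the loss should be 0.
# But [3, 1] vs [3] broadcasts both tensors to [3, 3], and the mean is
# taken over 9 element pairs. Recent PyTorch versions emit a UserWarning here.
loss_broadcast = crit(netout, target)
print(loss_broadcast.item())  # 1.333..., not 0!

# With matching shapes the loss is the expected 0.
loss_matched = crit(netout.squeeze(1), target)
print(loss_matched.item())  # 0.0
```

Squeezing (or unsqueezing) so both tensors have the same shape restores the per-element comparison.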

I have already spent a lot of time on this problem. My criterion is CrossEntropyLoss. The input is [batch, num_class] and the target is [batch]. Everything works well when training and testing in batches, but it does not work on a single sample. I think the network sees the data inside the batch, so we would have to train the model with batch size 1.
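For reference, a minimal sanity check (standalone, with random logits rather than your model) showing that CrossEntropyLoss with the shapes above gives the same result whether the samples are evaluated batched or one at a time, i.e. the loss does not "see" other elements of the batch:

```python
import torch

crit = torch.nn.CrossEntropyLoss()  # default reduction='mean'
num_class = 5
batch = 8

# Batched case: input [batch, num_class], target [batch].
logits = torch.randn(batch, num_class)
labels = torch.randint(0, num_class, (batch,))
batch_loss = crit(logits, labels)

# Single-sample case: keep the batch dimension (size 1), as with unsqueeze.
per_sample = torch.stack(
    [crit(logits[i:i + 1], labels[i:i + 1]) for i in range(batch)]
)

# The mean of the per-sample losses reproduces the batched loss.
print(torch.allclose(per_sample.mean(), batch_loss))  # True
```

If your per-sample loop gives a different loss than the batched run, the discrepancy is in the surrounding code (shapes, preprocessing, a layer still in train mode), not in the loss itself.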