Same model, data, and code, different hardware and results

I have a .pth model that I trained and saved.

I load the model and run it on a test dataset with the following code:

for i, (image, label) in enumerate(dataloader):
    with torch.no_grad():
        output, min_distances = model(image)  # forward pass on the test image

On one machine, the model works well, getting around 97% of predictions correct.
I then copied the model, code, and data to a Raspberry Pi. Now every prediction is the same single class (the first class of the dataset).

No changes were made to the code.
Example output:

Predicted    Actual class
tensor([0])  tensor([0])
tensor([0])  tensor([0])
tensor([0])  tensor([0])
...
tensor([0])  tensor([1])
tensor([0])  tensor([1])
tensor([0])  tensor([1])
...
tensor([0])  tensor([2])
tensor([0])  tensor([2])
tensor([0])  tensor([2])
(and so on; the predicted class is 0 for every sample, regardless of the actual class)

Any ideas why this might be happening?

It looks like this is a batch size problem: the issue appears when the batch size is set to 1.
Is there a way to make predictions independent of the batch size, or should it always match the value used during training?

Most layers do not depend on the batch size. The exceptions are normalization layers such as batch norm, which compute the batch statistics and use them to normalize the input activations during training. If your model contains batchnorm layers, call model.eval() before evaluating it, so that the running statistics are used instead of the per-batch statistics.
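
As a minimal sketch (assuming the model and dataloader from the snippet above, and that output holds per-class logits), the evaluation loop could look like this:

import torch

model.eval()  # BatchNorm uses running stats, Dropout is disabled

correct = 0
total = 0
with torch.no_grad():
    for image, label in dataloader:           # batch_size=1 is fine in eval mode
        output, min_distances = model(image)
        pred = output.argmax(dim=1)           # predicted class index
        correct += (pred == label).sum().item()
        total += label.size(0)

print(f"accuracy: {correct / total:.3f}")

With model.eval(), the normalization layers use the running mean and variance accumulated during training, so the prediction for a given image no longer depends on the batch size.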