Varying batch size for pre-trained model changes inference result, even in evaluation mode

Well, here’s hoping this behavior is something obvious… I’m attempting to extract features using a pre-trained ResNet50 model which, despite being in evaluation mode, produces different outputs for the same input when that input is submitted as part of a larger batch. I tried a couple of different versions (2.5.1 and 2.9.0), came up with a minimal reproducible example, switched to AlexNet to rule out issues with batch normalization layers, and dropped GPU acceleration in case of CUDA-derived shenanigans. Any insight into why result and result2 are not identical, and how to fix it, would be greatly appreciated!

import torch
from torchvision import models

# Set up deterministic behavior
torch.use_deterministic_algorithms(True)

# Set torch seed for consistency
torch.manual_seed(0)

# Create a batch of random input data
batch = torch.randn(32, 3, 224, 224)

# Load model and set to evaluation mode
model = models.alexnet(weights='DEFAULT')
model.eval()

# Obtain the result for the first image, running inference on it alone
# (.detach() is redundant under no_grad and has been dropped)
with torch.no_grad():
    result = model(batch[0].unsqueeze(0))[0]

# Obtain the result for the first image, running inference on the whole batch
with torch.no_grad():
    result2 = model(batch)[0]

# Results should be identical when the model is in eval mode...?
print('Sum Absolute Difference: ', torch.sum(torch.abs(result-result2)).item())

Small numerical errors caused by the limited floating-point precision are expected, since the results depend on the algorithm used and the order of operations, and both can change with the input shape (a batched kernel may accumulate sums differently than a batch-size-1 kernel). With deterministic mode enabled, bitwise-identical results are achieved for the same setup, which includes the workload and its input shapes.