i have two models that differ only in batch size — identical weight initialization and all other hyperparameters are the same.
i perform an initial validation pass, i.e. i simply run inference on both models before any training.
the cross_entropy loss for batch size = 3 is 0.9632
the cross_entropy loss for batch size = 1 is 0.8576
i’m not using any batch norm layers in my model, so that can’t be the cause.
what else can cause such behavior?
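for context on why i'd expect this to matter at all: my understanding is that with the usual `reduction='mean'`, cross-entropy for a batch is the average of the per-sample losses, so a batch of 3 and a batch of 1 report different numbers unless every sample in the batch happens to have the same loss. here is a minimal numpy sketch of that (the logits/targets are made up for illustration, not from my actual model):

```python
import numpy as np

def cross_entropy(logits, target):
    # softmax cross-entropy for a single sample (numerically stabilized)
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))   # 3 samples, 5 classes
targets = [2, 0, 4]

per_sample = [cross_entropy(logits[i], targets[i]) for i in range(3)]

batch3_loss = float(np.mean(per_sample))  # what reduction='mean' reports for batch size 3
batch1_loss = float(per_sample[0])        # what a batch containing only the first sample reports

print(batch3_loss, batch1_loss)  # different values from the exact same weights
```

so my question is whether the discrepancy i'm seeing is just this averaging effect (different samples per forward pass), or something else.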