Getting different outputs from batchnorm network with different batch sizes at test time

Yes, as I said in my post.

The problem is related to the batch statistics. I would like to know whether there is a reliable way to fix it.
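To make the batch-statistics point concrete, here is a minimal NumPy sketch, not tied to any particular framework. It assumes a plain normalization step with no learned scale or shift, and the names `bn_batch`, `bn_fixed`, `running_mean`, and `running_var` are made up for illustration. The fixed statistics are hand-picked values standing in for whatever running averages a framework would have accumulated during training.

```python
import numpy as np

def normalize(x, mean, var, eps=1e-5):
    # Core batchnorm arithmetic without the learned scale/shift parameters.
    return (x - mean) / np.sqrt(var + eps)

def bn_batch(x):
    # Normalizes with statistics of the batch being fed (training-style behavior).
    return normalize(x, x.mean(axis=0), x.var(axis=0))

# Hypothetical stored statistics, standing in for running averages from training.
running_mean = np.full(4, 2.0)
running_var = np.full(4, 9.0)

def bn_fixed(x):
    # Normalizes with fixed statistics (inference-style behavior).
    return normalize(x, running_mean, running_var)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(8, 4))  # 8 samples, 4 features

# Same two samples, once as part of a batch of 8 and once as a batch of 2.
batch_stats_agree = np.allclose(bn_batch(x)[:2], bn_batch(x[:2]))
fixed_stats_agree = np.allclose(bn_fixed(x)[:2], bn_fixed(x[:2]))

print("batch statistics agree across batch sizes:", batch_stats_agree)
print("fixed statistics agree across batch sizes:", fixed_stats_agree)
```

If the network shows the first kind of behavior at test time, it is probably still normalizing with the current batch's statistics rather than the stored ones. How to switch that depends on the framework, which I am not assuming here.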