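In that case there is some other problem, most likely with your data. Batchnorm by itself will not give NaN for batch sizes greater than 1. Did you scale your data? If during training you fed floats in the range 0-1 but at test time you feed integers in the range 0-65535, your network might blow up. At inference, batchnorm normalizes with the running mean and variance it collected on the 0-1 training data, so inputs that are tens of thousands of times larger come out enormous.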
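A minimal NumPy sketch of the problem and the fix. The running mean and variance below are hypothetical values picked to resemble statistics learned from 0-1 inputs, and `batchnorm_eval` / `to_unit_range` are illustrative helpers, not any framework's API. The point is that the same preprocessing (here, dividing by the dtype's maximum) has to be applied at test time as in training:

```python
import numpy as np


def batchnorm_eval(x, running_mean, running_var, eps=1e-5):
    # Inference-mode batch norm (no affine scale/shift): normalize using the
    # statistics saved during training, not the current batch's statistics.
    return (x - running_mean) / np.sqrt(running_var + eps)


def to_unit_range(x):
    # Scale integer data to float32 in [0, 1]; 65535 for uint16, 255 for uint8.
    if np.issubdtype(x.dtype, np.integer):
        return x.astype(np.float32) / np.iinfo(x.dtype).max
    return x.astype(np.float32)


# Hypothetical running stats, as if learned from float inputs in [0, 1].
running_mean, running_var = 0.5, 0.08

raw_test = np.array([0, 32768, 65535], dtype=np.uint16)

# Unscaled 0-65535 input: outputs in the hundreds of thousands.
print(batchnorm_eval(raw_test.astype(np.float32), running_mean, running_var))

# Scaled the same way as the training data: outputs stay in a sane range.
print(batchnorm_eval(to_unit_range(raw_test), running_mean, running_var))
```

Large values like these are not NaN yet, but after a few more layers (or in float16) they can overflow to inf, and operations such as inf - inf then produce NaN.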