Why BatchNorm2D has inconsistent gradients and sizes for running_mean and running_variance


I have a question regarding BatchNorm2d. Below is the output when I call optimizer.step() to update the parameters. I can see that the conv1.weight tensor size is 64, but shouldn't it be 64x64x3x3 (number of input channels = 64)?
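For reference, here is a minimal sketch of the shapes one would expect for these layers (the layer names `conv1`/`bn1` are assumptions mirroring the question). A size-64 tensor matches a BatchNorm2d parameter, not the conv weight: BatchNorm's learnable gamma and beta are 1-D per-channel tensors, while `running_mean`/`running_var` are buffers, so the optimizer never sees them at all.

```python
import torch
import torch.nn as nn

# Hypothetical layers matching the shapes described in the question.
conv1 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3)
bn1 = nn.BatchNorm2d(64)

print(conv1.weight.shape)      # torch.Size([64, 64, 3, 3]) -- out x in x kH x kW
print(bn1.weight.shape)        # torch.Size([64]) -- learnable gamma, 1-D per channel
print(bn1.running_mean.shape)  # torch.Size([64]) -- a buffer, not a parameter

# Buffers do not appear in parameters(), so optimizer.step() never touches them.
param_shapes = [p.shape for p in bn1.parameters()]
print(param_shapes)            # [torch.Size([64]), torch.Size([64])]
```

So when iterating over an optimizer's parameters, a 1-D tensor of size 64 printed next to `conv1` is most likely the BatchNorm gamma or beta, not the convolution weight.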

Resolved now; it turned out to be a bug in my own code.