A layer doesn’t have requires_grad, only Variables have. running_mean and running_var are buffers, and are updated during forwarding. I assume train(True) will still use the batch mean and batch var.
A layer doesn’t have requires_grad, only Variables have. running_mean and running_var are buffers, and are updated during forwarding. I assume train(True) will still use the batch mean and batch var.