I’m converting a TensorFlow model to PyTorch, and I’d like to initialize the mean and variance of BatchNorm2d from the TensorFlow model.
I’m doing it this way:
bn.running_mean = torch.nn.Parameter(torch.Tensor(TF_param))
And I get this error:
RuntimeError: the derivative for 'running_mean' is not implemented
But it works for bn.weight and bn.bias. Is there any way to initialize the mean and variance using my pre-trained TensorFlow model? Is there anything like moving_mean_initializer and moving_variance_initializer in PyTorch?
Thanks!
Could you try to assign a torch.tensor instead of an nn.Parameter, since the running estimates do not require gradients?
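A minimal sketch of that fix, assuming the TensorFlow moving statistics have already been exported as NumPy arrays (the names tf_moving_mean / tf_moving_var and the channel count of 16 are placeholders, not from the original post):

```python
import numpy as np
import torch

# Placeholder stats exported from the TensorFlow model
tf_moving_mean = np.full(16, 0.5, dtype=np.float32)
tf_moving_var = np.full(16, 2.0, dtype=np.float32)

bn = torch.nn.BatchNorm2d(16)

# running_mean and running_var are buffers, not Parameters, so assign
# plain tensors -- wrapping them in nn.Parameter raises the RuntimeError above.
bn.running_mean = torch.from_numpy(tf_moving_mean)
bn.running_var = torch.from_numpy(tf_moving_var)
```

Because the running estimates are registered as buffers, plain tensor assignment works and the values are still included in the module's state_dict.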
Wow, it worked! Thank you! By the way, if I freeze the bn parameters (stop bn from updating), the running_mean and running_var will not change, and they will still be saved in the model directly. Am I right? Sorry for my bad English.
It depends on what you mean by “freezing”.
To use the running estimates without updating them, you could simply call .eval() on the batchnorm layer.
If you would like to freeze the affine parameters (weight and bias), you would need to set their requires_grad attribute to False.
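Both kinds of freezing described above can be sketched as follows (the layer size and input shape are arbitrary placeholders):

```python
import torch

bn = torch.nn.BatchNorm2d(16)

# Freeze the running estimates: eval() makes the forward pass use the
# stored running_mean/running_var without updating them.
bn.eval()

# Freeze the affine parameters as well, so the optimizer cannot change them.
bn.weight.requires_grad_(False)
bn.bias.requires_grad_(False)

# In eval mode a forward pass leaves the running stats untouched.
x = torch.randn(4, 16, 8, 8)
before = bn.running_mean.clone()
_ = bn(x)
assert torch.equal(bn.running_mean, before)
```

Note that .eval() and requires_grad are independent: .eval() controls the stat updates during forward, while requires_grad controls gradient flow to weight and bias.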
Thanks. I meant requires_grad.