How to formulate the batchnorm calculation into a linear transformation?

This is the current batchnorm calculation:
y = \frac{x - mean[x]}{ \sqrt{Var[x] + \epsilon}} * gamma + beta

I want to formulate it as y = kx + b (as shown in the picture below).
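
For reference, expanding the formula above gives the two coefficients directly. This is only a sketch of the algebra, and it assumes mean[x] and Var[x] are treated as fixed constants (for example the running statistics in evaluation mode), which is what makes the expression linear in x:

```latex
y = \frac{x - mean[x]}{\sqrt{Var[x] + \epsilon}} * gamma + beta
  = \underbrace{\frac{gamma}{\sqrt{Var[x] + \epsilon}}}_{k} * x
  + \underbrace{beta - \frac{mean[x] * gamma}{\sqrt{Var[x] + \epsilon}}}_{b}
```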

I am wondering how I can get the values of Var[x] and mean[x]. Is using model.state_dict().values() a good idea? And how can I use it while training a model? Do you have any examples to show me?

I know it’s a weird question, but if you have some suggestions, please let me know.

Thank you very much.

Do you want to use the parameters from another BatchNorm layer and just add another layer on top of it or do you want to rewrite it completely?
In the former case you could try something like this:

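import torch
import torch.nn as nn
from torch.autograd import Variable

class MyBatchNorm(nn.Module):
    def __init__(self, num_features, eps=1e-05, momentum=0.1, affine=True):
        super(MyBatchNorm, self).__init__()
        self.bn = nn.BatchNorm1d(num_features, eps=eps, momentum=momentum, affine=affine)

    def forward(self, x):
        x = self.bn(x)              # regular BatchNorm forward pass
        mu = self.bn.running_mean   # running estimate of mean[x]
        var = self.bn.running_var   # running estimate of Var[x]
        gamma = self.bn.weight
        beta = self.bn.bias
        eps = self.bn.eps

        # extra layer k * x + beta applied on top of the BatchNorm output
        k = gamma.data / torch.sqrt(var + eps)
        x.data = k * x.data + beta.data
        return x

mybn = MyBatchNorm(10)
x = Variable(torch.randn(16, 10))
x_ = mybn(x)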

Note that I calculated mu, var, ... separately just to show how to get them. Of course you can simplify the code and just use e.g. self.bn.running_mean directly in your calculations.

Let me know, if this meets your need or if I misunderstood your question.
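
On the question of where mean[x] and Var[x] can be read from: gamma and beta are parameters, while the running mean and variance are buffers, so they appear in state_dict() but not in named_parameters(). A minimal sketch, assuming a recent PyTorch version; the layer size 10 is arbitrary:

```python
import torch.nn as nn

bn = nn.BatchNorm1d(10)

param_names = [name for name, _ in bn.named_parameters()]
state_names = list(bn.state_dict().keys())

print(param_names)   # learnable affine parameters (gamma and beta)
print(state_names)   # also contains the running statistics (buffers)

mu = bn.running_mean   # same values as bn.state_dict()["running_mean"]
var = bn.running_var   # same values as bn.state_dict()["running_var"]
```

Reading the attributes directly works during training as well, since the layer updates these buffers on every forward pass in training mode.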

When I run the code, I get this error:
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/…/generic/

Do you know where the problem is?
Thanks a lot

Which PyTorch version are you using?
Could you check it with print(torch.__version__)?
Maybe your PyTorch version is a bit older and doesn’t support broadcasting yet.

I am using 0.3.0.post4. I think it is the latest PyTorch version I can get right now.

Besides, I have another question that confuses me.
When I use

for child in model.named_children():
    layer_name = child[0]
    layer_params = {}
    for param in child[1].named_parameters():
        param_name = param[0]
        param_value = param[1].data.numpy()
        layer_params[param_name] = param_value
    save_name = layer_name + '.npy'
    np.save(save_name, layer_params)

to save the parameters, I get gamma with 8 decimal places, but when I print gamma, I only get 4 decimal places. Why is that?

My English is not so good, so let me know if you are confused by my questions.

Thank you very much.

The Tensor and numpy array are using the same data, so it’s just a representation issue.
Try torch.set_printoptions(precision=10) and print gamma again. :wink:
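
A minimal sketch of the printing point, assuming a recent PyTorch version; the tensor `gamma` here is a made-up stand-in for a BatchNorm weight, not a value from the model above:

```python
import torch

# Hypothetical stand-in for a BatchNorm weight (gamma); values are arbitrary.
gamma = torch.tensor([0.123456789, 1.987654321])

print(gamma)          # default print precision is 4 decimal places
print(gamma.numpy())  # NumPy shows the same float32 data with more digits

torch.set_printoptions(precision=10)
print(gamma)          # same stored values, now printed with more digits

torch.set_printoptions(profile="default")  # restore the default print settings
```

Only the displayed text changes; the stored float32 values are identical in all three prints.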

Also, could you give me the line throwing the RuntimeError?

Oh, I see, that’s a huge help! Thank you so much.
And the line throwing the error is: x.data = k * x.data + beta.data
If I comment out that line, it seems to run fine.

self.BinarizedConv2d2 = BinarizeConv2d(128, 128, kernel_size=3, padding=1, bias=False)
self.MaxPool2d2 = nn.MaxPool2d(kernel_size=2, stride=2)
self.BatchNorm2d2 = BatchNorm(128)
self.Hardtanh2 = nn.Hardtanh(inplace=True)
This is part of my model. Could you tell me how to use the bn.running_mean directly on the model, please?

Again, you have been a great help to me. Thank you so much.

You’re welcome :wink:

Hmm, commenting out the line is not the solution, since it’s the formula you are looking for. :smiley:
Could you please print the shapes of all Tensors?

The quoted code seems to be the __init__ function of your Module?
Try to adapt my code snippet into the forward function.

Yes, you are right. The quoted code is the __init__ function of my model.

I defined some of the layers in one file, and call these layers in another file.

This is how I use your code, is it right?

And do I need to print the shape of tensors in all layers or just tensors in BatchNorm layers? Or could you please tell me the specific tensors I need to check? I am quite confused now.

Appreciate your help

I changed the self.bn = nn.BatchNorm1d
into self.bn = nn.BatchNorm2d

Could this change cause the mismatch?

BN is a nonlinear function.

Could it have something to do with my input?

I think I need to use the Conv2d layer’s output as the input of the BatchNorm2d layer. What do you think?

Thank you for your reply.

Yes, you should use the output of your conv layers as the input to your batch norm.
It should also work with BatchNorm2d, if the shapes are right.
Could you print the shapes of all Tensors used in the calculation which causes the error?

Is this what you need to check? I’m afraid I may have misunderstood what you mean.


BTW, the output of Conv2d is also (30L, 128L, 32L, 32L)

I see, it was my mistake. Haven’t checked the sizes properly.

Try to change the calculation to this line: x.data = k.view(1, -1, 1, 1).expand_as(x) * x.data + beta.view(1, -1, 1, 1).expand_as(x).data
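
The reshape is needed because per-channel values have shape (C,), while a conv output has shape (N, C, H, W); broadcasting aligns trailing dimensions, so the channel vector has to be viewed as (1, C, 1, 1). A small sketch, assuming a PyTorch version that supports broadcasting and using the shape reported above; `k` and `beta` are random placeholders rather than values from a real layer:

```python
import torch

x = torch.randn(30, 128, 32, 32)   # conv output shape reported above: (N, C, H, W)
k = torch.rand(128) + 0.5          # placeholder per-channel scale
beta = torch.randn(128)            # placeholder per-channel shift

# k has shape (128,), which gets aligned with the last dim (W = 32) and fails.
try:
    _ = k * x
    mismatch = False
except RuntimeError:
    mismatch = True

# Viewing as (1, C, 1, 1) lines the vector up with the channel dimension;
# with broadcasting, expand_as is no longer required.
y = k.view(1, -1, 1, 1) * x + beta.view(1, -1, 1, 1)
print(mismatch, y.shape)
```

For a (N, C) input from BatchNorm1d no reshape is needed, because the channel dimension is already the trailing one.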

I’m sure it’s not the optimal way to calculate it, so maybe someone can propose a better way.

@SimonW Yeah you are right! The proposed calculations in the first post should work however, even though BN is nonlinear, or am I missing something?
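
Both statements can hold at once. With fixed statistics (eval mode, using running_mean and running_var) the layer is an affine map y = kx + b per channel; with batch statistics (training mode) mean[x] and Var[x] themselves depend on x, so the map is not linear. A small pure-Python sketch of the training-mode case for a single feature, with gamma = 1 and beta = 0; `batchnorm_train` is a hypothetical helper written for this illustration, not a PyTorch function:

```python
import math


def batchnorm_train(xs, eps=1e-5):
    """Normalize one feature with the batch's own mean and (biased) variance."""
    mean = sum(xs) / len(xs)
    var = sum((v - mean) ** 2 for v in xs) / len(xs)
    return [(v - mean) / math.sqrt(var + eps) for v in xs]


xs = [1.0, 2.0, 4.0, 7.0]
y1 = batchnorm_train(xs)
y2 = batchnorm_train([2.0 * v for v in xs])   # scale the whole batch by 2

# A linear map would give y2 == 2 * y1; instead the output barely changes,
# because the batch variance scales along with the input.
max_diff = max(abs(a - b) for a, b in zip(y1, y2))
print(max_diff)
```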

Thank you very much, I’ll try this~

It works!!!

@ptrblck Oh I’m sorry. I was just answering the question in title. No intention to say that your code is wrong :slight_smile:

@SimonW No worries! I overlooked the title more or less and was wondering if my solution still makes any sense. :wink:

@LJ_Mason You’re welcome! :slight_smile:

I have the same question about batchnorm; I also need to extract the parameters of the batchnorm layer. But looking at the MyBatchNorm you wrote, the final calculation of x.data is incomplete: compared with the batchnorm formula y = \frac{x - mean[x]}{ \sqrt{Var[x] + \epsilon}} * gamma + beta, it lacks the term \frac{- mean[x] * gamma}{\sqrt{Var[x] + \epsilon}}. I added that term as shown in the figure below. The program runs without error, but the network’s loss becomes much worse, so I think my modification is wrong. How should this part of the code be written for BatchNorm1d and for BatchNorm2d, respectively? I hope you can give an answer, thank you.
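
One way to include the missing term is to fold it into the offset, b = beta - mean[x] * k, and apply k * x + b to the raw input instead of to the BatchNorm output. The sketch below assumes eval mode (so the running statistics are used) and a recent PyTorch version; `bn_as_linear` and `apply_linear` are hypothetical helper names, and the layer sizes are arbitrary. A possible reason for the bad loss, which is only a guess since the figure is not available: in MyBatchNorm, x has already passed through self.bn, so applying the full formula again normalizes twice, and in training mode BatchNorm uses batch statistics rather than running_mean and running_var.

```python
import torch
import torch.nn as nn


def bn_as_linear(bn):
    """Return per-channel (k, b) such that bn(x) == k * x + b in eval mode."""
    k = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    b = bn.bias.data - bn.running_mean * k   # the "- mean * gamma / sqrt(...)" term
    return k, b


def apply_linear(x, k, b):
    # Reshape (C,) to (1, C, 1, ...) so it broadcasts over the channel dim
    # for both (N, C) and (N, C, H, W) inputs.
    shape = [1, -1] + [1] * (x.dim() - 2)
    return k.view(*shape) * x + b.view(*shape)


torch.manual_seed(0)
for bn, x in [(nn.BatchNorm1d(10), torch.randn(16, 10)),
              (nn.BatchNorm2d(8), torch.randn(4, 8, 5, 5))]:
    bn.train()
    bn(x * 3 + 1)                          # one training pass to update running stats
    nn.init.uniform_(bn.weight, 0.5, 1.5)  # non-trivial gamma
    nn.init.normal_(bn.bias)               # non-trivial beta
    bn.eval()                              # eval mode: running_mean / running_var are used
    k, b = bn_as_linear(bn)
    with torch.no_grad():
        print(torch.allclose(bn(x), apply_linear(x, k, b), atol=1e-4))
```

The same two helpers cover BatchNorm1d and BatchNorm2d; only the broadcast shape differs.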