How can I get the mean and variance from a saved CNN model?

Hi,

How can I get the mean, variance, gamma and beta for batchnorm from a saved CNN model?
Also, how can I use them for inference?

This is my architecture:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 32, 32, 32]             896
       BatchNorm2d-2           [-1, 32, 32, 32]              64
              ReLU-3           [-1, 32, 32, 32]               0
           Dropout-4           [-1, 32, 32, 32]               0
            Conv2d-5           [-1, 64, 32, 32]          18,496
       BatchNorm2d-6           [-1, 64, 32, 32]             128
              ReLU-7           [-1, 64, 32, 32]               0
         AvgPool2d-8           [-1, 64, 16, 16]               0
            Conv2d-9          [-1, 128, 16, 16]          73,856
      BatchNorm2d-10          [-1, 128, 16, 16]             256
             ReLU-11          [-1, 128, 16, 16]               0
          Dropout-12          [-1, 128, 16, 16]               0
           Conv2d-13          [-1, 128, 16, 16]         147,584
      BatchNorm2d-14          [-1, 128, 16, 16]             256
             ReLU-15          [-1, 128, 16, 16]               0
        AvgPool2d-16            [-1, 128, 8, 8]               0
           Conv2d-17            [-1, 256, 8, 8]         295,168
      BatchNorm2d-18            [-1, 256, 8, 8]             512
             ReLU-19            [-1, 256, 8, 8]               0
          Dropout-20            [-1, 256, 8, 8]               0
           Conv2d-21            [-1, 256, 8, 8]         590,080
      BatchNorm2d-22            [-1, 256, 8, 8]             512
             ReLU-23            [-1, 256, 8, 8]               0
        AvgPool2d-24            [-1, 256, 4, 4]               0
          Flatten-25                 [-1, 4096]               0
           Linear-26                   [-1, 32]         131,104
             ReLU-27                   [-1, 32]               0
           Linear-28                   [-1, 10]             330
================================================================
Total params: 1,259,242
Trainable params: 1,259,242
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 5.38
Params size (MB): 4.80
Estimated Total Size (MB): 10.19
----------------------------------------------------------------

You can access the affine, trainable parameters via .weight and .bias and the running stats via .running_mean and .running_var.
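For example (a minimal sketch; model is assumed to be your loaded CNN):

import torch.nn as nn

# iterate over all submodules and print the batchnorm parameters
# (gamma/beta) and buffers (running stats)
for name, module in model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        print(name)
        print(module.weight)        # gamma (trainable)
        print(module.bias)          # beta (trainable)
        print(module.running_mean)  # running mean (buffer)
        print(module.running_var)   # running variance (buffer)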

Thanks for your response. That worked!
I have a saved PyTorch CNN model. While doing inference, I want to know what the exact input and output of the batch normalization layers are.

This is my architecture:

import torch.nn as nn

class Cifar10CnnModel(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            
            # Conv-1
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Dropout(0.25),
            
            # Conv-2
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.AvgPool2d(2, 2), # output: 64 x 16 x 16
            
            # Conv-3
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.Dropout(0.25),

            # Conv-4
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(),
            nn.AvgPool2d(2, 2), # output: 128 x 8 x 8

            # Conv-5
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.Dropout(0.25),            
            
            # Conv-6
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),           
            nn.ReLU(),
            nn.AvgPool2d(2, 2), # output: 256 x 4 x 4

            nn.Flatten(), 
            nn.Linear(256*4*4, 32),
            nn.Dropout(0.25),            
            nn.ReLU(),
            nn.Linear(32, 10))
        
    def forward(self, xb):
        return self.network(xb)

I want to verify that the running mean, running variance, gamma, and beta are correct by calculating the batchnorm output manually.

Check this manual implementation, which shows how the layer execution is performed internally.
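The linked code is not reproduced here, but in eval mode the computation boils down to the following sketch (bn is assumed to be a trained nn.BatchNorm2d and x an input of shape [N, C, H, W]):

import torch

def manual_batchnorm_eval(x, bn):
    # reshape the per-channel stats and affine parameters so they
    # broadcast over [N, C, H, W]
    mean = bn.running_mean[None, :, None, None]
    var = bn.running_var[None, :, None, None]
    gamma = bn.weight[None, :, None, None]
    beta = bn.bias[None, :, None, None]
    return (x - mean) / torch.sqrt(var + bn.eps) * gamma + beta

With bn.eval(), the result should match bn(x) up to a small floating point tolerance (compare with torch.allclose).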

Thanks for your response. Do you have anything similar I could use to get the input and output of the batchnorm layers?

Yes, forward hooks should work with my manual module implementation, too, and you should be able to reuse the provided code.
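A minimal sketch of that approach (model is assumed to be your loaded CNN):

import torch.nn as nn

activations = {}

def make_hook(name):
    def hook(module, inp, out):
        # inp is a tuple of positional inputs; BatchNorm2d receives one
        activations[name] = (inp[0].detach(), out.detach())
    return hook

# register a forward hook on every batchnorm layer
for name, module in model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        module.register_forward_hook(make_hook(name))

# after a forward pass, activations[name] holds the (input, output) pair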

Thank you. I was able to solve my problem and got the output of each layer. I am also trying to verify the values manually, and after doing so I can see that the precision does not match. For example, I tried calculating the values for a batch normalization layer and the results do not match exactly.

How can I get the exact precision?

You won’t be able to get bitwise-identical results due to the expected errors caused by the limited floating point precision and a potentially different order of operations.
Compare the output errors against a small eps value or use allclose as was done in my code snippet.
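For example (out_manual and out_layer are placeholder names for the two tensors being compared):

import torch

# check equality up to a tolerance instead of bitwise identity
print(torch.allclose(out_manual, out_layer, atol=1e-6))
print((out_manual - out_layer).abs().max())  # largest absolute error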

Okay, thanks!

In batch normalization, will it be OK if I compute modified_input * gamma + beta, where modified_input is (input - running_mean) / standard_deviation, instead of ((input - running_mean) / standard_deviation) * gamma + beta?

It might seem trivial, but I just wanted to confirm once.

Basically I want to express the entire batch normalization operation in terms of just addition and multiplication.

I don’t fully understand the difference between these approaches as both apply the same operation. The first one seems to store an intermediate result in the modified_input tensor.
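If the goal is to express the whole operation as a single multiplication and addition, you could also fold the running stats and affine parameters beforehand, since in eval mode batchnorm is an affine transform per channel. A sketch (bn and x are stand-ins for your trained layer and input):

import torch
import torch.nn as nn

# stand-ins matching the first conv block of the model above
bn = nn.BatchNorm2d(32).eval()
x = torch.randn(1, 32, 32, 32)

# precompute a per-channel scale and shift from the stored stats
scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
shift = bn.bias - bn.running_mean * scale

# the whole eval-mode batchnorm is then one multiply and one add
y = x * scale[None, :, None, None] + shift[None, :, None, None]
print(torch.allclose(y, bn(x), atol=1e-6))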

Ok. Also, can you tell me how I can get better precision?

As you mentioned, it can't be exact, but is it possible to do better than what I currently see?

You could use a wider dtype such as float64 for a smaller error.
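For example (a sketch; bn and x are placeholders for the layer and its input):

# cast the layer's parameters/buffers and the input to float64
bn_fp64 = bn.double()
x_fp64 = x.double()
out_fp64 = bn_fp64(x_fp64)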

I tried with float64, but there is still a loss of precision. Above you mentioned some reasons for the loss, but are you aware of any other reasons that contribute to it? Also, is this framework-dependent, i.e., would PyTorch show a different precision than other frameworks?

Using float64 would reduce the error you are seeing, but will not reduce it to zero, as rounding errors are expected with floating point numbers whenever a different order of operations is used. Note that neither of the two values represents the theoretically “true” value, e.g. in this example:

import torch

x = torch.randn(100, 100, 100)
s1 = x.sum()                 # sum over all elements at once
s2 = x.sum(0).sum(0).sum(0)  # sum dimension by dimension
print((s1 - s2).abs())
# tensor(0.0002)

as both would suffer from the rounding. This Wikipedia article describes the numerical format of float32 in more detail and explains how floating point numbers are represented.

The error you are seeing depends on the used algorithms, which are defined by the framework, so I would expect to see a different behavior between PyTorch, TF, etc. unless they call into the same library and use a deterministic algorithm.