How to calculate BatchNorm2d manually in Python?

Hello everyone,

I want to compute BatchNorm2d manually because, for my research, I need to take the trained weights and inputs and reproduce the computation in C++. So I am first trying to recreate each layer's output in Python.
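Since the end goal is C++, a plain-Python version written as explicit nested loops over N, C, H and W may be easier to port line by line than a vectorized one. This is a minimal sketch, assuming the layer is used in inference mode (running statistics, not batch statistics) and PyTorch's default eps=1e-5; the function name `batchnorm2d_eval` is hypothetical.

```python
import math

def batchnorm2d_eval(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference-mode BatchNorm2d on an NCHW tensor stored as nested lists.

    x: N samples, each a list of C channels, each an H x W list of lists.
    gamma, beta, running_mean, running_var: flat lists of length C.
    eps=1e-5 is assumed to match PyTorch's default.
    """
    out = []
    for sample in x:                              # loop over the batch (N)
        out_sample = []
        for c, channel in enumerate(sample):      # loop over channels (C)
            # One set of constants per channel, shared by all H x W positions.
            inv_std = 1.0 / math.sqrt(running_var[c] + eps)
            out_channel = [
                [(v - running_mean[c]) * inv_std * gamma[c] + beta[c] for v in row]
                for row in channel                # loop over H, then W
            ]
            out_sample.append(out_channel)
        out.append(out_sample)
    return out

# Tiny example: N=1, C=1, H=2, W=2, with eps=0.0 so the numbers are exact.
x = [[[[1.0, 2.0], [3.0, 4.0]]]]
y = batchnorm2d_eval(x, gamma=[2.0], beta=[0.5], running_mean=[2.5],
                     running_var=[4.0], eps=0.0)
print(y)  # [[[[-1.0, 0.0], [1.0, 2.0]]]]
```

Each of the four nested loops maps directly onto a `for` loop over a flat NCHW buffer in C++.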

I have the output of a convolutional layer with shape [1, 32, 22, 72], and I want to apply batch normalization as described in the PyTorch documentation (BatchNorm2d — PyTorch 2.1 documentation). I also have the parameters of the trained BatchNorm2d layer stored as NumPy arrays: gamma (weights), beta (biases), running mean, and running variance. Their shapes are:
gamma (weights) = [32, 1, 8, 8]
beta (biases) = [32]
running mean = [32]
running variance = [32]
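For reference, in evaluation mode (after calling `model.eval()`) BatchNorm2d normalizes each channel with the stored running statistics rather than the current batch's statistics. Assuming the default `affine=True` and `eps=1e-5`, the per-element computation for channel c is as follows, where μ_c = running_mean[c], σ²_c = running_var[c], γ_c = weight[c] and β_c = bias[c]:

```latex
y_{n,c,h,w} = \gamma_c \, \frac{x_{n,c,h,w} - \mu_c}{\sqrt{\sigma_c^{2} + \epsilon}} + \beta_c,
\qquad c = 0, \dots, C-1, \quad \epsilon = 10^{-5}
```

Every parameter in this formula carries only a channel index, so each one holds exactly one value per channel (C = 32 here).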

Can someone help me with Python code that performs batch normalization from scratch? I keep getting shape/broadcasting errors when I try. Also, can someone explain why gamma has shape [32, 1, 8, 8] instead of [32]? Is it related to the kernel size of the conv layer? My conv layer has 32 filters with an (8, 8) kernel.
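A vectorized NumPy sketch of the same inference-mode computation follows. A common cause of the broadcasting error is that NumPy aligns a [32] vector with the last axis of an NCHW array (W = 72 here); reshaping each per-channel vector to (1, 32, 1, 1) aligns it with the channel axis instead. The function name `batchnorm2d_eval_np` and the random stand-in data are hypothetical, and eps=1e-5 is assumed to match PyTorch's default. The shape check also bears on the gamma question: a [32, 1, 8, 8] array matches the layout of a Conv2d weight, (out_channels, in_channels / groups, kH, kW), i.e. 32 filters of 8×8 over one input channel, whereas `nn.BatchNorm2d(32).weight` has shape [32]. It may therefore be worth double-checking that the array was read from the BatchNorm layer's weight and not from the preceding convolution's.

```python
import numpy as np

def batchnorm2d_eval_np(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Inference-mode BatchNorm2d for an NCHW array using stored running statistics."""
    x = np.asarray(x, dtype=np.float32)
    c = x.shape[1]
    names = ("gamma", "beta", "running_mean", "running_var")
    params = [np.asarray(p, dtype=np.float32) for p in (gamma, beta, running_mean, running_var)]
    for name, p in zip(names, params):
        # Each BatchNorm2d parameter should hold exactly one value per channel.
        if p.shape != (c,):
            raise ValueError(f"{name} has shape {p.shape}, expected ({c},)")
    # Reshape the per-channel vectors to (1, C, 1, 1) so they broadcast over N, H and W.
    g, b, m, v = (p.reshape(1, c, 1, 1) for p in params)
    return (x - m) / np.sqrt(v + eps) * g + b

# Hypothetical stand-in data with the shapes from the question.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 32, 22, 72)).astype(np.float32)  # plays the conv output
gamma = rng.standard_normal(32)
beta = rng.standard_normal(32)
running_mean = rng.standard_normal(32)
running_var = rng.uniform(0.5, 2.0, 32)  # variances are non-negative
y = batchnorm2d_eval_np(x, gamma, beta, running_mean, running_var)
print(y.shape)  # (1, 32, 22, 72)
```

To validate against the trained model, one could compare this result with the output of the original BatchNorm2d module after calling `.eval()` on it; in training mode the module normalizes with batch statistics instead, so the numbers would not match.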

This reference might be helpful.