I want to implement adaptive normalization as suggested in the paper "Fast Image Processing with Fully-Convolutional Networks". The normalization is defined as a * x + b * BN(x), where a and b are learnable scalar parameters and BN is the 2D batch normalization operator. This normalizer needs to be invoked during training after every leaky_relu-activated 2D convolution layer. How do I go about coding this normalizer?


Something like the following should work. There might be some errors as I didn't look up the exact API, but those should be easy to fix.

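import torch
import torch.nn as nn
from torch.autograd import Variable

class ABN2d(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # BatchNorm2d requires the channel count; affine=False drops gamma and beta
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Variable is kept as originally posted; register_parameter actually expects an nn.Parameter
        self.register_parameter('a', Variable(torch.ones(1, 1, 1, 1), requires_grad=True))
        self.register_parameter('b', Variable(torch.ones(1, 1, 1, 1), requires_grad=True))

    def forward(self, x):
        # a and b are module attributes, so they must be accessed through self
        return self.a * x + self.b * self.bn(x)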


I was implementing it myself and compared my approach to yours.

Here is my module:

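import torch
import torch.nn as nn

class AdaptiveBatchNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True):
        super().__init__()  # required before assigning submodules or parameters
        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine)
        # torch.ones gives a defined start value; torch.FloatTensor(1, 1, 1, 1) is uninitialized
        self.a = nn.Parameter(torch.ones(1, 1, 1, 1))
        self.b = nn.Parameter(torch.ones(1, 1, 1, 1))

    def forward(self, x):
        return self.a * x + self.b * self.bn(x)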


Could you explain the difference between your approach of registering the Variables as parameters and using nn.Parameter?
I also don't know why you set affine to False.
The original BN implementation introduced two parameters (gamma and beta).
(By the way, is there any way to write Greek letters? \beta does not seem to work.)

As I understand the new formulation would be:

a * x + b * gamma * x_norm + b * beta


Thus, using affine=False, you would lose the b * beta term, or am I missing something?
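
The expansion above can be checked numerically. The snippet below is a minimal sketch, assuming training-mode batch statistics (which normalize with the biased variance) and arbitrary illustrative values for a, b, gamma and beta; none of these values come from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)

# BatchNorm with affine=True carries a per-channel gamma (weight) and beta (bias).
bn = nn.BatchNorm2d(3, affine=True)
with torch.no_grad():
    bn.weight.uniform_(0.5, 1.5)   # illustrative gamma values
    bn.bias.uniform_(-0.5, 0.5)    # illustrative beta values

a = torch.tensor(0.7)  # illustrative scalar, not from the paper
b = torch.tensor(0.3)  # illustrative scalar, not from the paper

bn.train()
lhs = a * x + b * bn(x)

# Recompute the normalized input by hand using per-channel batch statistics.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
x_norm = (x - mean) / torch.sqrt(var + bn.eps)
gamma = bn.weight.view(1, -1, 1, 1)
beta = bn.bias.view(1, -1, 1, 1)
rhs = a * x + b * gamma * x_norm + b * beta

expansion_holds = torch.allclose(lhs, rhs, atol=1e-5)
print(expansion_holds)
```

With affine=False, gamma and beta simply drop out of `rhs`, leaving a * x + b * x_norm.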


Assigning an nn.Parameter as an attribute and calling register_parameter should be equivalent. It turns out that I should have used Parameter instead of Variable in the above code.
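
To illustrate the equivalence, here is a small sketch comparing the two ways of registering a parameter. The class names `ViaRegister` and `ViaAttribute` are made up for this example, and the last part shows what I would expect when a plain tensor is passed to register_parameter.

```python
import torch
import torch.nn as nn

class ViaRegister(nn.Module):
    def __init__(self):
        super().__init__()
        # Explicit registration under the name "a".
        self.register_parameter("a", nn.Parameter(torch.ones(1, 1, 1, 1)))

class ViaAttribute(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.a = nn.Parameter(torch.ones(1, 1, 1, 1))

names_register = [name for name, _ in ViaRegister().named_parameters()]
names_attribute = [name for name, _ in ViaAttribute().named_parameters()]
print(names_register, names_attribute)

# A plain tensor is not accepted by register_parameter.
plain_tensor_rejected = False
try:
    nn.Module().register_parameter("a", torch.ones(1, 1, 1, 1))
except TypeError:
    plain_tensor_rejected = True
print(plain_tensor_rejected)
```

Either way, the parameter shows up in `parameters()` and is therefore seen by the optimizer.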

I set affine to False because b, beta and gamma would all be trainable. So the formulation is equivalent but with fewer parameters; only the gradient values differ.

But the PyTorch docs say that if affine = False, gamma and beta are not learnt during training?

If affine is False, there won't be any gamma or beta.
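
A quick way to see this is to inspect the layer directly. This is only a small sketch; the channel count of 16 is an arbitrary choice for illustration.

```python
import torch.nn as nn

bn_plain = nn.BatchNorm2d(16, affine=False)
bn_affine = nn.BatchNorm2d(16, affine=True)

# With affine=False, weight (gamma) and bias (beta) are None.
print(bn_plain.weight, bn_plain.bias)

# No learnable parameters remain; running_mean and running_var are buffers.
num_plain_params = len(list(bn_plain.parameters()))
print(num_plain_params)

# With affine=True, there is one gamma and one beta per channel.
affine_shapes = [tuple(p.shape) for p in bn_affine.parameters()]
print(affine_shapes)
```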

I just realized that if affine = True, it is possible to learn a bias offset. If you also want that, you may either use affine = True at the cost of an unnecessary parameter, or add another parameter.
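
The second option could look roughly like the sketch below. The class name `AdaptiveNorm2d`, the `learn_offset` flag, the per-channel shape of the offset, and the initial values (a = 1, b = 0, offset = 0) are all assumptions made for illustration, not something taken from the paper. The usage at the end follows the ordering described in the question (convolution, then leaky_relu, then the normalizer), with an arbitrary negative slope of 0.2.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveNorm2d(nn.Module):
    """Sketch of a * x + b * BN(x) with an optional learnable offset."""

    def __init__(self, num_features, eps=1e-5, momentum=0.1, learn_offset=True):
        super().__init__()
        # affine=False: BN only normalizes, without its own gamma and beta.
        self.bn = nn.BatchNorm2d(num_features, eps=eps, momentum=momentum, affine=False)
        # Assumed initialization: start as the identity mapping (a = 1, b = 0).
        self.a = nn.Parameter(torch.ones(1, 1, 1, 1))
        self.b = nn.Parameter(torch.zeros(1, 1, 1, 1))
        # Extra per-channel offset standing in for the lost b * beta term.
        if learn_offset:
            self.offset = nn.Parameter(torch.zeros(1, num_features, 1, 1))
        else:
            self.offset = None

    def forward(self, x):
        out = self.a * x + self.b * self.bn(x)
        if self.offset is not None:
            out = out + self.offset
        return out

# Usage: convolution, then leaky_relu, then the normalizer.
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
norm = AdaptiveNorm2d(8)
x = torch.randn(4, 3, 16, 16)
y = norm(F.leaky_relu(conv(x), negative_slope=0.2))
print(y.shape)
```

With this initialization the layer starts out passing its input through unchanged, and training decides how much of the batch-normalized signal to mix in.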

What exact paper are you referring to? Is it this one: "Adaptive Normalization: A novel data normalization approach for non-stationary time series" (IEEE Conference Publication, IEEE Xplore)?

This is the paper title: "Fast Image Processing with Fully-Convolutional Networks" https://arxiv.org/pdf/1709.00643.pdf