Adaptive Normalization

I want to implement adaptive normalization as suggested in the paper Fast Image Processing with Fully-Convolutional Networks. The normalization is defined as a * x + b * BN(x), where a and b are learnable scalar parameters and BN is the 2d batch normalization operator. This normalizer needs to be invoked during training after every leaky_relu-activated 2d convolution layer. How do I go about coding this normalizer?


Something like the following should work. There might be some errors as I didn’t look up the exact API, but those should be easy to fix.

class ABN2d(nn.Module):
  def __init__(self, num_features):
    super().__init__()
    self.bn = nn.BatchNorm2d(num_features, affine = False)
    self.register_parameter('a', Variable(torch.FloatTensor(1, 1, 1, 1), requires_grad = True))
    self.register_parameter('b', Variable(torch.FloatTensor(1, 1, 1, 1), requires_grad = True))

  def forward(self, x):
    return self.a * x + self.b * self.bn(x)

I was implementing it myself and compared my approach to yours.

Here is my module:

class AdaptiveBatchNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True):
        super(AdaptiveBatchNorm2d, self).__init__()
        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine)
        self.a = nn.Parameter(torch.FloatTensor(1, 1, 1, 1))
        self.b = nn.Parameter(torch.FloatTensor(1, 1, 1, 1))

    def forward(self, x):
        return self.a * x + self.b * self.bn(x)
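
For context, here is roughly how I am plugging it into my network, i.e. after each conv + leaky_relu, as described in the first post (the channel sizes are just placeholders and not the paper's architecture):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyNet(nn.Module):
    def __init__(self):
        super(ToyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 24, kernel_size=3, padding=1)
        self.abn1 = AdaptiveBatchNorm2d(24)   # module defined above
        self.conv2 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
        self.abn2 = AdaptiveBatchNorm2d(24)

    def forward(self, x):
        # conv -> leaky_relu -> adaptive norm, repeated per layer
        x = self.abn1(F.leaky_relu(self.conv1(x)))
        x = self.abn2(F.leaky_relu(self.conv2(x)))
        return x

If the paper applies the normalization before the activation instead, only the order inside forward needs to be swapped.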

Could you explain the difference between your approach of registering the Variables as parameters and using nn.Parameter?
I also don't know why you set affine to False.
The original BN implementation introduced two parameters (gamma and beta).
(Btw., is there any option to write Greek letters? \beta seems not to work.)

As I understand it, the new formulation would be:

a * x + b * gamma * x_norm + b * beta

Thus, using affine=False, you would lose the b * beta term, or am I missing something?


nn.Parameter and register_parameter should be equivalent. Turns out that I should have used Parameter instead of Variable in the above code. :stuck_out_tongue:
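
For example, a quick check (the names a and b are just placeholders) shows that both forms end up registered as module parameters:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super(Demo, self).__init__()
        # assigning an nn.Parameter as an attribute registers it automatically ...
        self.a = nn.Parameter(torch.ones(1, 1, 1, 1))
        # ... which is equivalent to registering it explicitly by name
        self.register_parameter('b', nn.Parameter(torch.zeros(1, 1, 1, 1)))

print([name for name, _ in Demo().named_parameters()])  # prints ['a', 'b']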

I set affine to False because with affine=True, b, beta and gamma would all be trainable. So the formulation is equivalent but with fewer parameters; only the gradient values differ.

But the PyTorch docs say that if affine = False, gamma and beta are not learnt during training?

If affine is false, there won’t be any gamma or beta.

I just realized that with affine = True it is also possible to learn a bias offset. If you want that as well, you may either use affine = True at the cost of one unnecessary parameter, or add another parameter. :slight_smile:
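
If it helps, here is a rough sketch of the second option; the extra bias parameter c is my own naming, not something from the paper:

import torch
import torch.nn as nn

class AdaptiveBatchNorm2dBias(nn.Module):
    # y = a * x + b * BN(x) + c, with affine=False so gamma/beta are dropped
    # and c takes over the role of the lost b * beta bias term
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super(AdaptiveBatchNorm2dBias, self).__init__()
        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine=False)
        self.a = nn.Parameter(torch.ones(1, 1, 1, 1))
        self.b = nn.Parameter(torch.ones(1, 1, 1, 1))
        self.c = nn.Parameter(torch.zeros(1, 1, 1, 1))

    def forward(self, x):
        return self.a * x + self.b * self.bn(x) + self.c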

What exact paper are you referring to? Is it this one: "Adaptive Normalization: A novel data normalization approach for non-stationary time series" (IEEE Xplore)?

This is the paper: "Fast Image Processing with Fully-Convolutional Networks", https://arxiv.org/pdf/1709.00643.pdf