 # Batch Normalization of Linear Layers

Is it possible to perform batch normalization in a network that is only linear layers?

For example:

```python
class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.linear1 = nn.Linear(in_features=40, out_features=320)
        self.linear2 = nn.Linear(in_features=320, out_features=2)

    def forward(input):  # Input is a 1D tensor
        y = F.relu(self.linear1(input))
        # Would it be possible to do a batch normalization of y over here? If so, how?
        y = F.softmax(self.linear2(input))
        return y
```

Sure! You could just use `nn.BatchNorm1d`.
There are some minor issues in your code, so here is a working example:

```python
class network(nn.Module):
    def __init__(self):
        super(network, self).__init__()
        self.linear1 = nn.Linear(in_features=40, out_features=320)
        self.bn1 = nn.BatchNorm1d(num_features=320)
        self.linear2 = nn.Linear(in_features=320, out_features=2)

    def forward(self, input):  # input is a 2D tensor [batch_size, num_features]
        y = F.relu(self.bn1(self.linear1(input)))
        y = F.softmax(self.linear2(y), dim=1)
        return y

model = network()
x = torch.randn(10, 40)
output = model(x)
```

You can also put the `BatchNorm` after the `relu`, if you like.


@ptrblck I tried that but I received “ValueError: expected a 2D or 3D input (got 1D input).”

Are you sure you are passing your input as `[batch_dim, num_features]`?
The error sounds like you’ve passed just `[num_features]` to your model.
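To make the shape requirement concrete, here is a minimal sketch: `nn.BatchNorm1d` rejects a bare 1D feature vector, but accepts the same features once a batch dimension is present (the sizes below match the 320-feature layer from the example above).

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=320)

x = torch.randn(320)         # shape [320]: just num_features, no batch dim
# bn(x) would raise: ValueError: expected 2D or 3D input (got 1D input)

batch = torch.randn(8, 320)  # shape [batch_dim, num_features]
out = bn(batch)
print(out.shape)             # torch.Size([8, 320])
```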

When I do that, I get a different error: “ValueError: Expected more than 1 value per channel when training, got input size [1, 320].” This is for a Q-network, so it only receives one state at a time, hence the batch size of 1.

Then `nn.BatchNorm` probably won’t work very well.
Have a look at the normalization layers. Maybe `LayerNorm` or another one will fit your needs.
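As a quick sketch of that suggestion: `nn.LayerNorm` normalizes over the feature dimension of each sample independently, so it has no problem with a batch size of 1 (the 320-feature size below just mirrors the earlier example).

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(320)       # normalizes each sample over its feature dim

state = torch.randn(1, 320)  # a single state, i.e. batch size 1
out = ln(state)              # works fine, unlike BatchNorm1d in train mode
print(out.shape)             # torch.Size([1, 320])
```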

Is the effect the same whether the `BatchNorm` is placed before or after the `ReLU`?

You will most likely see a different performance depending on where you place the batchnorm layer, since the input activation will have a different distribution.
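A small sketch of why the distributions differ (sizes here are arbitrary placeholders): with batchnorm before the `ReLU`, the `ReLU` sees a roughly zero-mean input; with batchnorm after, the batchnorm sees a clipped, non-negative input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
bn = nn.BatchNorm1d(64)
x = torch.randn(128, 64) * 3 + 1   # activations with non-zero mean

pre = F.relu(bn(x))    # BN before ReLU: ReLU receives ~zero-mean input
post = bn(F.relu(x))   # BN after ReLU: BN receives a non-negative input

print(F.relu(x).mean())  # positive: the input BN sees in the second case
```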

So… where should I place the `BatchNorm` layer to train a high-performance model?
(Not only for models with linear layers, but also CNNs or RNNs.)

1. Between each layer?
2. Just before or after the activation function layer?
3. And where shouldn’t I place the `BatchNorm` layer?

@shirui-japina In general, the batch norm layer is usually added before the ReLU (as mentioned in the Batch Normalization paper). However, there is no real standard being followed as to where to add a batch norm layer. You can experiment with different settings, and you may find different performance for each one.

As far as I know, you will generally find batch norm in the feature extraction branch of a network and not in its classification branch (`nn.Linear`).


Thanks for your reply. So the placement of the `BatchNorm` layer in a CNN is like this:

```
CNN(
    convolution-layer-1,
    batch-norm-layer-1,
    activation-layer (ReLU),

    convolution-layer-2,
    batch-norm-layer-2,
    activation-layer (ReLU),

    fully-connected-layer,
)
```
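The layout above could be sketched in PyTorch roughly as follows; the channel counts, kernel sizes, and 8×8 input resolution are arbitrary placeholders, not values from the thread.

```python
import torch
import torch.nn as nn

# conv -> batch-norm -> ReLU blocks, followed by a fully connected layer
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),  # assumes 8x8 input images, 10 classes
)

out = cnn(torch.randn(4, 3, 8, 8))
print(out.shape)  # torch.Size([4, 10])
```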

Should we place the `BatchNorm` layer before the pooling layer?

If you ask me, I would place it after the pooling layer. But you can check out how vision models are implemented in PyTorch to get clarity.


Got it, thanks for your help.

Hi ptrblck,

Sorry to take your time. I have a question: I normalized my patches before training, and my network is 2 CNN layers with 2 fully connected layers. Is it necessary to do batch normalization, or is it unnecessary since the layers are not very deep?

My best advice is to try out both approaches and compare the validation accuracy with and without batchnorm layers.
I don’t have specific advice on when to use them with respect to the number of layers. Let us know which model worked better!

PS: Also, compare the training and validation accuracy to pick the right model, not the test accuracy, as you would otherwise leak the test data information into your model selection process.


You most likely will not see a drastic change in network performance (higher accuracy, etc.). However, batchnorm incurs around a 30% overhead in your network runtime. It will affect your training as well as your inference, unless you fuse the batchnorm layers at inference time.
All in all, BatchNorm shines when you have a very deep architecture; what you have there is not really considered that deep.
You may very well update us with the result you get, though.
Cheers.
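As a sketch of the fusion mentioned above: PyTorch ships a helper, `torch.nn.utils.fusion.fuse_conv_bn_eval`, that folds an eval-mode batchnorm's statistics into the preceding conv's weight and bias, removing the batchnorm op at inference; the layer sizes below are arbitrary placeholders.

```python
import torch
import torch.nn as nn
from torch.nn.utils.fusion import fuse_conv_bn_eval

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
bn = nn.BatchNorm2d(8)
conv.eval()
bn.eval()  # fusion is only valid in eval mode (uses running stats)

fused = fuse_conv_bn_eval(conv, bn)  # folds BN stats into conv weight/bias

x = torch.randn(2, 3, 16, 16)
# The single fused conv matches conv -> bn up to floating-point error,
# so the BatchNorm runtime overhead disappears at inference.
print(torch.allclose(fused(x), bn(conv(x)), atol=1e-6))
```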