As far as I understand the documentation for the BatchNorm1d layer, we provide the number of features as an argument to the constructor (nn.BatchNorm1d(num_features)).
As input the layer takes (N, C, L), where N is the batch size (I guess…), C is the number of features (this is the dimension along which normalization is applied), and L is the sequence length.
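For reference, here is a minimal shape check of that layout (the sizes are arbitrary):

import torch
from torch import nn

bn = nn.BatchNorm1d(5)     # num_features = C = 5
x = torch.rand(3, 5, 7)    # (N, C, L) = (batch, features, sequence length)
print(bn(x).shape)         # torch.Size([3, 5, 7])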
Let’s assume I have input of the following shape: (batch_size, number_of_timesteps, number_of_features),
which is the usual data shape for a time series when batch_first=True.
Question
Should I transpose the input (swap dimensions 1 and 2) before running the batch normalization?
In that case I would have to transpose the output again before feeding it into the RNN, which looks quite weird to me.
Can someone please take a look at the example below and let me know if this is the proper way?
E.g.:
import torch
from torch import nn
# data (batch size, number of time steps, number of features)
x = torch.rand(3, 4, 5)
# layers
bn = nn.BatchNorm1d(5)
rnn = nn.RNN(5, 10, 1, batch_first=True)
# computation - transpose before BatchNorm1d and back again afterwards
x_normalized = bn(x.transpose(1, 2)).transpose(1, 2)
output, h_n = rnn(x_normalized)
Your code looks correct: the batchnorm layer expects an input of shape [batch_size, features, temp. dim], so you would need to permute it before (and after, to match the expected input of the RNN).
In your code snippet you could of course create the tensor in the right shape directly, but I assume the code is just there to show the usage.
Isn’t this a fundamentally flawed approach? BatchNorm1d does not work harmoniously with nn.Linear, which is one of the most fundamental parts of PyTorch. This makes it impossible to use BatchNorm and Linear layers together in a Sequential container, at least to my knowledge.
For 2D batches it works great. However, when we use fully connected layers on 3D inputs of shape (N = batch size, L = sequence length, C = input size), we have to transpose twice to use BatchNorm1d after each linear transformation: each linear layer acts on dim=-1 (our features), while BatchNorm1d can only act on dim=1, so a transposition is needed before and after each BatchNorm. For example, a fully connected network of 3 layers is given below, with BatchNorm before each non-linearity.
class FC_layer(nn.Module):
    def __init__(self, input_size_FC1, output_size_FC1, output_size_FC2, output_size_FC3):
        super(FC_layer, self).__init__()
        self.linear_layer1 = nn.Linear(input_size_FC1, output_size_FC1)
        self.normalization1 = nn.BatchNorm1d(output_size_FC1)
        self.linear_layer2 = nn.Linear(output_size_FC1, output_size_FC2)
        self.normalization2 = nn.BatchNorm1d(output_size_FC2)
        self.linear_layer3 = nn.Linear(output_size_FC2, output_size_FC3)
        self.normalization3 = nn.BatchNorm1d(output_size_FC3)

    def forward(self, x):
        """
        x : tensor of shape (N_batch, N_sequence, N_features)
        """
        f = nn.ReLU()                                  # activation function
        g = nn.LogSoftmax(dim=1)                       # over the feature/class dim, which is dim=1 in the transposed (N, C, L) layout
        x_lin1 = self.linear_layer1(x)                 # apply linear transformation on dim=-1
        x_lin1 = torch.transpose(x_lin1, 1, 2)         # transpose for BatchNorm1d
        x_lin1_norm = self.normalization1(x_lin1)      # normalize
        layer1_out = f(x_lin1_norm)                    # apply non-linearity
        layer2_in = torch.transpose(layer1_out, 1, 2)  # transpose back for the next linear transformation
        x_lin2 = self.linear_layer2(layer2_in)
        x_lin2 = torch.transpose(x_lin2, 1, 2)
        x_lin2_norm = self.normalization2(x_lin2)
        layer2_out = f(x_lin2_norm)
        layer3_in = torch.transpose(layer2_out, 1, 2)  # transpose back for the next linear transformation
        x_lin3 = self.linear_layer3(layer3_in)
        x_lin3 = torch.transpose(x_lin3, 1, 2)
        x_lin3_norm = self.normalization3(x_lin3)
        layer3_out = g(x_lin3_norm)
        output = torch.transpose(layer3_out, 1, 2)     # back to (N_batch, N_sequence, N_features)
        return output
Instead, if BatchNorm1d could act on the final dimension, we wouldn’t need to worry about these dimension conversions.
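One workaround (just a sketch, the wrapper name is made up and not a built-in layer) is to hide the two transposes inside a small module, so the rest of the network can stay in (batch, sequence, features):

import torch
from torch import nn

class BatchNorm1dLastDim(nn.Module):
    """Applies BatchNorm1d over the last dimension of a (N, L, C) tensor
    by transposing to (N, C, L), normalizing, and transposing back."""
    def __init__(self, num_features):
        super().__init__()
        self.bn = nn.BatchNorm1d(num_features)

    def forward(self, x):
        # (N, L, C) -> (N, C, L) -> BatchNorm1d -> (N, L, C)
        return self.bn(x.transpose(1, 2)).transpose(1, 2)

With such a wrapper each block in FC_layer reduces to linear -> norm -> activation on (N, L, C) tensors, with no explicit transposes in forward.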
I couldn’t find a way to do this in a Sequential container, but you are suggesting that I could write a permutation module with no parameters and use that in the Sequential container, right? I am worried about the need to do a copy() or to replace the tensor with its transposed version, and also about the contiguity requirement of other PyTorch functionalities.
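Something like this is what I have in mind (the module name is made up); transpose returns a view, so no copy is made, and .contiguous() would only be needed if a later operation requires contiguous memory:

import torch
from torch import nn

class Transpose(nn.Module):
    """Parameter-free module that swaps two dimensions, usable inside nn.Sequential."""
    def __init__(self, dim0, dim1):
        super().__init__()
        self.dim0, self.dim1 = dim0, dim1

    def forward(self, x):
        # returns a view (no copy); call .contiguous() downstream only if required
        return x.transpose(self.dim0, self.dim1)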
Is there currently any way to avoid the permutation before and after the BatchNorm1d in a Sequential container? I just need to apply it before an LSTM layer and I am not able to do the suggested permutation.
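For what it’s worth, assuming a parameter-free Transpose module like the one sketched above, the BatchNorm part can live in an nn.Sequential and the LSTM can be called right after it (the LSTM cannot sit inside the Sequential anyway, since it returns a tuple):

pre = nn.Sequential(
    Transpose(1, 2),        # (N, L, C) -> (N, C, L)
    nn.BatchNorm1d(5),
    Transpose(1, 2),        # back to (N, L, C) for the LSTM
)
lstm = nn.LSTM(input_size=5, hidden_size=10, batch_first=True)

x = torch.rand(3, 4, 5)     # (batch, time steps, features)
out, (h_n, c_n) = lstm(pre(x))
print(out.shape)            # torch.Size([3, 4, 10])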