I have sequence data going into an RNN-type architecture with batch_first=True, i.e. my input to the model has shape 64x256x16 (64 is the batch size, 256 the sequence length, and 16 the number of features) and the output has shape 64x256x1024 (again, 64 is the batch size, 256 the sequence length, and 1024 features). Now, if I want to apply batch normalization, shouldn’t it be over the output features, i.e. 1024? The problem is that PyTorch does not allow this, or I do not completely understand it. The doc seems clear from the following statement:

`torch.nn.BatchNorm1d`:
Parameters: num_features – C from an expected input of size (N,C,L) or
L from input of size (N,L)

For 2D input of size (N, L) it is clear that batch normalization is computed over the L features (N being the batch dimension), but for 3D data it is confusing: I would expect num_features to be the last dimension L there as well, yet the doc says it is C, the middle dimension.
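To see what the doc means in practice, here is a quick check (shapes taken from the question above, not from the original post) showing that BatchNorm1d looks for its num_features at dim 1 of a 3D input:

```python
import torch
import torch.nn as nn

# Batch-first activations from the question: (batch=64, seq_len=256, features=1024)
x = torch.randn(64, 256, 1024)

# BatchNorm1d(num_features) expects the features at dim 1, i.e. (N, C, L),
# so passing the batch-first tensor directly fails: PyTorch treats dim 1
# (the 256 timesteps) as the channels and complains about the mismatch.
bn = nn.BatchNorm1d(1024)
try:
    bn(x)
    failed = False
except RuntimeError:
    failed = True

print(failed)  # True
```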

Could someone who has used batch normalization for 3D data please advise?

@ptrblck thanks a lot. I feel so stupid for not thinking that way.

Since we can always permute the data to fit our needs this is workable, but I would still like to know the idea behind this default for 3D data, i.e. normalizing over the middle dimension.

Hello, I just came across this topic because I’m actually trying to do batch normalization for multivariate time series data, and I did it on the features following exactly the method you described:

bn = nn.BatchNorm1d(1024)
x = torch.randn(64, 256, 1024)
x = x.permute(0, 2, 1)
output = bn(x)
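As a quick sanity check of this permute-based approach (not from the original post): each of the 1024 features ends up normalized over the batch and time dimensions jointly.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1024)
x = torch.randn(64, 256, 1024)       # (batch, seq_len, features)
out = bn(x.permute(0, 2, 1))         # BatchNorm1d sees (batch, features, seq_len)

# In training mode each feature channel is normalized over batch * time,
# so per-feature mean ~ 0 and per-feature std ~ 1:
per_feature_mean = out.mean(dim=(0, 2))
per_feature_std = out.std(dim=(0, 2))

# Permute back if the next layer expects batch-first (N, L, C):
out = out.permute(0, 2, 1)
```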

However, I applied this in the forward method of the LSTM class I created, so the normalized tensor is what I feed into the LSTM. The results are just perfect for me, but I still have one question:

Since batch normalization should apply to each layer in the LSTM, I have the feeling that is not the case with what I just did: I only added a few lines to the forward method, and I don’t know whether it really applies to each layer or just to the input, because it looks like I only apply it to the input.
Here’s the part I added to the forward method:
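(The actual snippet didn’t make it into the post; below is a hypothetical reconstruction matching the description, with batchnorm applied to the input features before self.lstm. The class and attribute names are my assumptions, and the sizes are taken from the question above.)

```python
import torch
import torch.nn as nn

class LSTMWithInputBN(nn.Module):
    # Hypothetical reconstruction of the setup described in this post;
    # 16 input features and hidden size 1024 come from the question above.
    def __init__(self, n_features=16, hidden_size=1024, num_layers=2):
        super().__init__()
        self.bn = nn.BatchNorm1d(n_features)
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)

    def forward(self, x):                  # x: (batch, seq_len, features)
        x = x.permute(0, 2, 1)             # (batch, features, seq_len) for BatchNorm1d
        x = self.bn(x)
        x = x.permute(0, 2, 1)             # back to batch-first for the LSTM
        output, (h, c) = self.lstm(x)
        return output                      # (batch, seq_len, hidden_size)
```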

In your code snippet the batchnorm layer will be applied only to the input features, before they are passed to self.lstm.
What is your use case? Would you like to apply batchnorm layers after each layer in a multi-layer LSTM?

In my case I am working with multivariate time series data and, just like you said, I want to use batchnorm layers after each layer in the multi-layer LSTM.
I’ve done some research on how to do it, but it seems it’s not possible with the standard LSTM.
The only way to do it is to modify the LSTM so that a recurrent batchnorm (Article) can be applied; an implementation of the modified LSTM is given in this Github repo.
Do you confirm what I’ve just said, or is there a simpler way to do it?
Thank you !
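For what it’s worth, one workaround that does not require modifying the cell (so it is not the per-timestep recurrent batchnorm from the article, just a sketch of a simpler alternative under that assumption) is to stack single-layer nn.LSTM modules by hand and normalize the features between them:

```python
import torch
import torch.nn as nn

class StackedLSTMWithBN(nn.Module):
    """Sketch: single-layer LSTMs stacked by hand, with BatchNorm1d on the
    features between layers. This normalizes each layer's output sequence,
    not the recurrent state inside the cell (unlike recurrent batchnorm)."""
    def __init__(self, n_features, hidden_size, num_layers):
        super().__init__()
        sizes = [n_features] + [hidden_size] * num_layers
        self.lstms = nn.ModuleList(
            nn.LSTM(sizes[i], sizes[i + 1], batch_first=True)
            for i in range(num_layers)
        )
        self.bns = nn.ModuleList(
            nn.BatchNorm1d(hidden_size) for _ in range(num_layers)
        )

    def forward(self, x):                      # x: (batch, seq_len, features)
        for lstm, bn in zip(self.lstms, self.bns):
            x, _ = lstm(x)
            # BatchNorm1d wants (batch, features, seq_len), so permute around it
            x = bn(x.permute(0, 2, 1)).permute(0, 2, 1)
        return x
```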

I think you would have to apply the batchnorm layer to each input separately or pad the input sequences to the same length. A PackedSequence would contain inputs with different lengths, so the batchnorm layer wouldn’t be able to process this input together.

Hmm… Following this discussion, I think the correct way of applying BatchNorm is this (please correct me @ptrblck if I’m wrong):

def simple_elementwise_apply(fn, packed_sequence):
    """Applies a pointwise function fn to each element in packed_sequence."""
    return torch.nn.utils.rnn.PackedSequence(fn(packed_sequence.data), packed_sequence.batch_sizes)

# Assuming 'input' is a PackedSequence and bn = BatchNorm1d(....)
output, _ = lstm(input)
output = simple_elementwise_apply(bn, output)
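A quick end-to-end check of this approach (the sizes are made up for illustration, and the helper is repeated so the snippet runs standalone). Since fn is applied to the flattened .data tensor of shape (total_timesteps, features), the batchnorm never sees any padding:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import PackedSequence, pack_padded_sequence

def simple_elementwise_apply(fn, packed):
    # same helper as above: apply fn to the flattened .data tensor
    return PackedSequence(fn(packed.data), packed.batch_sizes)

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)  # made-up sizes
bn = nn.BatchNorm1d(32)

# two sequences of lengths 5 and 3, padded then packed
padded = torch.randn(2, 5, 16)
packed = pack_padded_sequence(padded, lengths=[5, 3], batch_first=True)

output, _ = lstm(packed)                     # output is a PackedSequence
output = simple_elementwise_apply(bn, output)

# bn saw output.data of shape (5 + 3 timesteps, 32 features)
print(output.data.shape)                     # torch.Size([8, 32])
```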