I am trying to train an NLP model that takes in the entire sequence at once instead of passing one time step at a time, since that approach is faster as far as I know. So my data has shape (seq_size, batch_size, length). This format works fine for all the other layers, but I am running into a problem with BatchNorm1d. Normally, with non-sequential data of shape (batch_size, length), I do the following:
self.linear = nn.Linear(length1, length2)
self.bn = nn.BatchNorm1d(length2)
and then in the forward function:
x = self.linear(x)
x = self.bn(x)
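To make the working non-sequential case concrete, here is a minimal runnable sketch (the sizes 128 and 64 are made up for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration
batch_size, length1, length2 = 32, 128, 64

linear = nn.Linear(length1, length2)
bn = nn.BatchNorm1d(length2)

x = torch.randn(batch_size, length1)  # (batch_size, length)
out = bn(linear(x))                   # BatchNorm1d sees (N, C) = (32, 64)
print(out.shape)                      # torch.Size([32, 64])
```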
and this works fine. But when I use the same approach on sequential data, BatchNorm1d takes the second dimension, i.e. the batch_size, as the feature dimension and raises a RuntimeError. For example, my batch size is 32 and the output length of the previous Linear layer is 64, so I defined my batch norm as nn.BatchNorm1d(64), but the forward pass fails with RuntimeError: running_mean should contain 32 elements not 64. How do I fix this? What is the correct approach for this kind of problem?
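Here is a minimal sketch that reproduces the error (the sequence length of 10 is arbitrary; the other sizes match my setup):

```python
import torch
import torch.nn as nn

seq_size, batch_size, length = 10, 32, 64
bn = nn.BatchNorm1d(length)

x = torch.randn(seq_size, batch_size, length)
try:
    bn(x)  # BatchNorm1d treats dim 1 (batch_size=32) as the channel dim
except RuntimeError as e:
    print(e)  # running_mean should contain 32 elements not 64
```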