Batchnorm for dynamic length batches


I am using a CNN to classify text. The CNN input dimensions are B×M, where B is the batch size and M is the number of words/features (zero-padded to the length of the longest example in the batch). M varies from batch to batch. I would like to batch-normalize the input before the nn.Embedding() layer. How can I do this if I do not know the size of M a priori when setting the model up?

class Testmodel(nn.Module):
    def __init__(self, args):
        super().__init__()       # required before registering submodules
        self.args = args

        V = args.embed_num       # vocabulary size
        D = args.embed_dim       # embedding dimension
        C = args.class_num       # number of classes (assumed attribute name)
        self.embed = nn.Embedding(V, D)

        self.fc1 = nn.Linear(1000, C)

and the forward

def forward(self, x):        # x: (B, M) integer word indices
        x = self.embed(x)        # (B, M, D)
        logit = self.fc1(x)      # Linear acts on the last dim; assumes D == 1000
        return logit
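For reference, here is a minimal runnable sketch of the model above. The class count `C`, the vocabulary size, and the mean-pooling step are my assumptions (the original snippet hard-codes 1000 input features and omits how `C` is set); pooling over the word axis is one common way to keep the classifier's input size fixed while M varies:

```python
import torch
import torch.nn as nn

class Testmodel(nn.Module):
    def __init__(self, V=100, D=16, C=4):
        super().__init__()
        self.embed = nn.Embedding(V, D)  # V = vocab size, D = embedding dim
        # Averaging over the variable-length word axis keeps fc1's input
        # size fixed at D regardless of M (an assumption, not in the
        # original snippet, which uses 1000 input features).
        self.fc1 = nn.Linear(D, C)

    def forward(self, x):                # x: (B, M) integer word indices
        x = self.embed(x)                # (B, M, D)
        x = x.mean(dim=1)                # (B, D) -- independent of M
        return self.fc1(x)               # (B, C)

model = Testmodel()
logits = model(torch.randint(0, 100, (8, 23)))  # M = 23 for this batch
print(logits.shape)  # torch.Size([8, 4])
```

Any batch length M works here because M is averaged away before the linear layer.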

Thank you

It seems to me that batch norm is not applicable to your situation.

  • For the training phase, it could actually work. In practice B and M can change from one batch to another.
  • For the test phase, it wouldn’t be possible. A CNN’s output for a given example is always independent of the other examples. If you predict on image X1 alone, it will give you a result y1. If you predict on a batch [X1, X2], you want the output for X1 to be y1, the same as before, not some value that depends on X2.
    Therefore, during the test phase, batch norm uses fixed mean and variance vectors (running statistics accumulated during training); their size (M) is thus also fixed and cannot vary from batch to batch.
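One workaround consistent with the points above (my suggestion, not part of the original answer): normalize over the fixed embedding dimension D *after* nn.Embedding, so the batch-norm parameter and statistics size never depends on the variable length M:

```python
import torch
import torch.nn as nn

D = 16
embed = nn.Embedding(100, D)
bn = nn.BatchNorm1d(D)  # statistics per embedding channel: size D, not M

for M in (5, 12, 30):                 # M varies from batch to batch
    x = torch.randint(0, 100, (8, M)) # (B, M) word indices
    e = embed(x)                      # (B, M, D)
    # BatchNorm1d expects (B, C, L); treat D as channels, M as length
    out = bn(e.transpose(1, 2)).transpose(1, 2)
    print(out.shape)                  # (B, M, D), with any M
```

Because the normalized axis is D, the same running mean/variance vectors apply at test time no matter how long the padded sequences are.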

EDIT: Here I assume some basic understanding of batch norm; see this article I’ve written for more details. You can also ask me questions here if you feel my answer is not so clear.

Hi Carl

I understand the point you make. It makes sense.

Thank you for your help