Batch norm for dynamic-length batches

Hi

I am using a CNN to classify text. The CNN input dimensions are B x M, where B is the batch size and M is the number of words/features (zero-padded to the length of the longest example in the batch). M varies from batch to batch. I would like to batch-normalize the input before the nn.Embedding() layer. How can I do this if I do not know the size of M a priori when setting up the model?

class Testmodel(nn.Module):
    def __init__(self, args):
        super(Testmodel, self).__init__()
        self.args = args

        V = args.embed_num   # vocabulary size
        D = args.embed_dim   # embedding dimension

        self.embed = nn.Embedding(V, D)
        ......

        self.fc1 = nn.Linear(1000, C)   # C = number of classes

and the forward pass:

def forward(self, x):
    x = self.embed(x)
    ....
    logit = self.fc1(x)
    return logit
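
To make the issue concrete, here is a small sketch (the sizes are made up, and I use float tensors only to show the shape problem): nn.BatchNorm1d wants num_features at construction time, but that is exactly the M I only know once I see a batch.

import torch
import torch.nn as nn

# Made-up sizes, just to illustrate the shape problem.
bn = nn.BatchNorm1d(40)          # I have to commit to M = 40 here ...

batch_a = torch.randn(16, 40)    # ... a batch padded to 40 words is fine,
out_a = bn(batch_a)

batch_b = torch.randn(16, 57)    # ... but the next batch is padded to 57 words
# out_b = bn(batch_b)            # and this fails with a size-mismatch error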

Thank you

It seems to me that batch norm is not applicable to your situation.

  • For the training phase, it could actually work, since the mean and variance are computed from the current batch; in practice, both B and M can change from one batch to another.
  • For the test phase, it wouldn’t be possible. A CNN’s output for a given example should be independent of the other examples in the batch: if you predict on image X1 alone, you get a result y1; if you predict on the batch [X1, X2], you want the output for X1 to still be y1, the same as before, not some value that depends on X2.
    Therefore, during the test phase, the mean and variance vectors used by batch norm are fixed, and so their size (M) is fixed as well (see the short sketch below).
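
To illustrate the last point (the layer and sizes below are just an example, not your model): in training mode, nn.BatchNorm1d computes the statistics from the current batch, but in eval mode it uses its stored running_mean / running_var, whose size is fixed when the layer is created.

import torch
import torch.nn as nn

M = 16                          # feature size, fixed at construction time
bn = nn.BatchNorm1d(M)

# Training mode: mean/variance come from the current batch,
# so any batch size B works as long as the feature size is M.
bn.train()
out = bn(torch.randn(8, M))
out = bn(torch.randn(3, M))

# Eval mode: the stored running statistics are used instead.
# They are fixed vectors of size M, so a different feature size cannot work.
bn.eval()
print(bn.running_mean.shape)    # torch.Size([16])
out = bn(torch.randn(5, M))     # fine
# bn(torch.randn(5, M + 4))     # RuntimeError: size mismatch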

EDIT: I am assuming some basic familiarity with batch norm here. See this article I’ve written for more details: https://vitalab.github.io/deep-learning/2017/02/09/batch-norm.html. Feel free to ask follow-up questions here if anything in my answer is unclear.

Hi Carl

I understand the point you make. It makes sense.

Thank you for your help