Is this a correct way to normalize embeddings with learnable parameters?

import torch
import torch.nn as nn

x = nn.Embedding(10, 100)   # 10-entry vocabulary, 100-dim embeddings
y = nn.BatchNorm1d(100)     # learnable affine normalization over the 100 features
a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y(x(a))                     # x(a) has shape (10, 100)
Looks like your input batch size is 10 (from a), which happens to equal your vocabulary size in x. Your code looks fine if that assumption is right.
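As a quick sanity check (just a sketch with hypothetical stand-in names, restating the same two layers), BatchNorm1d treats the 10 embedding rows as the batch and normalizes each of the 100 features across them:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100)          # stand-in for x above
bn = nn.BatchNorm1d(100)             # stand-in for y above

ids = torch.arange(10)               # shape (10,): a batch of 10 indices
out = bn(emb(ids))                   # emb(ids) is (10, 100); bn keeps the shape

print(out.shape)                     # torch.Size([10, 100])
print(out.mean(dim=0).abs().max())   # each feature column has ~zero mean across the batch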
Just a reminder: if your input contains multiple indices per sample, say
a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])
this means you've got a batch of size 2 and each sample has 3 features. Then after embedding you'll get a tensor of size (2, 3, 100). From my point of view, a reshape(-1, 3*100) or view(-1, 3*100) (together with a BatchNorm1d(3*100) layer sized to match) is needed in order to apply BatchNorm1d; see the sketch below. Please correct me if I'm wrong.
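A minimal sketch of that suggestion, assuming the flattened-feature route (so the BatchNorm layer is sized 3*100); the permute variant at the end is an alternative that keeps the original 100-feature layer:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100)
bn_flat = nn.BatchNorm1d(3 * 100)             # sized to the flattened features

a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])  # batch of 2, 3 indices each
e = emb(a)                                    # shape (2, 3, 100)

out = bn_flat(e.view(-1, 3 * 100))            # (2, 300): one row per sample
print(out.shape)                              # torch.Size([2, 300])

# Alternative: BatchNorm1d(100) also accepts (N, C, L) input, so moving the
# embedding dimension into position 1 lets a 100-feature layer work as-is:
bn = nn.BatchNorm1d(100)
out2 = bn(e.permute(0, 2, 1))                 # shape (2, 100, 3)
print(out2.shape)                             # torch.Size([2, 100, 3])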
I think a better way is to set max_norm in nn.Embedding: compared to using nn.BatchNorm1d, it renormalizes each embedding vector so that its norm is less than or equal to max_norm.
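A short sketch of that alternative: max_norm is a built-in argument of nn.Embedding, and every embedding row that gets looked up is rescaled so its L2 norm does not exceed the limit (1.0 here is just an example value):

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100, max_norm=1.0)   # clip each looked-up vector to norm <= 1.0

a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
out = emb(a)

print(out.norm(dim=1))                      # every row's L2 norm is <= 1.0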