Is this a correct way to normalize embeddings with learnable parameters?

import torch
import torch.nn as nn

x = nn.Embedding(10, 100)   # 10-entry vocabulary, 100-dim embeddings
y = nn.BatchNorm1d(100)     # learnable affine normalization over the 100 features
a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y(x(a))                     # x(a) has shape (10, 100)
Looks like your input batch size is 10 (from a), which happens to equal your vocabulary size in x. Your code looks fine if that assumption is right.
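As a quick sanity check (just a sketch with hypothetical stand-in names, restating the same two layers), BatchNorm1d treats the 10 embedding rows as the batch and normalizes each of the 100 features across them:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100)          # stand-in for x above
bn = nn.BatchNorm1d(100)             # stand-in for y above

ids = torch.arange(10)               # shape (10,): a batch of 10 indices
out = bn(emb(ids))                   # emb(ids) is (10, 100); bn keeps the shape

print(out.shape)                     # torch.Size([10, 100])
print(out.mean(dim=0).abs().max())   # each feature column has ~zero mean across the batch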
Just a reminder: if your input contains multiple indices per sample, say
a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])
this means you've got a batch of size 2 and each sample has 3 features. Then after embedding you'll get a tensor of size (2, 3, 100). From my point of view, a reshape(-1, 3*100) or view(-1, 3*100) (together with a BatchNorm1d(3*100) layer sized to match) is needed in order to apply BatchNorm1d; see the sketch below. Please correct me if I'm wrong.
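A minimal sketch of that suggestion, assuming the flattened-feature route (so the BatchNorm layer is sized 3*100); the permute variant at the end is an alternative that keeps the original 100-feature layer:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100)
bn_flat = nn.BatchNorm1d(3 * 100)             # sized to the flattened features

a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])  # batch of 2, 3 indices each
e = emb(a)                                    # shape (2, 3, 100)

out = bn_flat(e.view(-1, 3 * 100))            # (2, 300): one row per sample
print(out.shape)                              # torch.Size([2, 300])

# Alternative: BatchNorm1d(100) also accepts (N, C, L) input, so moving the
# embedding dimension into position 1 lets a 100-feature layer work as-is:
bn = nn.BatchNorm1d(100)
out2 = bn(e.permute(0, 2, 1))                 # shape (2, 100, 3)
print(out2.shape)                             # torch.Size([2, 100, 3])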
I think a better way is to set max_norm in nn.Embedding: compared to using nn.BatchNorm1d, it renormalizes each embedding vector so that its norm is less than or equal to max_norm.
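A short sketch of that alternative: max_norm is a built-in argument of nn.Embedding, and every embedding row that gets looked up is rescaled so its L2 norm does not exceed the limit (1.0 here is just an example value):

import torch
import torch.nn as nn

emb = nn.Embedding(10, 100, max_norm=1.0)   # clip each looked-up vector to norm <= 1.0

a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
out = emb(a)

print(out.norm(dim=1))                      # every row's L2 norm is <= 1.0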