is this a correct way to normalize embeddings with learnable parameters?

```
import torch
import torch.nn as nn

x = nn.Embedding(10, 100)   # 10 embeddings, each of dimension 100
y = nn.BatchNorm1d(100)     # learnable affine normalization over 100 features
a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y(x(a))                     # output shape: (10, 100)
```

Looks like your input batch size is 10 (from `a`), which happens to equal your vocabulary size in `x`.

Your code looks fine if the above assumption is right.

Just a reminder: if your input contains multiple indices per sample, say

`a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])`

this means you've got a batch of size 2 and each sample has 3 features. Then after embedding, you'll get a tensor of size `(2, 3, 100)`.

From my point of view, a `reshape(-1, 3*100)` or `view(-1, 3*100)` is then needed in order to apply `BatchNorm1d` (which would accordingly need to be constructed as `BatchNorm1d(3*100)` to match the flattened feature dimension).

Please correct me if I'm wrong.
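The multi-feature case can be sketched like this (a minimal example, assuming `BatchNorm1d` is sized to the flattened dimension `3 * 100`):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 100)
bn = nn.BatchNorm1d(3 * 100)  # feature dim after flattening is 3 * 100

a = torch.LongTensor([[0, 1, 2], [0, 4, 6]])  # batch of 2, 3 indices each
e = emb(a)                     # shape: (2, 3, 100)
out = bn(e.view(-1, 3 * 100))  # flatten to (2, 300) before BatchNorm1d
print(out.shape)               # torch.Size([2, 300])
```

Alternatively, since `BatchNorm1d` also accepts `(N, C, L)` inputs, `nn.BatchNorm1d(100)` could be applied to `e.permute(0, 2, 1)`; which variant is appropriate depends on whether you want statistics per flattened feature or per embedding dimension.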

I think a better way is to set `max_norm` in `nn.Embedding`: compared to using `nn.BatchNorm1d`, it re-normalizes each embedding vector so that its norm is less than or equal to `max_norm`.
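A minimal sketch of that approach (the `max_norm=1.0` value here is an arbitrary choice for illustration):

```python
import torch
import torch.nn as nn

# Each looked-up embedding vector is renormalized to L2 norm <= 1.0
emb = nn.Embedding(10, 100, max_norm=1.0)
a = torch.LongTensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
out = emb(a)
print(out.norm(dim=1))  # every row's norm is at most 1.0
```

Note that with `max_norm` set, the renormalization modifies the embedding weight in place during the forward pass, which is documented behavior of `nn.Embedding`.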