Hello.

I have a really simple model which uses only an `nn.Embedding` module. The goal is to minimize a specific loss function, but with the additional constraint that the L2 norm of the embeddings is 1.

I found two options to normalize the embeddings:

- Reassign the weights at each `forward` call: `self.embeddings.weight.data = F.normalize(self.embeddings.weight.data, p=2, dim=1)`
- Use the *max_norm* parameter of `nn.Embedding`: `self.embeddings = nn.Embedding(..., max_norm=1, norm_type=2)`
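For concreteness, here is a minimal sketch of both options side by side. The `EmbeddingModel` class, sizes, and index values are made up for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingModel(nn.Module):
    """Toy model illustrating the two normalization options."""

    def __init__(self, num_embeddings=10, dim=4, use_max_norm=False):
        super().__init__()
        self.use_max_norm = use_max_norm
        if use_max_norm:
            # Option 2: nn.Embedding clips any looked-up row whose
            # L2 norm exceeds max_norm back down to max_norm
            self.embeddings = nn.Embedding(num_embeddings, dim,
                                           max_norm=1, norm_type=2)
        else:
            self.embeddings = nn.Embedding(num_embeddings, dim)

    def forward(self, idx):
        if not self.use_max_norm:
            # Option 1: overwrite the weight tensor in place
            # before the lookup
            self.embeddings.weight.data = F.normalize(
                self.embeddings.weight.data, p=2, dim=1)
        return self.embeddings(idx)

model = EmbeddingModel(use_max_norm=False)
out = model(torch.tensor([0, 1]))
print(out.norm(p=2, dim=1))  # each row has unit L2 norm
```

Note that the two options are not quite equivalent: option 1 renormalizes every row of the table, while *max_norm* only rescales the rows that are actually looked up, and only when their norm exceeds 1.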

The problem is that, as far as I can see, both of these solutions are ignored during backprop. For the first option, reassigning `weight.data` just doesn't allow autograd to track the gradients of the normalization. The second option is implemented inside a `torch.no_grad()` context, as we can see here.
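To illustrate the point about the `.data` reassignment being invisible to autograd, here is a small standalone check (the tensor values are arbitrary). It compares the gradient when the normalization goes through autograd against the gradient when the weights are overwritten via `.data`:

```python
import torch
import torch.nn.functional as F

w = torch.tensor([[1., 2., 3., 4.],
                  [5., 6., 7., 8.]], requires_grad=True)

# Differentiable path: autograd sees the normalization
F.normalize(w, p=2, dim=1).sum().backward()
grad_tracked = w.grad.clone()
w.grad = None

# .data overwrite: autograd never sees the normalization, so we only
# get the gradient of sum() over the already-normalized weights
w.data = F.normalize(w.data, p=2, dim=1)
w.sum().backward()
grad_untracked = w.grad.clone()

print(grad_untracked)                                # all ones: d(sum)/dw
print(torch.allclose(grad_tracked, grad_untracked))  # False
```

The second gradient is just that of `sum()`, i.e. all ones: the normalization step contributed nothing to it.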

Actually, I see many examples where people use the first option, but IMO it is not correct (if someone can explain to me why it is correct, I would be grateful).

The other possible solution is to modify the loss function in some way, but I don't really want to do that right now…