How to normalize embedding vectors?

Hi, I am using a network to embed some entities into a vector space. The length of the vectors decreases during training, so I want to normalize their length to 1 at the end of each step. Is there any tool that I can use to normalize the embedding vectors?


I think the best thing you can do is to save the indices that were embedded and normalize their rows manually after the update (just index_select them, compute the row-wise norm, divide, and index_copy the result back into the weights). We only support automatic max-norm clipping.
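
The manual steps described above could look roughly like the following. This is only a sketch: the embedding sizes, the SGD optimizer, and the placeholder loss are assumptions made for illustration, not part of the original suggestion.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(10, 4)          # hypothetical sizes: 10 entities, 4 dimensions
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

idx = torch.tensor([1, 3, 3, 7])   # indices used in this training step
loss = emb(idx).pow(2).sum()       # placeholder loss, for illustration only
opt.zero_grad()
loss.backward()
opt.step()

# Renormalize only the rows that were touched, outside of autograd.
with torch.no_grad():
    used = torch.unique(idx)       # index_copy_ should not receive duplicate indices
    rows = emb.weight.index_select(0, used)
    norms = rows.norm(p=2, dim=1, keepdim=True).clamp_min(1e-12)
    emb.weight.index_copy_(0, used, rows / norms)

touched_norms = emb.weight.detach()[used].norm(dim=1)
```

Here touched_norms holds the L2 norms of the updated rows after renormalization; rows that were not looked up in this step are left unchanged.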


If you want to normalize a vector as a part of a model, this should do it:

# assume q is the tensor to be L2-normalized along dim 1

qn = torch.norm(q, p=2, dim=1).detach()
q = q.div(qn.expand_as(q))

Note the detach(); it is essential for the gradients to work correctly. I'm assuming you want the norm to be treated as a constant when dividing the tensor by it.


Can I use it to normalize the embedding after each update during training?

Yes, it could be used to normalize an embedding. I suggest not using detach(), though. I found that it degrades performance.
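
To make the difference between the two variants concrete, here is a small sketch comparing the gradients with and without detach(). The tensors q and target and the helper grad_wrt_q are hypothetical and exist only for this illustration. It does not show which variant trains better, only that the gradients differ.

```python
import torch

torch.manual_seed(0)
q = torch.randn(3, 5)
target = torch.randn(3, 5)         # hypothetical target, for illustration only

def grad_wrt_q(detach_norm):
    x = q.clone().requires_grad_(True)
    norm = x.norm(p=2, dim=1, keepdim=True)
    if detach_norm:
        norm = norm.detach()       # the norm is treated as a constant
    loss = ((x / norm) * target).sum()
    loss.backward()
    return x.grad

g_detached = grad_wrt_q(True)
g_full = grad_wrt_q(False)

# Component of each gradient along its own row of q.
radial_detached = (g_detached * q).sum(dim=1)
radial_full = (g_full * q).sum(dim=1)
```

Without detach(), the gradient has no component along q, because rescaling q does not change q / ||q||. With detach(), the gradient generally keeps a radial component, so an update step can also change the length of q.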

I see, thanks. But I need to constrain the norms of the embeddings (no bigger than 1). Do you have any better suggestions for doing that?
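
One option for a "norm at most 1" constraint is the max-norm clipping mentioned earlier in the thread, which nn.Embedding exposes through its max_norm argument. A minimal sketch, assuming hypothetical sizes and deliberately inflated weights so that the clipping is visible:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(10, 4, max_norm=1.0)   # hypothetical sizes
with torch.no_grad():
    emb.weight.mul_(5.0)                  # inflate the rows so clipping is visible

idx = torch.tensor([0, 2, 5])
out = emb(idx)                            # looked-up rows are renormalized in place

looked_up_norms = emb.weight.detach()[idx].norm(dim=1)
```

Note that the clipping is applied in the forward pass and only to the rows that are actually looked up, and that rows whose norm is already below max_norm are left as they are.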

Why do we have to use detach()? Also, in the new PyTorch version you have to use keepdim=True in the norm() method. A simple implementation of L2 normalization:

import torch
from torch.autograd import Variable

# suppose x is a Variable of size [4, 16]: 4 is the batch size, 16 is the feature dimension
x = Variable(torch.rand(4, 16), requires_grad=True)
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm.expand_as(x))

If you use keepdim=True, you don’t even need expand_as(x). The following works for me:

norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm)

PyTorch now has a normalize function, so L2 normalization of features is easy. Suppose x is a feature tensor of size N×D (N is the batch size and D is the feature dimension); we can simply use the following:

import torch.nn.functional as F
x = F.normalize(x, p=2, dim=1)
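
As a quick sanity check, the following sketch compares F.normalize with the manual division shown earlier in the thread. The input shapes are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(4, 16, requires_grad=True)

manual = x / x.norm(p=2, dim=1, keepdim=True)
builtin = F.normalize(x, p=2, dim=1)

row_norms = builtin.norm(dim=1)

# F.normalize divides by max(norm, eps), so an all-zero row stays zero
# instead of turning into NaN.
zero_row = F.normalize(torch.zeros(1, 16), p=2, dim=1)
```

Both versions give the same result for non-zero rows, and gradients flow through the norm in both cases.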

Yes, it failed quickly with detach()

What if the variable I am trying to normalize is in fact a Parameter from nn.parameter rather than a Variable?

Then in this case I get the following error for any of the above options I try:

TypeError: cannot assign ‘torch.autograd.variable.Variable’ as parameter ‘W’ (torch.nn.Parameter or None expected)

class myUnit(nn.Module):
    def __init__(self,myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter,requires_grad=True)
    def forward(self,input):
        # Whatever operation. Just an example (the assignment below raises the TypeError):
        self.myParameter = F.normalize(self.myParameter,p=2,dim=1)
        output = self.myParameter * input - 1
        return output

Can I still keep using Parameter in my customized network and be able to also normalize it?
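
One possible way around the error, sketched under the assumption that the normalized values are only needed inside forward: write the result to a local tensor instead of assigning it back to the attribute. The class name MyUnit, the attribute my_parameter, and the helper renormalize_ are hypothetical names adapted from the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyUnit(nn.Module):
    def __init__(self, init_value):
        super().__init__()
        self.my_parameter = nn.Parameter(init_value.clone())

    def forward(self, x):
        # Normalize into a local tensor; the Parameter itself is not reassigned,
        # and gradients flow through the normalization.
        w = F.normalize(self.my_parameter, p=2, dim=1)
        return w * x - 1

    @torch.no_grad()
    def renormalize_(self):
        # Optional: overwrite the stored values in place, e.g. after optimizer.step().
        self.my_parameter.copy_(F.normalize(self.my_parameter, p=2, dim=1))

torch.manual_seed(0)
unit = MyUnit(torch.randn(3, 4))
out = unit(torch.randn(3, 4))
out.sum().backward()
unit.renormalize_()
```

The forward-pass variant lets gradients pass through the norm, while renormalize_ changes the stored weights outside of autograd. Which of the two is appropriate depends on whether the constraint should be part of the model or a post-update projection.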

Maybe you can try: = F.normalize(, p=2, dim=1)

@sssohrab I'm facing the exact same issue. Can you please let me know how exactly you got past this? Also, did you want the gradients to pass through the norm step?