Hi, I am using a network to embed some entities into a vector space. Since the length of the vectors decreases during training, I want to renormalize them to length 1 at the end of each step. Is there any tool I can use to normalize the embedding vectors?
How to normalize embedding vectors?
I think the best thing you can do is to save the embedded indices and normalize their rows manually after the update (index_select them, compute the row-wise norm, divide, then index_copy back into the weights). We only support automatic max-norm clipping.
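A minimal sketch of that manual renormalization, assuming an nn.Embedding called embedding and a LongTensor indices holding the rows touched in the last step (both names and the sizes are placeholders):

import torch
import torch.nn as nn

embedding = nn.Embedding(1000, 64)
indices = torch.tensor([3, 17, 42])

with torch.no_grad():
    rows = embedding.weight.index_select(0, indices)        # pull out the updated rows
    norms = rows.norm(p=2, dim=1, keepdim=True)             # row-wise L2 norms
    embedding.weight.index_copy_(0, indices, rows / norms)  # write unit-length rows back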
If you want to normalize a vector as a part of a model, this should do it:
# assume q is the tensor to be L2 normalized, along dim 1
qn = torch.norm(q, p=2, dim=1).detach()
q = q.div(qn.expand_as(q))
Note the detach(); it is essential for the gradients to work correctly. I'm assuming you want the norm to be treated as a constant while dividing the Tensor by it.
Yes, it could be used to normalize an embedding. I suggest not using the detach(), though; I found that it degrades performance.
I see, thanks. But I need to constrain the norms of the embeddings (not bigger than 1). Do you have any better suggestions for doing that?
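One way to get that constraint, following the max-norm clipping mentioned earlier in the thread: nn.Embedding takes a max_norm argument, which rescales any row whose norm exceeds the limit whenever it is looked up (a sketch; the sizes here are placeholders):

import torch.nn as nn

# rows are renormalized to norm <= 1 each time they are looked up
embedding = nn.Embedding(1000, 64, max_norm=1.0)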
Why do we have to use detach()? Also, in the new PyTorch version you have to use keepdim=True in the norm() method. A simple implementation of L2 normalization:
# suppose x is a Variable of size [4, 16], 4 is batch_size, 16 is feature dimension
x = Variable(torch.rand(4, 16), requires_grad=True)
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm.expand_as(x))
If you use keepdim=True, you don't even need expand_as(x). The following works for me:
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm)
PyTorch now has a normalize function, so it is easy to do L2 normalization for features. Suppose x is a feature tensor of size N*D (N is the batch size and D is the feature dimension); we can simply use the following:
import torch.nn.functional as F
x = F.normalize(x, p=2, dim=1)
What if the variable I am trying to normalize is in fact a Parameter from nn.parameter rather than a Variable? In that case I get the following error for any of the above options I try:
TypeError: cannot assign 'torch.autograd.variable.Variable' as parameter 'W' (torch.nn.Parameter or None expected)
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class myUnit(nn.Module):
    def __init__(self, myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter, requires_grad=True)

    def forward(self, input):
        """
        Whatever operation. Just an example:
        """
        # This reassignment is what triggers the TypeError above:
        # F.normalize returns a plain tensor/Variable, not a Parameter.
        self.myParameter = F.normalize(self.myParameter, p=2, dim=1)
        output = self.myParameter * input - 1
        return output
Can I still keep using Parameter in my customized network and be able to also normalize it?
Maybe you can try:
self.myParameter.data = F.normalize(self.myParameter.data, p=2, dim=1)
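Applied to the myUnit example above, that would look roughly like this (a sketch; note that writing to .data bypasses autograd, so the normalization step itself is not part of the computation graph):

    def forward(self, input):
        # renormalize the parameter in place before using it; assigning to
        # .data avoids replacing the registered Parameter object
        self.myParameter.data = F.normalize(self.myParameter.data, p=2, dim=1)
        output = self.myParameter * input - 1
        return output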