Hi, I am using a network to embed some entities into vector space. The length of the vectors decreases during training, so I want to normalize each vector's length to 1 at the end of every step. Is there any tool I can use to normalize the embedding vectors?
I think the best thing you can do is to save the embedded indices and normalize their rows manually after the update (index_select them, compute the row-wise norm, divide, then index_copy back into the weights). We only support automatic max-norm clipping.
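A minimal sketch of that manual step, assuming `emb` is your `nn.Embedding` and `indices` is a tensor of the rows touched in the last batch (both names are placeholders), wrapped in `torch.no_grad()` so the writes stay out of the autograd graph:

```python
import torch

with torch.no_grad():
    rows = emb.weight.index_select(0, indices)        # gather the updated rows
    norms = rows.norm(p=2, dim=1, keepdim=True)       # row-wise L2 norms
    emb.weight.index_copy_(0, indices, rows / norms)  # write back unit-norm rows
```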
If you want to normalize a vector as a part of a model, this should do it:
```python
# assume q is the tensor to be L2-normalized, along dim 1
qn = torch.norm(q, p=2, dim=1).detach()
q = q.div(qn.expand_as(q))
```
Note the detach(); that is essential for the gradients to work correctly. I'm assuming you want the norm to be treated as a constant while dividing the Tensor by it.
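To see why, here is a small check (a sketch, not from the original reply): with detach(), the norm is a constant as far as autograd is concerned, so the gradient of each element is simply 1/qn for its row.

```python
import torch

q = torch.randn(3, 5, requires_grad=True)
qn = q.norm(p=2, dim=1, keepdim=True).detach()  # norm treated as a constant
(q / qn).sum().backward()

# Each row's gradient is 1 / qn: the division was by a constant, so no
# gradient flows back through the norm computation itself.
print(q.grad)
print(1.0 / qn)
```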
Can I use it to normalize the embedding after each update during training?
Yes, it could be used to normalize an embedding. I suggest not using detach(), though; I found that it degrades performance.
I see, thanks. But I need to constrain the norms of the embeddings (no bigger than 1). Do you have any better suggestions for doing that?
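Not from the thread, but two standard ways to express a norm-at-most-1 constraint with stock PyTorch (the sizes are placeholders):

```python
import torch
import torch.nn as nn

# Option 1: let the embedding enforce it at lookup time; rows fetched in
# forward() are renormalized whenever their norm exceeds max_norm.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=64, max_norm=1.0)

# Option 2: clip row norms manually after each optimizer step. renorm only
# rescales rows whose norm exceeds maxnorm; shorter rows are left untouched.
with torch.no_grad():
    emb.weight.copy_(torch.renorm(emb.weight, p=2, dim=0, maxnorm=1.0))
```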
Why do we have to use detach()? Also, in the new PyTorch version you have to use keepdim=True in the norm() method. A simple implementation of L2 normalization:
```python
# suppose x is a Variable of size [4, 16]; 4 is batch_size, 16 is the feature dimension
x = Variable(torch.rand(4, 16), requires_grad=True)
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm.expand_as(x))
```
If you use keepdim=True, you don't even need expand_as(x). The following works for me:
```python
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm)
```
Now PyTorch has a normalize function, so it is easy to do L2 normalization for features. Suppose x is a feature tensor of size N × D (N is the batch size and D is the feature dimension); we can simply use the following:
```python
import torch.nn.functional as F

x = F.normalize(x, p=2, dim=1)
```
Yes, it failed quickly with
What if the variable I am trying to normalize is in fact an nn.Parameter rather than a plain Variable? In that case I get the following error for any of the above options I try:

```
TypeError: cannot assign 'torch.autograd.variable.Variable' as parameter 'W' (torch.nn.Parameter or None expected)
```
```python
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class myUnit(nn.Module):
    def __init__(self, myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter, requires_grad=True)

    def forward(self, input):
        # Whatever operation; just an example:
        self.myParameter = F.normalize(self.myParameter, p=2, dim=1)  # this line raises the TypeError
        output = self.myParameter * input - 1
        return output
```
Can I still keep using a Parameter in my customized network and be able to also normalize it?
Maybe you can try operating on the underlying data directly (note self.myParameter is the Parameter itself, so there is no .weight attribute):

```python
self.myParameter.data = F.normalize(self.myParameter.data, p=2, dim=1)
```
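If you want gradients to flow through the normalization instead, one sketch (not from the thread) is to normalize into a local tensor inside forward() rather than reassigning the Parameter itself:

```python
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class myUnit(nn.Module):
    def __init__(self, myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter, requires_grad=True)

    def forward(self, input):
        # Normalize into a local variable; the Parameter is never reassigned,
        # so no TypeError, and gradients flow back through the norm step.
        w = F.normalize(self.myParameter, p=2, dim=1)
        return w * input - 1
```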
@sssohrab I'm facing the exact same issue. Can you please let me know how exactly you got through this? Also, did you want the gradients to pass through the norm step?