Hi, I am using a network to embed some entities into a vector space. Since the length of the vectors decreases during training, I want to renormalize them to length 1 at the end of each step. Is there any tool I can use to normalize the embedding vectors?
How to normalize embedding vectors?
I think the best thing you can do is to save the embedded indices and normalize their rows manually after the update (index_select them, compute the row-wise norm, divide, then index_copy back into the weights). We only support automatic max-norm clipping.
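A minimal sketch of that manual renormalization, assuming an nn.Embedding called embedding and a LongTensor indices holding the rows touched in the last step (both names and the sizes are placeholders):

import torch
import torch.nn as nn

embedding = nn.Embedding(1000, 64)
indices = torch.tensor([3, 17, 42])

with torch.no_grad():
    rows = embedding.weight.index_select(0, indices)        # pull out the updated rows
    norms = rows.norm(p=2, dim=1, keepdim=True)             # row-wise L2 norms
    embedding.weight.index_copy_(0, indices, rows / norms)  # write unit-length rows back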
If you want to normalize a vector as a part of a model, this should do it:
# assume q is the tensor to be L2 normalized, along dim 1
qn = torch.norm(q, p=2, dim=1).detach()
q = q.div(qn.expand_as(q))
Note the detach(); it is essential for the gradients to work correctly. I'm assuming you want the norm to be treated as a constant while dividing the Tensor by it.
Yes, it could be used to normalize an embedding. I suggest not using the detach(), though; I found that it degrades performance.
I see, thanks. But I need to constrain the norms of the embeddings (not bigger than 1). Do you have any better suggestions for doing that?
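One way to get that constraint, following the max-norm clipping mentioned earlier in the thread: nn.Embedding takes a max_norm argument, which rescales any row whose norm exceeds the limit whenever it is looked up (a sketch; the sizes here are placeholders):

import torch.nn as nn

# rows are renormalized to norm <= 1 each time they are looked up
embedding = nn.Embedding(1000, 64, max_norm=1.0)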
Why do we have to use detach()? Also, in the new PyTorch version you have to use keepdim=True in the norm() method. A simple implementation of L2 normalization:
# suppose x is a Variable of size [4, 16], 4 is batch_size, 16 is feature dimension
x = Variable(torch.rand(4, 16), requires_grad=True)
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm.expand_as(x))
If you use keepdim=True, you don't even need expand_as(x). The following works for me:
norm = x.norm(p=2, dim=1, keepdim=True)
x_normalized = x.div(norm)
PyTorch now has a normalize function, so it is easy to do L2 normalization for features. Suppose x is a feature tensor of size N*D (N is the batch size and D is the feature dimension); we can simply use the following:
import torch.nn.functional as F
x = F.normalize(x, p=2, dim=1)
What if the variable I am trying to normalize is in fact a Parameter from nn.parameter rather than a Variable? In that case I get the following error for any of the above options I try:
TypeError: cannot assign 'torch.autograd.variable.Variable' as parameter 'W' (torch.nn.Parameter or None expected)
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class myUnit(nn.Module):
    def __init__(self, myParameter):
        super(myUnit, self).__init__()
        self.myParameter = Parameter(myParameter, requires_grad=True)

    def forward(self, input):
        """
        Whatever operation. Just an example:
        """
        # This reassignment is what triggers the TypeError above:
        # F.normalize returns a plain tensor/Variable, not a Parameter.
        self.myParameter = F.normalize(self.myParameter, p=2, dim=1)
        output = self.myParameter * input - 1
        return output
Can I still keep using Parameter in my customized network and be able to also normalize it?
Maybe you can try:
self.myParameter.data = F.normalize(self.myParameter.data, p=2, dim=1)
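Applied to the myUnit example above, that would look roughly like this (a sketch; note that writing to .data bypasses autograd, so the normalization step itself is not part of the computation graph):

    def forward(self, input):
        # renormalize the parameter in place before using it; assigning to
        # .data avoids replacing the registered Parameter object
        self.myParameter.data = F.normalize(self.myParameter.data, p=2, dim=1)
        output = self.myParameter * input - 1
        return output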