Normalizing Embeddings

I’m trying to manually normalize my embeddings with their L2 norms instead of using PyTorch’s max_norm option (as max_norm seems to have some bugs). I’m following this link, and below is my code:

emb = torch.nn.Embedding(4, 2)
norms = torch.norm(emb.weight, p=2, dim=1).detach()
emb.weight = emb.weight.div(norms.expand_as(emb.weight))

But I’m getting the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 725, in expand_as
    return Expand.apply(self, (tensor.size(),))
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 111, in forward
    result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (2) must match the existing size (4) at non-singleton dimension 1. at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:308

When I look at the size of norms, it’s (4L,)
Any idea where I’m going wrong? Thanks!


Shouldn’t it be:

emb.weight = emb.weight.div(norms.expand_as(emb.weight))

Yes, that was a typo. I edited the code and also updated the error message. Any idea why I’m getting that error?

emb = torch.nn.Embedding(4, 2)
# take the underlying tensor so autograd does not track the norm computation
norms = torch.norm(emb.weight, p=2, dim=1).data
# reshape (4,) -> (4, 1) so it can be expanded to the (4, 2) weight shape
emb.weight.data = emb.weight.data.div(norms.view(4, 1).expand_as(emb.weight))
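
For what it’s worth, on PyTorch 0.4+ (where Variable has been merged into Tensor) a sketch of the same row-wise normalization could avoid .data by using torch.nn.functional.normalize inside torch.no_grad(); this is just an alternative formulation, not required for the fix above:

import torch
import torch.nn.functional as F

emb = torch.nn.Embedding(4, 2)

with torch.no_grad():
    # replace each row of the weight matrix with its unit-L2-norm version
    emb.weight.copy_(F.normalize(emb.weight, p=2, dim=1))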

Thanks Chen. It worked.
I noticed that you removed the “detach()” from the second line. Is that because “norms” is not a “Variable” anymore?

Yes, you’re right.
I prefer variable.data.


What is the use of detach()? I do not think it is necessary here. I think we can also use:

emb = torch.nn.Embedding(4, 2)
norm = emb.weight.norm(p=2, dim=1, keepdim=True)
emb.weight = emb.weight.div(norm.expand_as(emb.weight))

Are there any problems with the above snippet?
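
Incidentally, with keepdim=True the explicit expand_as should not even be needed, since broadcasting handles the (4, 1) norm against the (4, 2) weight; a quick sketch to check that, using the same shapes as the snippet above:

import torch

emb = torch.nn.Embedding(4, 2)
norm = emb.weight.norm(p=2, dim=1, keepdim=True)  # shape (4, 1)

normalized = emb.weight / norm                    # broadcasts over the (4, 2) weight
print(normalized.norm(p=2, dim=1))                # each entry should be close to 1.0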


My understanding is that if we don’t detach it, then norm will be a Variable, and PyTorch will try to optimize its values in the backward phase when calculating the gradients. But we don’t really want to optimize the norm values.
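
A small sketch of what detaching actually changes here (my own illustration, using a plain tensor in place of the embedding weight): the gradient that reaches the weight differs depending on whether the norm stays in the graph:

import torch

w = torch.randn(4, 2, requires_grad=True)

# norm kept in the graph: gradients flow through both the division and the norm
(w / w.norm(p=2, dim=1, keepdim=True)).sum().backward()
grad_with_norm = w.grad.clone()
w.grad = None

# norm detached: treated as a constant during the backward pass
(w / w.norm(p=2, dim=1, keepdim=True).detach()).sum().backward()
grad_detached = w.grad.clone()

print(torch.allclose(grad_with_norm, grad_detached))  # generally False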

The type of norm is a torch Variable. PyTorch will only calculate the gradient of the loss w.r.t. leaf nodes. Since norm is not a leaf node, I do not think it will be updated when we do optimizer.step(). Only emb.weight will be updated, since it is of type torch.nn.Parameter and it is the learnable parameter of the module.
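
To make the leaf/non-leaf distinction concrete, here is a minimal sketch on a recent PyTorch version, using the is_leaf attribute:

import torch

emb = torch.nn.Embedding(4, 2)
norm = emb.weight.norm(p=2, dim=1, keepdim=True)

print(emb.weight.is_leaf)  # True:  created directly as a Parameter, so the optimizer updates it
print(norm.is_leaf)        # False: produced by an operation on emb.weight, it only lives in the graph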


@jdhao I was searching all over the web for this kind of answer. Sorry for taking up this discussion again. Could you please explain in more detail why norm is not a leaf node in the gradient computation?

So this doesn’t precisely replicate the max_norm functionality, because it doesn’t check whether the norm of the vector is already less than max_norm, right? (In that case, this approach would actually increase the norm of those vectors.)
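
For comparison, a sketch of a max-norm style clip (rescaling only the rows whose norm exceeds the threshold, and leaving the rest untouched) might look like this, where max_val is a hypothetical threshold chosen for illustration:

import torch

emb = torch.nn.Embedding(4, 2)
max_val = 1.0  # hypothetical max-norm threshold, not taken from the discussion above

with torch.no_grad():
    norms = emb.weight.norm(p=2, dim=1, keepdim=True)
    # scale factor is 1 for rows already within the threshold, < 1 for rows above it
    scale = (max_val / norms).clamp(max=1.0)
    emb.weight.mul_(scale)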