# Normalizing Embeddings

I’m trying to normalize my embeddings manually by their L2 norms instead of using PyTorch’s `max_norm` (as `max_norm` seems to have some bugs). I’m following this link, and below is my code:

```python
emb = torch.nn.Embedding(4, 2)
norms = torch.norm(emb.weight, p=2, dim=1).detach()
emb.weight = emb.weight.div(norms.expand_as(emb.weight))
```

But I’m getting the following error:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 725, in expand_as
    return Expand.apply(self, (tensor.size(),))
  File "/usr/local/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 111, in forward
    result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (2) must match the existing size (4) at non-singleton dimension 1. at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:308
```

When I look at the size of `norms`, it’s `(4L,)`.
Any idea where I’m going wrong? Thanks!

Shouldn’t it be this?

```python
emb.weight = emb.weight.div(norms.expand_as(emb.weight))
```

Yes, that was a typo. I edited the code, and also edited the error. Any idea why I’m getting that error?

```python
import torch

emb = torch.nn.Embedding(4, 2)
norms = torch.norm(emb.weight, p=2, dim=1).data
# norms has shape (4,); view it as (4, 1) so it can be expanded to (4, 2)
emb.weight.data = emb.weight.data.div(norms.view(4,1).expand_as(emb.weight))
```
Thanks Chen. It worked.
I noticed that you removed `detach()` from the second line. Is that because `norms` is not a `Variable` anymore?

Yes, you’re right.
I prefer `variable.data`.
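For anyone reading this later, here is a minimal sketch of the same idea, assuming a more recent PyTorch release where `torch.no_grad()` and `keepdim=True` are available. The original error came from trying to expand a tensor of shape `(4,)` to `(4, 2)`; keeping the reduced dimension gives shape `(4, 1)`, which broadcasts directly. The seed and variable names are only for illustration.

```python
import torch

torch.manual_seed(0)
emb = torch.nn.Embedding(4, 2)

# keepdim=True keeps the reduced dimension, so norms has shape (4, 1)
# and broadcasts against the (4, 2) weight without view/expand_as.
with torch.no_grad():
    norms = emb.weight.norm(p=2, dim=1, keepdim=True)
    # In-place division, so emb.weight remains the same Parameter object.
    emb.weight.div_(norms)

# Every row of the embedding matrix should now have unit L2 norm.
row_norms = emb.weight.norm(p=2, dim=1)
```

Doing the update in place inside `no_grad()` avoids reassigning `emb.weight`, so the module keeps its learnable parameter and autograd does not record the normalization.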

What is the use of `detach()`? I do not think it is necessary here. I think we can also use:

```python
emb = torch.nn.Embedding(4,2)
norm = emb.weight.norm(p=2, dim=1, keepdim=True)
emb.weight = emb.weight.div(norm.expand_as(emb.weight))
```

Are there any problems with the above snippet?

My understanding is that if we don’t detach it, then `norm` will be a `Variable` and PyTorch will try to optimize its values during the backward pass when computing the gradients. But we don’t really want to optimize the norm values.

The type of `norm` is torch `Variable`. PyTorch only keeps the gradient of the loss w.r.t. the leaf nodes. Since `norm` is not a leaf node, I do not think it will be updated when we do `optimizer.step()`. Only `emb.weight` will be updated, since it is of type `torch.nn.Parameter` and is the learnable parameter of the module.
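To make the leaf/non-leaf distinction concrete, here is a small sketch, assuming a recent PyTorch release where tensors expose `is_leaf`. As far as I can tell, `norm` is indeed never touched by `optimizer.step()`, but without `detach()` the gradient still flows through it back into `emb.weight`, so the two versions give different gradients. The loss used here is a made-up one, chosen only so that `backward()` has something to differentiate.

```python
import torch

torch.manual_seed(0)
emb = torch.nn.Embedding(4, 2)
norm = emb.weight.norm(p=2, dim=1, keepdim=True)

# emb.weight was created directly as a Parameter, so it is a leaf.
# norm was produced by an operation on emb.weight, so it is not.
weight_is_leaf = emb.weight.is_leaf
norm_is_leaf = norm.is_leaf

# Without detach(): backward also differentiates through norm.
loss = emb.weight.div(norm).sum()
loss.backward()
grad_through_norm = emb.weight.grad.clone()

# With detach(): norm is treated as a constant during backward.
emb.weight.grad = None
loss = emb.weight.div(norm.detach()).sum()
loss.backward()
grad_detached = emb.weight.grad.clone()

# The two gradients differ, which is the practical effect of detach().
grads_differ = not torch.allclose(grad_through_norm, grad_detached)
```

So `detach()` is not about which tensors the optimizer updates; it decides whether the norm counts as a function of the weights when gradients are computed.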

@jdhao I have been searching the web for this kind of answer, so sorry for bringing this discussion up again. Could you please explain in more detail why `norm` is not a leaf node in the gradient computation?

So this doesn’t precisely replicate the `max_norm` functionality, because it doesn’t check whether the norm of a vector is already less than the max norm (in which case this code would increase the norm of those vectors), right?
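One way to get closer to that behavior is to rescale only the rows whose norm exceeds the threshold. The sketch below is an illustration under my own assumptions, not a claim about how `max_norm` is implemented internally: the threshold `max_norm = 1.0`, the small epsilon, and the hand-picked weights are all hypothetical values chosen to make the effect visible.

```python
import torch

max_norm = 1.0  # hypothetical threshold, chosen for illustration
emb = torch.nn.Embedding(4, 2)

with torch.no_grad():
    # Overwrite the weights so the row norms are known: 0.5, 1.0, 2.0, 4.0
    emb.weight.copy_(torch.tensor([[0.5, 0.0],
                                   [0.0, 1.0],
                                   [2.0, 0.0],
                                   [0.0, 4.0]]))
    norms = emb.weight.norm(p=2, dim=1, keepdim=True)
    # Scale is max_norm / norm for rows above the threshold, capped at 1
    # so rows that are already short enough are left as they are.
    scale = torch.clamp(max_norm / (norms + 1e-7), max=1.0)
    emb.weight.mul_(scale)

# Rows that were too long are pulled back to max_norm; the short row keeps 0.5.
new_norms = emb.weight.norm(p=2, dim=1)
```

With this variant the row of norm 0.5 is not scaled up, which is the difference from plain L2 normalization pointed out above.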