I’m trying to manually normalize my embeddings with their L2-norms instead of using pytorch max_norm (as max_norm seems to have some bugs). I’m following this link and below is my code:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/torch/autograd/variable.py", line 725, in expand_as
return Expand.apply(self, (tensor.size(),))
File "/usr/local/lib/python2.7/site-packages/torch/autograd/_functions/tensor.py", line 111, in forward
result = i.expand(*new_size)
RuntimeError: The expanded size of the tensor (2) must match the existing size (4) at non-singleton dimension 1. at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:308
When I look at the size of norms, it’s (4L,)
Any idea where I’m going wrong? Thanks!
My understanding is that if we don’t detach it, then norm will be a variable and PyTorch will aim at optimizing its values in the backward phase when calculating the gradients. But we don’t really want to optimize the norm values.
The type of norm is torch Variable. PyTorch will only calculate the the gradient of loss w.r.t to the leaf node. Since norm is not a leaf node, I do think it will be updated when we do optimizer.step(). Only emb.weight will be updated since it is of type torch.nn.Parameter and it is the learnable parameter of the module.
@jdhao I was seriously looking over the web for this kind of answer, Sorry of taking this discussion again up. Could you please explain more why norm is not a leaf node in the gradient computation ?
So this doesn’t precisely replicate the max-norm functionality because it’s not checking if the norm of the vector is less than max-norm, right (in which case, this function would increase the norm of those vectors)?