    model.task_loss_weights.grad = torch.autograd.grad(grad_norm_loss, model.task_loss_weights)
  File "/home/ubuntu/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 192, in grad
    inputs, allow_unused)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I can see that grad_norm_loss doesn’t require grad and has no grad_fn, so I set requires_grad=True on it explicitly, at which point I got:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
When I set allow_unused=True, I got None back as my gradient.
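The three behaviours above can be reproduced in isolation. This is a minimal sketch, assuming the graph between the loss and the weights has been cut somewhere (here with an explicit `.detach()`); the tensors are hypothetical stand-ins, not the model from this post.

```python
import torch

# Hypothetical stand-in for model.task_loss_weights.
task_loss_weights = torch.nn.Parameter(torch.ones(2))

first_error = None
second_error = None

# Detaching cuts the graph: the result has no grad_fn and requires_grad is False.
detached_loss = (task_loss_weights * 2.0).sum().detach()
try:
    torch.autograd.grad(detached_loss, task_loss_weights)
except RuntimeError as err:
    first_error = str(err)  # "... does not require grad and does not have a grad_fn"

# Forcing requires_grad=True only makes the tensor a new leaf.
# It is still not connected to task_loss_weights.
detached_loss.requires_grad_(True)
try:
    torch.autograd.grad(detached_loss, task_loss_weights)
except RuntimeError as err:
    second_error = str(err)  # "... appears to not have been used in the graph ..."

# allow_unused=True silences the error but there is still nothing to differentiate.
(unused_grad,) = torch.autograd.grad(
    detached_loss, task_loss_weights, allow_unused=True
)

# For comparison: a loss that keeps its graph yields a real gradient.
connected_loss = (task_loss_weights * 2.0).sum()
(connected_grad,) = torch.autograd.grad(connected_loss, task_loss_weights)
```

If this matches what is happening, setting `requires_grad=True` after the fact cannot help, because it does not restore the missing connection to the weights.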
For context, the paper specifies:
which is why requires_grad for the constant_term of the grad_norm_loss is set explicitly to False. For reference, here is the relevant section of code:
How can I work around this?
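One direction that may be worth checking is whether the per-task gradient norms were computed with `create_graph=True`, and whether only the constant term is detached rather than the whole loss. The sketch below is a hypothetical toy setup: the layers, shapes, and every name other than `task_loss_weights`, `grad_norm_loss` and `constant_term` are assumptions, and the constant term is simplified to the mean gradient norm.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model: one shared layer and two task heads.
shared = torch.nn.Linear(4, 3)
heads = [torch.nn.Linear(3, 1) for _ in range(2)]
task_loss_weights = torch.nn.Parameter(torch.ones(2))

x = torch.randn(8, 4)
features = shared(x)
task_losses = [head(features).pow(2).mean() for head in heads]

grad_norms = []
for weight, loss in zip(task_loss_weights, task_losses):
    # create_graph=True keeps the gradient itself differentiable,
    # so its norm stays connected to task_loss_weights.
    (grad_wrt_shared,) = torch.autograd.grad(
        weight * loss, shared.weight, retain_graph=True, create_graph=True
    )
    grad_norms.append(grad_wrt_shared.norm(2))
grad_norms = torch.stack(grad_norms)

# Only the target is treated as a constant (simplified here to the mean norm).
constant_term = grad_norms.mean().detach()
grad_norm_loss = (grad_norms - constant_term).abs().sum()

# torch.autograd.grad returns a tuple, so unpack it before assigning to .grad.
(weights_grad,) = torch.autograd.grad(grad_norm_loss, task_loss_weights)
task_loss_weights.grad = weights_grad
```

Detaching `constant_term` alone keeps `grad_norm_loss` attached to the graph through `grad_norms`, so it has a `grad_fn` and no `allow_unused` flag is needed.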