Using autograd when requires_grad=False

I am trying to implement GradNorm from a notebook, and I am using a pretrained word2vec embedding layer for which I have set requires_grad=False.
Cell 4 of the notebook contains this line:
G1R = torch.autograd.grad(l1, param[0], retain_graph=True, create_graph=True)
The line above throws the following error:
    G1R = torch.autograd.grad(l1, param[0], retain_graph=True, create_graph=True)
  File "/usr/lib64/python3.6/site-packages/torch/autograd/", line 145, in grad
    inputs, allow_unused)
RuntimeError: One of the differentiated Tensors does not require grad.
Is there any way to implement this for my use case, where the pretrained embedding has requires_grad=False?
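A minimal sketch that reproduces the same RuntimeError outside the notebook, assuming a toy frozen nn.Embedding feeding a linear layer (the sizes and the names emb and head are made up for illustration). torch.autograd.grad refuses any input tensor whose requires_grad is False:

```python
import torch
import torch.nn as nn

# Toy stand-in for the frozen word2vec layer: its weight does not require grad
emb = nn.Embedding(10, 4)
emb.weight.requires_grad_(False)
head = nn.Linear(4, 1)  # trainable, so the loss itself still has a graph

tokens = torch.tensor([1, 2, 3])
l1 = head(emb(tokens)).sum()

try:
    # Differentiating w.r.t. a tensor with requires_grad=False raises
    torch.autograd.grad(l1, emb.weight, retain_graph=True, create_graph=True)
    error = None
except RuntimeError as e:
    error = str(e)

print(error)
```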

What is param[0] supposed to be here? The weights of your embedding layer?

# Get the gradients of the first layer of each tower and calculate their L2 norm
param = list(MTL.parameters())

where MTL is an instance of the class MTLnet(nn.Module).
I am not sure myself why they're using param[0].
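One way to see what param[0] is: nn.Module.parameters() yields parameters in the order they were registered in __init__, frozen ones included. The MTLnet below is a hypothetical stand-in (the notebook's real class may differ); if the pretrained embedding is created first, param[0] is its weight, which would explain the error:

```python
import torch.nn as nn

class MTLnet(nn.Module):
    """Hypothetical stand-in: frozen embedding plus one linear tower per task."""

    def __init__(self, vocab_size=10, dim=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)  # registered first
        self.embedding.weight.requires_grad_(False)      # frozen, as in the question
        self.tower1 = nn.Linear(dim, 1)
        self.tower2 = nn.Linear(dim, 1)

    def forward(self, x):
        h = self.embedding(x).mean(dim=1)  # average word vectors per sequence
        return self.tower1(h), self.tower2(h)

MTL = MTLnet()
param = list(MTL.parameters())

# Inspect names, shapes, and trainability in registration order
for name, p in MTL.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
```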

I would first check what param[0] is and why they use it, then make sure that it requires gradients.
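A sketch of that check on a hypothetical MTLnet (frozen embedding, two one-layer towers). It selects the gradient target by attribute name instead of list position; MTL.tower1.weight is an assumption standing in for "the first layer of a tower". It then asserts that the target requires grad before calling torch.autograd.grad:

```python
import torch
import torch.nn as nn

class MTLnet(nn.Module):
    """Hypothetical stand-in: frozen embedding plus one linear tower per task."""

    def __init__(self, vocab_size=10, dim=4):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        self.embedding.weight.requires_grad_(False)  # frozen pretrained vectors
        self.tower1 = nn.Linear(dim, 1)
        self.tower2 = nn.Linear(dim, 1)

    def forward(self, x):
        h = self.embedding(x).mean(dim=1)
        return self.tower1(h), self.tower2(h)

torch.manual_seed(0)
MTL = MTLnet()
x = torch.randint(0, 10, (8, 5))  # 8 sequences of 5 token ids
y1 = torch.randn(8, 1)
out1, out2 = MTL(x)
l1 = nn.functional.mse_loss(out1, y1)

# Choose the target explicitly and verify it is trainable
target = MTL.tower1.weight
assert target.requires_grad, "gradient target must require grad"

G1R = torch.autograd.grad(l1, target, retain_graph=True, create_graph=True)
G1 = torch.norm(G1R[0], 2)  # L2 norm; differentiable because create_graph=True
print(G1)
```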
Note that if you need gradients with respect to the embedding to compute your gradient penalty loss but don't want to update the embedding, you can keep requires_grad=True on it and simply not pass it to the optimizer, so it will never be updated.
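A minimal sketch of that approach, assuming the pretrained vectors are loaded with nn.Embedding.from_pretrained (a random tensor stands in for the word2vec matrix). Passing freeze=False keeps requires_grad=True, so torch.autograd.grad works, while leaving the embedding out of the optimizer keeps its values fixed. The penalty term and its 0.1 weight are illustrative only:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the word2vec matrix; freeze=False keeps requires_grad=True
pretrained = torch.randn(10, 4)
emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
head = nn.Linear(4, 1)

# Only the head is given to the optimizer, so the embedding is never updated
opt = torch.optim.SGD(head.parameters(), lr=0.1)

x = torch.randint(0, 10, (8, 5))
y = torch.randn(8, 1)
before = emb.weight.detach().clone()

loss = nn.functional.mse_loss(head(emb(x).mean(dim=1)), y)

# Differentiating w.r.t. the embedding now works
(g,) = torch.autograd.grad(loss, emb.weight, retain_graph=True, create_graph=True)
penalty = g.norm(2)  # illustrative gradient-norm term

opt.zero_grad()
(loss + 0.1 * penalty).backward()
opt.step()

# backward() still fills emb.weight.grad, but no optimizer reads it
print(torch.equal(emb.weight.detach(), before))  # True
```

Because emb.weight.grad keeps accumulating across iterations, you may want to zero or discard it periodically; it has no effect on the embedding values either way.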