nn.Embedding doesn't seem to support double backward

I am not able to differentiate a module containing nn.Embedding twice.

I am using PyTorch version 0.3.1.

Here is the code to reproduce the bug:

import torch
from torch.autograd import Variable


class Test(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.embd = torch.nn.Embedding(1000, 100)
        self.dense = torch.nn.Linear(100, 1)

    def forward(self, inp):
        inp = self.embd(inp)
        return self.dense(inp)


test = Test()
test.cuda()
inp = Variable(torch.ones(10).long().cuda())
out = test(inp)
raw_loss = out.mean(dim=0)

# First-order gradients, kept in the graph so their norm can be penalised.
loss_grad = torch.autograd.grad(outputs=raw_loss,
                                inputs=list(test.parameters()),
                                retain_graph=True, create_graph=True,
                                only_inputs=True)
norm = sum([param.norm() ** 2 for param in loss_grad])
loss = raw_loss + norm

# Backward through the gradient penalty requires double backward -> fails.
loss.backward(retain_graph=True)

It fails with the following error trace:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-9e67f1421e58> in <module>()
     22 loss = raw_loss + norm
     23 
---> 24 loss.backward(retain_graph=True)

~/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py in backward(self, gradient, retain_graph, create_graph, retain_variables)
    165                 Variable.
    166         """
--> 167         torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
    168 
    169     def register_hook(self, hook):

~/miniconda3/lib/python3.6/site-packages/torch/autograd/__init__.py in backward(variables, grad_variables, retain_graph, create_graph, retain_variables)
     97 
     98     Variable._execution_engine.run_backward(
---> 99         variables, grad_variables, retain_graph)
    100 
    101 

RuntimeError: trying to differentiate twice a function that was marked with @once_differentiable

tagging @smth

It's not working even with master.

Until this is fixed, I was thinking of using a custom Embedding layer.

The following has very limited functionality, but it is enough for my use case. I don't know if it's fully correct though:

import torch
import torch.nn as nn


class myEmbedding(nn.Module):
    def __init__(self, inp, out):
        super().__init__()

        self.inp = inp
        self.out = out

        self.embeddings = nn.Parameter(torch.randn(inp, out))
        self.dim_bias = nn.Parameter(torch.randn(out))
        self.bias = nn.Parameter(torch.randn(1))

        self.params = nn.ParameterList([self.embeddings, self.dim_bias, self.bias])

    def forward(self, ind):
        # ind: (*) -- long indices of arbitrary shape
        ind_shape = ind.size()
        ind = ind.view(-1)  # flatten to (N,)
        # Index into the embedding matrix and add the biases: (N, out)
        emb = self.embeddings[ind] + self.dim_bias[None, :] + self.bias[None, :]
        emb = emb.view(*ind_shape, self.out)  # back to (*, out)
        return emb
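For reference, a minimal sketch of how it could be dropped into the repro above (same code, just with nn.Embedding swapped for myEmbedding; since the lookup is plain tensor indexing, it should not hit the @once_differentiable backward of nn.Embedding):

test = Test()
test.embd = myEmbedding(1000, 100)  # swap in the custom layer before moving to GPU
test.cuda()

inp = Variable(torch.ones(10).long().cuda())
raw_loss = test(inp).mean(dim=0)

loss_grad = torch.autograd.grad(outputs=raw_loss,
                                inputs=list(test.parameters()),
                                retain_graph=True, create_graph=True,
                                only_inputs=True)
norm = sum(param.norm() ** 2 for param in loss_grad)
loss = raw_loss + norm
loss.backward(retain_graph=True)  # should no longer trip the @once_differentiable check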

I am on master, and my error is a bit different. Also, unfortunately, your solution doesn't work for my case. :confused:

I get this error:
RuntimeError: the derivative for embedding_dense_backward is not implemented

I opened an issue to track this: https://github.com/pytorch/pytorch/issues/6469
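In the meantime, one possible workaround (a rough sketch, not verified against your exact setup; the class name OneHotEmbedding is just mine) is to write the lookup as a matmul with a one-hot matrix, so the second-order gradients go through ordinary matmul instead of embedding_dense_backward:

import torch
import torch.nn as nn


class OneHotEmbedding(nn.Module):
    """Embedding lookup written as one_hot @ weight.
    Memory grows with num_embeddings, so it only suits small vocabularies."""

    def __init__(self, num_embeddings, embedding_dim):
        super().__init__()
        self.num_embeddings = num_embeddings
        self.weight = nn.Parameter(torch.randn(num_embeddings, embedding_dim))

    def forward(self, ind):
        ind_shape = ind.size()
        flat = ind.view(-1, 1)
        # Build an (N, num_embeddings) one-hot matrix from the indices.
        one_hot = self.weight.new_zeros(flat.size(0), self.num_embeddings)
        one_hot.scatter_(1, flat, 1.0)
        emb = one_hot @ self.weight  # (N, embedding_dim)
        return emb.view(*ind_shape, -1)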
