How can I get higher order gradient on GRUCell?

(Hzhwcmhf) #1

I am working on a network that needs to regularize the gradient, so I have to compute a second derivative.

But I got this error:

RuntimeError: the derivative for _cudnn_rnn_backward is not implemented

I minimized my code to reproduce the error:

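import torch
import torch.nn as nn
from torch.autograd import grad

cell = nn.GRUCell(10, 10).cuda()
parameters = list(cell.parameters())
x = torch.rand(1, 10).cuda()
y = torch.rand(1, 10).cuda()
incoming = cell(x, torch.zeros(1, 10).cuda())
incoming = cell(y, incoming)
loss = torch.sum(incoming)
# create_graph=True records the gradient computation so it can be differentiated again
grad_all = grad(loss, parameters, retain_graph=True, create_graph=True, only_inputs=True)
# Flatten and concatenate the gradients before summing (torch.sum does not accept a list)
loss2 = torch.sum(torch.cat([v.view(-1) for v in grad_all]))
# Second backward pass, through the first-order gradients
loss2.backward()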

but it comes up with a different error:

RuntimeError: trying to differentiate twice a function that was markedwith @once_differentiable

So is there any workaround to get the second-order gradient? (I’m on PyTorch 0.4.1.)

(Thomas V) #2

I’d probably implement a GRU cell from nn.Linear layers.
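For illustration, here is a minimal sketch of what such a cell might look like. It is not code from this thread: the class name `LinearGRUCell` and the layer names `x2h` and `h2h` are made up for the example. It follows the gate equations given in the nn.GRUCell documentation and uses only ordinary differentiable ops, so autograd should be able to differentiate it twice. It runs on the CPU here; moving it to the GPU is assumed to work the same way.

```python
import torch
import torch.nn as nn
from torch.autograd import grad


class LinearGRUCell(nn.Module):
    """Hypothetical GRU cell built from two nn.Linear layers."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Each layer produces the stacked pre-activations for (reset, update, new).
        self.x2h = nn.Linear(input_size, 3 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 3 * hidden_size)

    def forward(self, x, h):
        x_r, x_z, x_n = self.x2h(x).chunk(3, dim=1)
        h_r, h_z, h_n = self.h2h(h).chunk(3, dim=1)
        r = torch.sigmoid(x_r + h_r)   # reset gate
        z = torch.sigmoid(x_z + h_z)   # update gate
        n = torch.tanh(x_n + r * h_n)  # candidate state
        return (1 - z) * n + z * h


torch.manual_seed(0)
cell = LinearGRUCell(10, 10)
x = torch.rand(1, 10)
y = torch.rand(1, 10)
h = cell(x, torch.zeros(1, 10))
h = cell(y, h)
loss = torch.sum(h)

# First-order gradients, keeping the graph so they can be differentiated again.
grad_all = grad(loss, list(cell.parameters()), create_graph=True)
loss2 = torch.sum(torch.cat([g.view(-1) for g in grad_all]))

# Second backward pass, through the first-order gradients.
loss2.backward()
```

The trade-off is speed: this version does not use the fused or cuDNN kernels, so it is likely to be slower than nn.GRUCell on the GPU.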

Best regards


(Hzhwcmhf) #3

Thanks a lot. It works.

So it’s the CUDA version of GRUCell that is marked with once_differentiable. I still hope there will be a more precise error message, and a simpler way to get the second-order gradient with GRUCell than implementing the cell myself.

(Thomas V) #4

I’m all for it, and implementing wonderful things for the various RNNs is on my “things I’d like to do when I get to it” list, but I just can’t promise when that will be…

Best regards


(Ramprs) #5

@hzhwcmhf, I am stuck with the exact same problem. Would you mind sharing your GRU/LSTM cell implementation with nn.Linear()? Thanks in advance.

(Hzhwcmhf) #6

I think this may help.
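Since the question above also mentions LSTM, here is a minimal sketch of an LSTM cell built from nn.Linear. It is not necessarily the implementation referred to in this post: the class name `LinearLSTMCell` and the layer names `x2h` and `h2h` are made up for the example. It assumes the standard LSTM gate equations (input, forget, cell candidate, output) and uses only ordinary differentiable ops, so a second backward pass should go through.

```python
import torch
import torch.nn as nn
from torch.autograd import grad


class LinearLSTMCell(nn.Module):
    """Hypothetical LSTM cell built from two nn.Linear layers."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Each layer produces the stacked pre-activations for the four gates.
        self.x2h = nn.Linear(input_size, 4 * hidden_size)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.x2h(x) + self.h2h(h)
        i, f, g, o = gates.chunk(4, dim=1)
        c_next = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h_next = torch.sigmoid(o) * torch.tanh(c_next)
        return h_next, c_next


torch.manual_seed(0)
lstm_cell = LinearLSTMCell(10, 10)
state = (torch.zeros(1, 10), torch.zeros(1, 10))
for step_input in (torch.rand(1, 10), torch.rand(1, 10)):
    state = lstm_cell(step_input, state)
lstm_loss = torch.sum(state[0])

# First-order gradients with the graph retained, then a second backward pass.
lstm_grads = grad(lstm_loss, list(lstm_cell.parameters()), create_graph=True)
lstm_loss2 = torch.sum(torch.cat([g.view(-1) for g in lstm_grads]))
lstm_loss2.backward()
```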

(Ramprs) #7

Thanks @hzhwcmhf. That was very helpful.