I am working on a network where I need to regularize the gradient, so I must compute a second derivative (double backward).

But I got the following error:

RuntimeError: the derivative for _cudnn_rnn_backward is not implemented

I minimized my code to reproduce the error:

```
import torch
from torch import nn
from torch.autograd import grad

cell = nn.GRUCell(10, 10).cuda()
parameters = list(cell.parameters())
x = torch.rand(1, 10).cuda()
y = torch.rand(1, 10).cuda()
incoming = cell(x, torch.zeros(1, 10).cuda())
incoming = cell(y, incoming)
loss = torch.sum(incoming)
# create_graph=True so the gradient itself is part of the autograd graph
grad_all = grad(loss, parameters, retain_graph=True, create_graph=True, only_inputs=True)
print(grad_all[0].requires_grad)  # True
loss2 = torch.sum(torch.cat([v.view(-1) for v in grad_all]))
loss2.backward()  # fails here
```

which produced another error:

RuntimeError: trying to differentiate twice a function that was marked with @once_differentiable

So is there any workaround to get the second-order gradient? (I'm on PyTorch 0.4.1.)
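One direction I'm considering, as a sketch: write the GRU update out in primitive autograd ops instead of calling the fused CUDA/cuDNN kernel, since elementwise ops and matmuls all support double backward. The `manual_gru_cell` function below is my own helper (not a library API); it assumes `nn.GRUCell`'s gate layout, where the stacked weight rows are reset gate `r`, update gate `z`, then new gate `n`. Shown on CPU for simplicity:

```python
import torch
from torch.autograd import grad

def manual_gru_cell(x, h, w_ih, w_hh, b_ih, b_hh):
    # Same math as nn.GRUCell, written in primitive ops so that autograd
    # can also build a graph for the backward pass (double backward works).
    # Gate layout assumed: rows [0:H) = r, [H:2H) = z, [2H:3H) = n.
    gi = x @ w_ih.t() + b_ih
    gh = h @ w_hh.t() + b_hh
    i_r, i_z, i_n = gi.chunk(3, dim=1)
    h_r, h_z, h_n = gh.chunk(3, dim=1)
    r = torch.sigmoid(i_r + h_r)
    z = torch.sigmoid(i_z + h_z)
    n = torch.tanh(i_n + r * h_n)
    return (1 - z) * n + z * h

H = 10
params = [torch.randn(3 * H, H, requires_grad=True),  # w_ih
          torch.randn(3 * H, H, requires_grad=True),  # w_hh
          torch.zeros(3 * H, requires_grad=True),     # b_ih
          torch.zeros(3 * H, requires_grad=True)]     # b_hh

x, y = torch.rand(1, H), torch.rand(1, H)
h = manual_gru_cell(x, torch.zeros(1, H), *params)
h = manual_gru_cell(y, h, *params)
loss = h.sum()

grad_all = grad(loss, params, create_graph=True)  # differentiable first-order grads
loss2 = torch.cat([g.reshape(-1) for g in grad_all]).sum()
loss2.backward()  # second-order backward goes through
```

Disabling cuDNN (`torch.backends.cudnn.enabled = False`) is another commonly suggested option for the full `nn.GRU`, though I'm not sure the non-cuDNN GPU path avoids the fused cell kernels on 0.4.1, so the manual cell seems like the safer route.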