RuntimeError: could not compute gradients for some functions (CudnnRNN)

Hi,

I’ve tried taking the code from https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb
and making it work on the GPU by converting the models and variable initializations with .cuda().
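Roughly, the conversion looked like this (a minimal sketch; input_tensor and target_tensor stand in for the notebook’s training tensors):

    # move the models' parameters onto the GPU ...
    encoder = encoder.cuda()
    decoder = decoder.cuda()

    # ... and wrap CUDA tensors whenever a Variable is created
    input_variable = Variable(input_tensor.cuda())
    target_variable = Variable(target_tensor.cuda())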

Unfortunately, I’m getting the following error:

  File "/home/jd/pytorch/examples/translation/translate_gpu.py", line 296, in trainEpochs
    loss = train(input_variable, target_variable, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion)

  File "/home/jd/pytorch/examples/translation/translate_gpu.py", line 247, in train
    loss.backward()

  File "/home/jd/anaconda/lib/python2.7/site-packages/torch/autograd/variable.py", line 158, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)

RuntimeError: could not compute gradients for some functions (CudnnRNN)

It works on the CPU. Any hints as to what might be causing this? Sadly, the RuntimeError doesn’t say which operations the gradients couldn’t be computed for. Is there a list of supported/unsupported ops I could reference somewhere?

Code + data can be found here: https://github.com/ponythewhite/pytorch_fun

Thanks,


Jacek


I don’t think it’s your fault. We’ll have to look into that. Thanks for the report.

Thanks!

Should I create an issue on GitHub, or will you? It would be great if I could track the status.

Best,


Jacek

I think it’s similar to this issue. I’ll need to take a look first, and if it’s not resolved I’ll open an issue myself. Thanks!

Awesome, thanks! I’d love to help with this, but I can’t do much in C++.

I’ve submitted a PR with a fix for that just a few seconds ago :slight_smile: It will be included in the next release, which we’ll upload tomorrow.


Thank you! I also ran into this but forgot to report it.

I got 0.1.9 hot off the press and can confirm it works, with an approximately 3.5x speedup on my GPU. Any ideas for making it faster? Is it a bad idea to create new Variables in the training loop?

Oh no, you’ve found the new version before it was taken down! :grin: My fix for this issue introduced one new bug, so we’ll reupload the packages today; please update again.

What do you want to make faster? I can’t tell without seeing the code :slight_smile:

And no, creating Variables is very very cheap, so you can do it at every step without any problem.
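For example, something like this inside the loop is perfectly fine (a minimal sketch reusing the train() signature from the traceback earlier in the thread; batches is a hypothetical iterable of CUDA tensors):

    from torch.autograd import Variable

    for input_tensor, target_tensor in batches:
        # Wrapping tensors in fresh Variables each iteration is cheap --
        # it's a thin wrapper, no data gets copied.
        input_variable = Variable(input_tensor)
        target_variable = Variable(target_tensor)
        loss = train(input_variable, target_variable, encoder, decoder,
                     encoder_optimizer, decoder_optimizer, criterion)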

Still working well on the newer 0.1.9. I was referring to the same code as @ponythewhite, in case you’ve already had a look at it.

Yup, that’s been fixed! The fix initially introduced another bug, but that’s fixed as well.


Hi,
I also ran into this bug while trying to implement Layer-wise Relevance Propagation.
Concretely, I rewrote the backward function for each layer in the net, for example:

import torch

class Linear(torch.autograd.Function):

    def forward(self, input, weight, bias):
        self.eps = 1e-16
        # Z[b, i, j] = input[b, i] * weight[j, i]: per-connection contributions
        self.Z = weight.t()[None, :, :] * input[:, :, None]
        # Zs[b, 1, j]: total pre-activation of each output unit
        self.Zs = self.Z.sum(dim=1, keepdim=True)
        if bias is not None:
            self.Zs += bias[None, None, :]
        return self.Zs.squeeze(dim=1)

    def backward(self, R):
        print('linear')
        # Redistribute the relevance R proportionally to each contribution;
        # the eps term stabilizes near-zero denominators
        return (R[:, None, :] * self.Z / (self.Zs + (2 * (self.Zs >= 0) - 1) * self.eps)).sum(dim=2)

and run

model.readout.backward(out)

Unfortunately, it reported the following:

Traceback (most recent call last):

  File "G:\Explainer\_Geo_Exp_inter\infectionLRP.py", line 117, in <module>
    print(model.readout.backward(out))

  File "D:\Anaconda3\lib\site-packages\torch\tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)

  File "D:\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 98, in backward
    allow_unreachable=True)  # allow_unreachable flag

RuntimeError: could not compute gradients for some functions

This has confused me for a few days, and I don’t really know what causes it or how to fix it.

Yingxin Wu

Hi,

Please see the doc here on how to write a proper autograd.Function.
Old-style custom Functions (with forward and backward as instance methods on self) are not supported anymore; a new-style Function defines them as static methods and passes state through the ctx argument.
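For reference, a new-style version of the Linear Function above might look roughly like this (a sketch: the name LRPLinear, the dtype cast, and the use of save_for_backward are my choices; the LRP arithmetic is copied from your post):

    import torch

    class LRPLinear(torch.autograd.Function):

        @staticmethod
        def forward(ctx, input, weight, bias):
            eps = 1e-16
            Z = weight.t()[None, :, :] * input[:, :, None]
            Zs = Z.sum(dim=1, keepdim=True)
            if bias is not None:
                Zs = Zs + bias[None, None, :]
            # state goes on ctx (not self), so backward can retrieve it
            ctx.save_for_backward(Z, Zs)
            ctx.eps = eps
            return Zs.squeeze(dim=1)

        @staticmethod
        def backward(ctx, R):
            Z, Zs = ctx.saved_tensors
            stabilizer = (2 * (Zs >= 0).to(Zs.dtype) - 1) * ctx.eps
            grad_input = (R[:, None, :] * Z / (Zs + stabilizer)).sum(dim=2)
            # forward took (input, weight, bias), so backward must return three grads
            return grad_input, None, None

New-style Functions are called through .apply, e.g. out = LRPLinear.apply(input, weight, bias), and you start the backward pass from the output tensor (out.backward(R)) rather than from the Function object.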