Runtime Error in backward with no clues

Hi,
Is there a way to get a trace for an error raised during backward? I got the following error and I have no idea where the bug is.

  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 158, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/blas.py", line 132, in backward
    grad_batch1 = torch.bmm(grad_output, batch2.transpose(1, 2))
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 592, in bmm
    return self._static_blas(Baddbmm, (output, 0, 1, self, batch), False)
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/variable.py", line 580, in _static_blas
    return cls.apply(*(args[:1] + args[-2:] + (alpha, beta, inplace)))
  File "/mnt/ilcompf8d0/user/rluo/anaconda2/lib/python2.7/site-packages/torch/autograd/_functions/blas.py", line 119, in forward
    batch1, batch2, out=output)
RuntimeError: cublas runtime error : the GPU program failed to execute at /mnt/ilcompf8d0/user/rluo/pytorch/torch/lib/THC/THCBlas.cu:378

It looks like the RuntimeError originates in the C backend (the cuBLAS call in THCBlas.cu). You can run your script under gdb to get a native backtrace, with something like the following:

gdb python
> catch throw
> run test.py

Here test.py is the Python script that raises the RuntimeError.
gdb will pause when the exception behind the RuntimeError is thrown, and you can then type backtrace to get a backtrace.
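
If you would rather not type the commands interactively, the same session can probably be scripted. The sketch below only builds the gdb command line. The helper name build_gdb_command is made up for illustration, and it assumes gdb is installed and that your interpreter is reachable as python. The -batch, -ex and --args options are standard gdb options. Note that catch throw stops at the first C++ exception thrown, which may not be the one you are after if the program throws earlier ones.

```python
import shlex


def build_gdb_command(script, python="python"):
    """Return the argv list for a non-interactive gdb run of `script`.

    Hypothetical helper: it mirrors the interactive commands
    (catch throw, run, backtrace) using gdb's -ex option.
    """
    return [
        "gdb",
        "-batch",              # run the commands below, then exit
        "-ex", "catch throw",  # stop when a C++ exception is thrown
        "-ex", "run",          # start the program under gdb
        "-ex", "backtrace",    # print the native stack at the stop
        "--args", python, script,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; paste it into a terminal,
    # or pass the list to subprocess.call to run it directly.
    print(" ".join(shlex.quote(arg) for arg in build_gdb_command("test.py")))
```

Replace "test.py" with the path to your own script. The backtrace shows the C/C++ frames only, so it complements the Python traceback rather than replacing it.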
