Too many resources requested for launch when using gradcheck

When I test my own C extension operator with gradcheck, the result is only accurate when I use double precision. But when I run it with double tensors, it fails with "too many resources requested for launch".
Here is my code:
import torch
from deform3d_double.deform_conv3d_functions import ConvOffset3dFunction
from torch.autograd import Variable
import os
from torch.autograd import gradcheck

os.environ['CUDA_VISIBLE_DEVICES'] = '3'
batchsize = 1
c_in = 1
c_out = 1
inpu = 3
kernel = 1
stri = 1
pad = 0
out = int((inpu + 2 * pad - kernel) / stri + 1)
channel_per_group = 1
g_off = c_in // channel_per_group  # number of offset groups
c_off = g_off * kernel * kernel * kernel * 3  # 3 offset coordinates per kernel element per group

conv_offset3d = ConvOffset3dFunction((stri, stri, stri), (pad, pad, pad), channel_per_group)

inputs = Variable(torch.rand(batchsize, c_in, inpu, inpu, inpu).double().cuda(), requires_grad=True)
offsets = Variable(torch.rand(batchsize, c_off, out, out, out).double().cuda(), requires_grad=True)
weight = Variable(torch.rand(c_out, c_in, kernel, kernel, kernel).double().cuda(), requires_grad=True)

print(gradcheck(conv_offset3d, (inputs, offsets, weight)))

The error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu line=35 error=7 : too many resources requested for launch
Traceback (most recent call last):
File "/home/lshi/Project/Pytorch/deform_conv3d_pytorch_op/deform3d_double/test_gradient.py", line 28, in <module>
print(gradcheck(conv_offset3d, (inputs, offsets, weight)))
File "/home/lshi/Application/Anaconda/envs/pytorch36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 164, in gradcheck
analytical, reentrant, correct_grad_sizes = get_analytical_jacobian(as_tuple(inputs), o)
File "/home/lshi/Application/Anaconda/envs/pytorch36/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 108, in get_analytical_jacobian
output.backward(grad_output, create_graph=True)
File "/home/lshi/Application/Anaconda/envs/pytorch36/lib/python3.6/site-packages/torch/autograd/variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/lshi/Application/Anaconda/envs/pytorch36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 98, in backward
variables, grad_variables, retain_graph)
File "/home/lshi/Project/Pytorch/deform_conv3d_pytorch_op/deform3d_double/deform_conv3d_functions/deform_conv3d_function.py", line 48, in backward
grad_weight = weight.new(weight.size()).zero_()
RuntimeError: cuda runtime error (7) : too many resources requested for launch at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCTensorMath.cu:35

But when I test the float version, I get a result without the error.

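For context, CUDA runtime error 7 is cudaErrorLaunchOutOfResources: the launch requests more registers (or shared memory) per block than the device can provide. Double-precision kernels typically need roughly twice the registers of their float counterparts, which would explain why only the double path fails. Below is a minimal diagnostic sketch using cudaFuncGetAttributes; my_backward_kernel is a hypothetical stand-in for one of the extension's __global__ kernels, not the actual code:

#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the extension's backward kernels.
__global__ void my_backward_kernel(const double* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0 * in[i];
}

int main() {
    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, my_backward_kernel);
    // A launch fails with error 7 when threads_per_block * numRegs exceeds
    // the registers available to one block on a multiprocessor.
    printf("registers per thread: %d\n", attr.numRegs);
    printf("max threads per block for this kernel: %d\n", attr.maxThreadsPerBlock);
    return 0;
}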

I have also just encountered a similar issue with gradcheck on a custom CUDA extension. Oddly, merely switching an activation function from tanh to relu makes the error go away … Without gradcheck it seems to work, even with double tensors.

I did some searching and, following this comment

https://github.com/pytorch/pytorch/issues/7680#issuecomment-390729076

I added __launch_bounds__(1024) before each custom kernel function (those declared with __global__).
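
A minimal sketch of what that looks like (the kernel name and body are illustrative, not the actual deformable convolution kernels):

// __launch_bounds__(1024) promises the compiler that this kernel is never
// launched with more than 1024 threads per block, so it caps register usage
// per thread to fit the per-block register budget at that block size.
__global__ void __launch_bounds__(1024)
example_kernel(const double* in, double* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0 * in[i];
}

If a kernel is always launched with fewer threads per block, a tighter bound matching the real block size leaves the compiler even more registers per thread to work with.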

This solved the issue in my case.