CUDA Illegal Memory Access

While implementing a custom backward pass, I keep getting a CUDA illegal memory access error.

    @staticmethod
    def backward(ctx, grad_output):
        grad_label = grad_output.clone()
        num_ft = grad_output.shape[0]
        # grad_label.data.resize_(num_ft, 32, 41)
        lin_indices_3d, lin_indices_2d = ctx.saved_variables
        num_ind = lin_indices_3d.data[0]
        grad_label.data.view(num_ft, -1).index_copy_(1, lin_indices_2d.data[1:1 + num_ind],
                                                     torch.index_select(grad_output.data.contiguous().view(num_ft, -1),
                                                                        1, lin_indices_3d.data[1:1 + num_ind]))
        # raw_input('sdflkj')
        return grad_label, None, None, None

I tried using pdb to see what might be the possible cause

I am not sure what is wrong in the implementation here. Any help would be highly appreciated.

I am using PyTorch 1.3, but the same error persists on 1.4 and 1.5.

Hi,

An illegal memory access error occurs when your program tries to access a memory location that it does not have permission to access.

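CUDA kernels are launched asynchronously, so the stack trace can point to the wrong line. Setting CUDA_LAUNCH_BLOCKING=1 makes the launches synchronous, so you can see where the error actually comes from.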
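
If setting it on the command line is inconvenient, the variable can also be set from inside the script. This is only a minimal sketch, assuming the assignment runs at the very top of the script, before anything initializes CUDA:

```python
import os

# Assumption: this runs before CUDA is initialized (ideally before importing
# torch); setting it later may not have any effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import and run the failing code only after the variable is set
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous launches slow the program down, so it is best to remove the setting again once the failing call has been found.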

When I run with CUDA_LAUNCH_BLOCKING=1, I get the error

Any idea what could be the reason?

Could you install the nightly binary (in a new virtual environment) and rerun the code?
If you are still running into this error, could you post a code snippet to reproduce this issue, please?

@ptrblck I will start on that. I have another question, though. After digging around, it seems that the issue was not present in PyTorch 1.2. I thought of downgrading to PyTorch 1.2, but as soon as I do that, I get an error of

For one of the PyBind modules. This error was not there when I worked with PyTorch 1.3 and above. Any ideas about this?

You might try to use input_.data<scalar_t>(), but I would recommend to stick to the latest version instead of downgrading.

@ptrblck Same error. What is the general reason for this error? I want to double-check that it is not something related to the input data before I open a new issue.

@ptrblck I read in one of your previous posts that this error can occur when the input tensor is not contiguous, so I make sure the input is contiguous by calling .contiguous() on it. At this point, I have no idea what else could be wrong.
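
One way to rule out the input data is to validate the indices before the indexing calls, since out-of-range indices are a common cause of this kind of error. The following is only a rough sketch: `check_index_bounds` is a hypothetical helper, not part of PyTorch, and the toy tensors merely stand in for the real `grad_output` and saved index tensors, assuming the same layout as in the backward pass above (first entry is the count).

```python
import torch


def check_index_bounds(indices, dim_size, name="indices"):
    """Hypothetical helper: raise a readable error if any index is outside [0, dim_size)."""
    if indices.numel() == 0:
        return
    lo = int(indices.min())
    hi = int(indices.max())
    if lo < 0 or hi >= dim_size:
        raise IndexError(
            f"{name} out of range: min={lo}, max={hi}, valid range is [0, {dim_size})"
        )


# Toy data standing in for the real tensors (assumed layout: [count, idx, idx, ...]).
num_ft = 2
grad_output = torch.arange(num_ft * 6, dtype=torch.float32).view(num_ft, 2, 3)
lin_indices = torch.tensor([3, 0, 2, 5])

flat = grad_output.contiguous().view(num_ft, -1)
num_ind = int(lin_indices[0])
selected = lin_indices[1:1 + num_ind]

# Validate before indexing, so a bad index fails with a clear Python error.
check_index_bounds(selected, flat.shape[1], name="lin_indices")
picked = torch.index_select(flat, 1, selected)
```

Running the same check on both index tensors, on CPU copies if necessary, shows whether the data or the kernel is at fault.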

@ptrblck it works with the nightly version. Thank you so much for your support

I think the syntax just changed after 1.3, as this seems to be a compilation error.
Good to know it's working with the nightly. :)