"index out of bounds" when moving to device


I get the following CUDA error when I try to move a tensor from the CPU to the GPU:

/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
CUDA error: device-side assert triggered

The code I am using is as follows:

cur_inds = has_answer_vector[sample_ind]
tensorized_inds = torch.from_numpy(np.array(cur_inds))
cur_positive_inds = tensorized_inds.to(logits.device).long()

The error occurs on the final line, but it only fails on some batches and not others.

has_answer_vector is simply a list of lists, each containing a single integer
logits is a FloatTensor

An example of the values from a failing batch:

has_answer_vector: [[1], [1], [1], [1], [0], [0], [0], [0], [16], [0], [1], [2], [146], [0], [9], [3]]
sample_ind: 13
cur_inds: [0]
tensorized_inds: tensor([0])
tensorized_inds.device: cpu
logits.device: cuda:0

Any ideas why this happens?

CUDA operations are executed asynchronously, so a failing operation can raise an assertion that only surfaces in a later, unrelated CUDA operation.
In your case an indexing operation failed earlier, and the error is reported by the subsequent .to() call.
You can rerun the code via:

CUDA_LAUNCH_BLOCKING=1 python script.py args

to get the actual line of code that is failing (or, alternatively, run the code on the CPU, where errors are raised eagerly with a readable stack trace).
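To illustrate the CPU-debugging suggestion: the same out-of-bounds gather that only triggers a device-side assert on the GPU raises an immediate, readable IndexError on the CPU. A minimal sketch, assuming (hypothetically) that the index 146 from the post is later used to index a tensor with only 16 rows:

```python
import torch

# Hypothetical shape: suppose the tensor being indexed has 16 rows,
# while one entry of has_answer_vector holds the out-of-range index 146.
logits = torch.randn(16, 2)                  # CPU tensor: errors raise eagerly
cur_positive_inds = torch.tensor([146]).long()

try:
    logits[cur_positive_inds]                # same gather that asserts on the GPU
except IndexError as e:
    print("caught:", e)                      # points at the exact bad index
```

On the CPU this fails synchronously at the indexing line itself, which makes it easy to spot which batch contains an index outside the valid range; a quick bounds check on the indices before moving them to the device would confirm the culprit.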