[Solved] CUDA threadIdx error from ATen (IndexKernel.cu:53)

I think I just got the same assertion error from ATen (IndexKernel.cu:53), for which I haven't been able to find a solution so far…

```
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [4,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [5,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [6,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [7,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1549628766161/work/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [8,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
```

Is there any way to deal with this problem properly? FYI, I'm not really used to dealing with CUDA device-side assert errors.

I'm using conda / PyTorch v1.0.1 on a machine equipped with a Titan Xp / CUDA v9.1.85.

The line I tried to execute is as follows:

```python
# A: declared here
# B: (64, 256)       torch.cuda.FloatTensor
# C: (64, 10, 256)   torch.cuda.ByteTensor
A = B.unsqueeze(1).repeat(1, 10, 1) * C.float()
```

The error occurs in a line before `A = B.unsqueeze(1).repeat(1,10,1) * C.float()`; it's just raised here. I guess your indices may go out of bounds.


@chenyuntc Thx… but I still don't get it. `.unsqueeze(1)` on the (64, 256)-shaped B results in a (64, 1, 256) tensor, and `.repeat(1, 10, 1)` then yields (64, 10, 256), which is the same shape as C (and thus compatible with the `*` operation against B).
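A quick shape check with placeholder data (on CPU) seems to confirm that reasoning:

```python
import torch

# Placeholder tensors mirroring the reported shapes, just to check the shape algebra
B = torch.randn(64, 256)
C = torch.randint(0, 2, (64, 10, 256), dtype=torch.uint8)
A = B.unsqueeze(1).repeat(1, 10, 1) * C.float()
print(A.shape)  # torch.Size([64, 10, 256])
```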

I thought this line had already executed several times without any complaint, but then it throws a device-side error at some point.

If the operations with A, B, and C were the problem, could you tell me more about it? Please let me know…

No, I mean there should be a line of code before `A = B.unsqueeze(1)...` that uses an advanced indexing operation, like `C = A[:, B, 1]`. It's that line of code that goes wrong, but because CUDA executes asynchronously, you get the error a while after it.
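Something like this made-up minimal example shows the pattern (tensor names here are illustrative, not from your code):

```python
import torch

# The bad index is consumed by the indexing kernel here, but because CUDA
# launches kernels asynchronously, the device-side assert may only be
# reported at a later, unrelated line.
D = torch.randn(5, 256, device="cuda")
idx = torch.tensor([0, 2, 7], device="cuda")  # 7 is out of bounds for dim 0 (size 5)
C = D[idx]                                    # advanced indexing with an invalid index
A = C * 2.0                                   # the error often surfaces around here instead
torch.cuda.synchronize()                      # an explicit sync makes it show up immediately
```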


I don't think I got what you meant by this (it's not a working piece of code, so I had to guess). What is the line of code that should precede it, and what does it have to do with this situation?

At first, I thought you were saying that I should allocate new memory for C (the one with shape (64, 10, 256)) before writing the computed output into it, but that doesn't seem to be the point, as the error remains. Is it because I use a bunch of `.repeat()` operations in my model?

Or should I put `torch.cuda.synchronize()` after each suspected operation?

Trying not to be too annoying, but I think I've hit a giant wall here; please help ):

@chenyuntc Thx! I finally found the problematic part of my code: an inappropriate use of `.byte()`.

Before the line I mentioned (`A = B.unsqueeze(1).repeat(1,10,1) * C.float()`), there was a `C = C.byte()` (C needed to be used as a mask). After removing the `.byte()`, it works great!

After several steps of running, the problem reoccurs… true story.

You can add `torch.cuda.synchronize()` and `print(tensor.sum().item())` after each suspected operation; you'll find the line that goes wrong.
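For example, something along these lines (a sketch with placeholder names; put the checks after whatever operations you suspect):

```python
import torch

# Placeholder tensors standing in for the real ones in the model
some_tensor = torch.randn(100, 256, device="cuda")
some_indices = torch.randint(0, 100, (64,), device="cuda")

out = some_tensor[some_indices]   # suspected advanced-indexing operation

torch.cuda.synchronize()          # force queued kernels to finish here
print(out.sum().item())           # .item() also synchronizes and reads the value back
```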


@chenyuntc Ah! Now I found out what the issue was. It was not about the shape but about the indices I was passing to C (actually, C is something like `D[indices_tensor]`). Some invalid values passed as indices caused the 'index out of bounds' error…! Thx!
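In case it helps anyone else, a cheap sanity check before the indexing (shapes here are made up to roughly match my case):

```python
import torch

# Hypothetical shapes: D is the lookup tensor, indices_tensor indexes its dim 0
D = torch.randn(100, 256, device="cuda")
indices_tensor = torch.randint(0, 100, (64, 10), device="cuda")

# Indices should lie in [0, D.size(0)); negative indexing is legal in PyTorch,
# but here it would almost certainly mean a bug upstream.
assert indices_tensor.min().item() >= 0, "negative index"
assert indices_tensor.max().item() < D.size(0), "index out of range for dim 0"

C = D[indices_tensor]   # shape (64, 10, 256), as expected
```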