Get "RuntimeError: CUDA error: device-side assert triggered"

Snippet from my code:

max = torch.tensor([3])
if USE_CUDA: max = max.cuda()
max_embedding = self.max_embedding(max) # dim of max_embedding: 1*5

item_dict = {}
for item in item_list:
    item = torch.tensor(item)
    if USE_CUDA: item = item.cuda()
    item_embedding = self.item_embedding(item) # dim of item_embedding: 1*20

embedded = torch.cat((max_embedding, item_embedding), 1)

But I get the error "RuntimeError: CUDA error: device-side assert triggered".
The output after adding CUDA_LAUNCH_BLOCKING=1:

/pytorch/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim =2, SrcDim = 2, IdxDim = -2: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim =2, SrcDim = 2, IdxDim = -2: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim =2, SrcDim = 2, IdxDim = -2: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim =2, SrcDim = 2, IdxDim = -2: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim =2, SrcDim = 2, IdxDim = -2: block: [0,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "mytest.py", line 33, in <module>
     if USE_CUDA: item = item.cuda()
RuntimeError: CUDA error: device-side assert triggered

How to fix it?

Some indexing operation fails and I assume it’s related to your nn.Embedding layer.
Is your code running fine on the CPU? This might generally give you a better error message.

Yeah, I found it. The vocabulary size of the max_embedding layer should be the maximum index + 1, but mine was set to the maximum index.
But why does the traceback point to a different line, if USE_CUDA: item = item.cuda(), rather than to max_embedding = self.max_embedding(max)?
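
A minimal sketch of the off-by-one described above, run on the CPU so the failure surfaces as an ordinary Python exception. The sizes and the names too_small and correct are illustrative assumptions, not taken from the original model.

```python
import torch
import torch.nn as nn

max_index = 3  # largest index that will ever be looked up

# Assumed sizes for illustration: a table with max_index rows only
# holds indices 0..max_index-1, so index 3 is out of range.
too_small = nn.Embedding(num_embeddings=max_index, embedding_dim=5)
correct = nn.Embedding(num_embeddings=max_index + 1, embedding_dim=5)

idx = torch.tensor([max_index])

cpu_error = None
try:
    too_small(idx)
except (IndexError, RuntimeError) as exc:
    # On the CPU the lookup fails immediately, at the offending line.
    cpu_error = exc

out = correct(idx)  # works: the table now has rows 0..max_index
print(type(cpu_error).__name__, tuple(out.shape))
```

Sizing the table from the data (largest index + 1) rather than from a hand-written constant is one way to avoid this mistake.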

Since CUDA operations are asynchronous, the stack trace might indeed point to the wrong line of code.
This can usually be fixed by running the script via CUDA_LAUNCH_BLOCKING=1 python script.py args, which synchronizes all CUDA operations, or by running it on the CPU.
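
If changing the launch command is inconvenient, the variable can also be set from inside the script. The sketch below assumes it runs before CUDA is initialized, so placing it ahead of the torch import is the conservative choice.

```python
import os

# Must be set before the first CUDA call; doing it before
# "import torch" is the safe ordering.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch and run the model below this point
```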

@ptrblck

I am stuck with the same error in a different code repository. I saw your suggestion to run the script with CUDA_LAUNCH_BLOCKING=1. I tried it and the error still persists. Complete information about the error and the trace is available in this link

Do you have hints to solve this problem? Please let me know if you need any more information.

It runs without any error on CPU

Thank you!

The stack trace from the linked issue points to this failing gather call:

cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)
==============================================================
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [47,0,0], thread: [47,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.

so you would have to check the tensor shapes and index values.
Running the code on the CPU could also yield a better error message.
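
One way to check the index values before the call is a small guard like the sketch below. check_gather_index is a hypothetical helper, not a PyTorch function, and the example tensors are made up for illustration.

```python
import torch

def check_gather_index(src, dim, index):
    """Raise IndexError if index holds values outside [0, src.size(dim))."""
    if index.numel() == 0:
        return  # nothing to check
    size = src.size(dim)
    lo, hi = int(index.min()), int(index.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"gather index range [{lo}, {hi}] is outside [0, {size - 1}] "
            f"for dim {dim}"
        )

src = torch.arange(6.0).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
good = torch.tensor([[0, 2], [1, 1]])

check_gather_index(src, 1, good)        # passes silently
picked = torch.gather(src, 1, good)
print(picked)
```

Calling the guard right before torch.gather turns a late device-side assert into an immediate Python exception that names the offending range.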

The code runs successfully on the CPU!
Yes, I have checked the index values, and they are extreme.

I printed the tensors and found that the values of inds_g are both far too large and far too small (which causes the out-of-bounds error):
inds_g.min() = tensor(-4993021444723710459)
inds_g.max() = tensor(4575432887736600530)
inds_g = tensor([[[ 4255818524050935954, 62],
[ 4256250760978027750, 62],
[ 4237722774569238629, 62],
…,]]

usage of this tensor:
file “run_nerf_helpers.py”
cdf_g = torch.gather(cdf.unsqueeze(1).expand(matched_shape), 2, inds_g)

There is no dimension problem; the extreme values of inds_g cause the out-of-bounds error in torch.gather(). I validated this by setting inds_g to zero, after which the code also runs on the GPU.
But I do not know why the values computed on the GPU are so extreme.

For clarity, and for other users who might encounter this error: based on the linked issue, it seems to be caused by torchsearchsorted.searchsorted(), and using the PyTorch implementation instead solves the issue.
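
A minimal sketch of what the replacement might look like, assuming a PyTorch release that ships the built-in torch.searchsorted. The tensors cdf and u are small made-up values, and the clamping into below/above is an assumption about how the indices are prepared for torch.gather, not code taken from the linked repository.

```python
import torch

# Assumed toy inputs: one sorted CDF row and three sample positions.
cdf = torch.tensor([[0.0, 0.25, 0.5, 1.0]])
u = torch.tensor([[0.1, 0.6, 0.99]])

# Built-in search; results lie in [0, cdf.shape[-1]].
inds = torch.searchsorted(cdf, u, right=True)

# Clamp both neighbours into the valid range before gathering.
below = torch.clamp(inds - 1, min=0)
above = torch.clamp(inds, max=cdf.shape[-1] - 1)
inds_g = torch.stack([below, above], dim=-1)

print(inds_g.min().item(), inds_g.max().item())
```

Printing the minimum and maximum of inds_g, as done earlier in the thread, is a quick way to confirm that the indices are in range on both CPU and GPU.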