# Cuda runtime error (59) : device-side assert triggered while using torch.topk()

Hi,
I am hitting the `cuda runtime error (59) : device-side assert triggered` error when using `torch.topk()`. Can anyone help me?

Problem:

The error-related code is:

```python
_feature = _feature.view(_b, _c, -1)  # [N, C, K]
assert _feature is not None, "Error-1"
_feature_sum = torch.sum(_feature.pow(2), dim=1)  # [N, K]
assert _feature is not None, "Error-2"
_idx = torch.topk(_feature_sum, top, dim=-1, sorted=False)[1]  # [N, top]
if _idx.max() > 10000:
    print("_idx error")
    print(_idx.max())
_jdx = torch.arange(_b).unsqueeze(1).repeat(1, top)
_feature = _feature[_jdx, :, _idx]  # [N, top, C]
```

Given a tensor `_feature` of size `[N, C, K]`, this code selects the `top` entries along the last dimension, producing a tensor of size `[N, top, C]`.
The key line is `_idx = torch.topk(_feature_sum, top, dim=-1, sorted=False)[1]  # [N, top]`, with `top=50` and `_feature_sum.shape == (N, 64*64)`. Sometimes, however, the computed `_idx.max()` is wrong, like:

```
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [197,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [197,0,0], thread: [33,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
.......
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda [](int)->auto::operator()(int)->auto: block: [37,0,0], thread: [95,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered
_idx error
tensor(9223372034707292159, device='cuda:0')
Traceback (most recent call last):
File "tools/train.py", line 153, in <module>
main()
File "tools/train.py", line 109, in main
eval_loss=eval_loss
File "tools/functions.py", line 33, in train_epoch
out_dict = model_runner.train_one_batch(batch)
File "models/basemodel_runner.py", line 176, in train_one_batch
return self.forward(inp_dict)
File "models/basemodel_runner.py", line 131, in forward
File "models/basemodel_runner.py", line 197, in _feature_loss
feature_fg = self._sample_features(_feature, _mask, sample_fg, top=self.mask_sample_topk * 2)  # [N, top, C]
File "models/basemodel_runner.py", line 250, in _sample_features
if _idx.max() > 10000:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCReduceAll.cuh:317
```

It seems that the error is caused by the `topk()` function. I have searched for similar errors, but the ones I found were all due to class labels `< 0` in classification tasks.

Can anyone help me?

Try running the above code on the CPU. You will get a trace pointing at the actual error.
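One caveat worth adding: CUDA kernels launch asynchronously, so the Python line in a GPU traceback (here the `_idx.max()` check) is often just the first synchronization point, not the operation that failed. If the CPU run does not reproduce the problem, you can force synchronous kernel launches so the device-side assert surfaces at the right line. A minimal sketch:

```python
# Sketch: set CUDA_LAUNCH_BLOCKING before CUDA is initialized (i.e. before
# the first CUDA call in `import torch`-using code runs) so every kernel
# launch becomes synchronous and a device-side assert is reported at the
# Python line that actually triggered it, instead of a later sync point.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, launch the script as `CUDA_LAUNCH_BLOCKING=1 python tools/train.py`.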

Which version of `pytorch` are you using? I can run this part of the code on its own successfully:

```
In [1]: import torch

In [2]: _feature = torch.rand(4,6,50)

In [3]: top = 10

In [4]: _b, _c = 4, 6

In [5]: _feature = _feature.view(_b, _c, -1)  # [N, C, K]
...: assert _feature is not None, "Error-1"
...: _feature_sum = torch.sum(_feature.pow(2), dim=1)  # [N, K]
...: assert _feature is not None, "Error-2"
...: _idx = torch.topk(_feature_sum, top, dim=-1, sorted=False)[1]  # [N, top]
...: if _idx.max() > 10000:
...:     print("_idx error")
...:     print(_idx.max())
...: _jdx = torch.arange(_b).unsqueeze(1).repeat(1, top)
...: _feature = _feature[_jdx, :, _idx]  # [N, top, C]

In [6]: _feature.shape
Out[6]: torch.Size([4, 10, 6])
```
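As a side note (not a fix for the assert): the advanced-indexing line `_feature = _feature[_jdx, :, _idx]` can also be written with `torch.gather`, which makes the shape bookkeeping explicit and avoids the `_jdx` helper. A small sketch with the same toy shapes as above, asserting that the two forms agree:

```python
import torch

N, C, K, top = 4, 6, 50, 10
feature = torch.rand(N, C, K)                                # [N, C, K]
feature_sum = feature.pow(2).sum(dim=1)                      # [N, K]
idx = torch.topk(feature_sum, top, dim=-1, sorted=False)[1]  # [N, top]

# Original form: advanced indexing with an explicit row-index helper.
jdx = torch.arange(N).unsqueeze(1).repeat(1, top)  # [N, top]
sampled_a = feature[jdx, :, idx]                   # [N, top, C]

# Equivalent form: expand the indices over C and gather along K.
idx_exp = idx.unsqueeze(1).expand(-1, C, -1)             # [N, C, top]
sampled_b = feature.gather(2, idx_exp).permute(0, 2, 1)  # [N, top, C]

assert torch.equal(sampled_a, sampled_b)
```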

The problem above appears only intermittently, not on every run.

The stack trace points to an invalid index operation, so make sure `_idx` stays within bounds:

```
block: [197,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed
```
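For a finite input, `topk()` indices are guaranteed to lie in `[0, K)`, so a cheap sanity check before the fancy indexing would catch the bad batch early. A sketch with made-up shapes matching the ones described above:

```python
import torch

feature_sum = torch.rand(4, 64 * 64)  # stand-in for the real [N, K] tensor
top = 50
K = feature_sum.size(-1)

idx = torch.topk(feature_sum, top, dim=-1, sorted=False)[1]  # [N, top]

# On a finite input these indices are always valid; if this assertion ever
# fires, the input to topk() almost certainly contained NaN/Inf.
assert int(idx.min()) >= 0 and int(idx.max()) < K, "topk produced out-of-range indices"
```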

Yes, the error appears when `_idx.max()` is larger than the size of the indexed dimension.
However, `_idx` is produced by `torch.topk()`, so the out-of-range indices actually come from `topk()` itself. Note the `NaN or Inf found in input tensor.` lines at the top of the log: when the input to `topk()` contains NaN/Inf, the returned indices can be garbage, which would explain the huge `_idx.max()` value.
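A hedged workaround sketch (assuming the NaN/Inf cannot always be prevented upstream): check the input with `torch.isfinite` and mask non-finite entries to `-inf`, so `topk()` can never select them:

```python
import torch

# Toy [1, 4] input containing a NaN and an Inf, mimicking the
# "NaN or Inf found in input tensor." situation from the log.
feature_sum = torch.tensor([[1.0, float("nan"), 3.0, float("inf")]])

if not torch.isfinite(feature_sum).all():
    # Replace non-finite entries (both NaN and +/-Inf here) with -inf
    # so they can never win the top-k selection.
    feature_sum = torch.where(
        torch.isfinite(feature_sum),
        feature_sum,
        torch.full_like(feature_sum, float("-inf")),
    )

vals, idx = torch.topk(feature_sum, 2, dim=-1)
# idx is now a valid index into the last dimension: tensor([[2, 0]])
```

Of course this only hides the symptom; the real fix is finding where the NaN/Inf enters `_feature` in the first place (e.g. a diverging loss or too large a learning rate).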