CUDA TorchScript error when using TensorRT

I am getting a CUDA error from the TorchScript interpreter when using TensorRT inside PyTorch. Below are the errors:

  File "/home/ravi/yolact_edge/layers/output_utils.py", line 109, in postprocess
    masks = crop(masks, boxes)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "/home/ravi/yolact_edge/layers/box_utils.py", line 316, in fallback_function
    """
    h, w, n = masks.size()
    x1, x2 = sanitize_coordinates(boxes[:, 0], boxes[:, 2], w, padding, cast=False)
             ~~~~~~~~~~~~~~~~~~~~ <--- HERE
    y1, y2 = sanitize_coordinates(boxes[:, 1], boxes[:, 3], h, padding, cast=False)
  File "/home/ravi/yolact_edge/layers/box_utils.py", line 297, in sanitize_coordinates
        _x1 = _x1.long()
        _x2 = _x2.long()
    x1 = torch.min(_x1, _x2)
         ~~~~~~~~~ <--- HERE
    x2 = torch.max(_x1, _x2)
    x1 = torch.clamp(x1-padding, min=0)
RuntimeError: CUDA error: device-side assert triggered

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Traceback (most recent call last):
  File "/home/ravi/yolact_edge/layers/output_utils.py", line 100, in postprocess
    masks = proto_data @ masks.t()
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

The two errors above are "RuntimeError: CUDA error: device-side assert triggered" and "RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`".

Please note that these errors occur only when all of the following conditions are met:

  1. TensorRT is used.
  2. A few frames have already been processed.
  3. The parameter score_threshold is non-zero.

Below are my environment details:

* python --version 3.6.9
* tensorrt.__version__ 7.2.2.3
* torch.__version__ 1.7.1
* torchvision.__version__ 0.8.2
* torch.version.cuda: 10.2

My question is: how can I debug the above errors, which appear to be caused by TensorRT?

PS: The source code is available here

The error seems to be:

/pytorch/aten/src/ATen/native/cuda/Indexing.cu:605: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

most likely caused in:

x1, x2 = sanitize_coordinates(boxes[:, 0], boxes[:, 2], w, padding, cast=False)

so I assume the indexing in boxes fails.
You could rerun the script with CUDA_LAUNCH_BLOCKING=1 python script.py args and verify the line of code raising this error.
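
If it is easier to do this from Python than from the shell, the script can be launched in a child process with the variable set. This is only a minimal sketch: blocking_env and run_blocking are hypothetical helper names, and script.py and args stand in for your own entry point and arguments.

```python
import os
import subprocess
import sys


def blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set, each CUDA kernel launch blocks until the kernel finishes,
    # so a device-side assert is reported at the call that launched it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(script, *args):
    """Run `script` with the current interpreter and return its exit code."""
    cmd = [sys.executable, script] + list(args)
    return subprocess.run(cmd, env=blocking_env()).returncode


# Hypothetical usage:
#   exit_code = run_blocking("script.py", "args")
```

The variable has to be in the environment before the process initializes CUDA, which is why the sketch sets it for a fresh child process.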

@ptrblck Thank you very much for the suggestion.

I managed to make it work, and noticed something strange in the process:

A tensor produced by TensorRT raises this error when it is indexed NumPy-style. For example, my_tensor[idx] raises the error, but torch.index_select(my_tensor, 0, idx) works fine.
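
Since the failed assertion is srcIndex < srcSelectDimSize, it may still be worth checking that every value in idx is in range before indexing, whichever form is used. Below is a minimal sketch in plain Python. find_out_of_range is a hypothetical helper, not part of PyTorch or yolact_edge, and it treats negative values as invalid, which is stricter than NumPy-style indexing.

```python
def find_out_of_range(indices, dim_size):
    """Return (position, value) pairs for indices outside [0, dim_size)."""
    bad = []
    for pos, value in enumerate(indices):
        value = int(value)
        if value < 0 or value >= dim_size:
            bad.append((pos, value))
    return bad


# Hypothetical usage with the tensors from this post, checked on the CPU:
#   bad = find_out_of_range(idx.cpu().tolist(), my_tensor.size(0))
#   if bad:
#       print("out-of-range indices:", bad)

print(find_out_of_range([0, 3, 7, -1], 4))  # prints [(2, 7), (3, -1)]
```

Running the check on the host keeps the failure out of the CUDA kernel, so a bad index shows up as an ordinary Python result and not as a device-side assert.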