CUDA error: unknown error. CUDA kernel errors might be asynchronously reported at some other API call

Hi all,

I am new to PyTorch and CV. I ran into a problem when trying to use mmaction2 to extract features from video clips. Following the tutorial from here, I tried to run a test on a single video with the following command:

python3 tools/misc/clip_feature_extraction.py \
configs/recognition/i3d/i3d_r50_video_32x2x1_100e_kinetics400_rgb.py \
pretrained/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth \
--video-list examples/inputs/video_list_single.txt \
--video-root examples/inputs/video \
--out examples/outputs/examples_feature.pkl

However, I got a RuntimeError: CUDA error: unknown error.

load checkpoint from local path: pretrained/i3d_r50_video_32x2x1_100e_kinetics400_rgb_20200826-e31c6f52.pth
[                                                  ] 0/1, elapsed: 0s, ETA:Traceback (most recent call last):
  File "tools/misc/clip_feature_extraction.py", line 229, in <module>
    main()
  File "tools/misc/clip_feature_extraction.py", line 217, in main
    outputs = inference_pytorch(args, cfg, distributed, data_loader)
  File "tools/misc/clip_feature_extraction.py", line 118, in inference_pytorch
    outputs = single_gpu_test(model, data_loader)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/engine/test.py", line 33, in single_gpu_test
    result = model(return_loss=False, **data)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 50, in forward
    return super().forward(*inputs, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/recognizers/base.py", line 264, in forward
    return self.forward_test(imgs, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/recognizers/recognizer3d.py", line 99, in forward_test
    return self._do_test(imgs).cpu().numpy()
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/recognizers/recognizer3d.py", line 63, in _do_test
    feat = self.extract_feat(imgs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/recognizers/base.py", line 163, in extract_feat
    x = self.backbone(imgs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/backbones/resnet3d.py", line 854, in forward
    x = res_layer(x)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/backbones/resnet3d.py", line 318, in forward
    out = _inner_forward(x)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmaction/models/backbones/resnet3d.py", line 305, in _inner_forward
    out = self.conv1(x)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py", line 201, in forward
    x = self.conv(x)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/wrappers.py", line 80, in forward
    return super().forward(x)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 590, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/xxx/miniconda3/envs/mmlab/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 586, in _conv_forward
    input, weight, bias, self.stride, self.padding, self.dilation, self.groups
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

If I set CUDA_LAUNCH_BLOCKING=1, i.e., run CUDA_LAUNCH_BLOCKING=1 python3 ..., no additional information is shown.
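
For reference, the variable only has to be present in the environment of the Python process before CUDA is initialized. Below is a minimal sketch of launching a command with it set and capturing the output, in case the shell prefix is not taking effect. The helper name `run_blocking` is hypothetical, and the stand-in command only echoes the variable; it would be swapped for the real extraction command.

```python
import os
import subprocess
import sys


def run_blocking(cmd):
    """Run `cmd` with CUDA_LAUNCH_BLOCKING=1 added to the current environment.

    Returns the CompletedProcess so stdout/stderr can be inspected afterwards.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in command (an assumption for illustration): print the variable
    # as seen by the child process to confirm it was actually passed through.
    check = [sys.executable, "-c",
             "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    result = run_blocking(check)
    print(result.stdout.strip())
```

If the child process sees the variable and the traceback still ends at the same convolution call, the reported location is probably accurate rather than an artifact of asynchronous reporting.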

I am not sure what causes the error, but I guess it might be a CUDA or PyTorch setup problem, since the code works properly on another machine. For reference, the environments of the two machines are listed below.

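|          | Device 1 (has error)                          | Device 2 (no error)                         |
|----------|-----------------------------------------------|---------------------------------------------|
| Platform | WSL2, Ubuntu 20.04.3                          | WSL2, Ubuntu 20.04.3                        |
| GPU      | GeForce GTX 1080 Ti, Driver=510.06, CUDA=11.6 | GeForce RTX 2060, Driver=510.06, CUDA=11.6  |
| PyTorch  | pytorch=1.10.1, py=3.7, cuda=11.3.1           | pytorch=1.10.1, py=3.7, cuda=11.3.1         |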

My question is: what causes the error, and how can I fix it? Thanks very much.

Are you able to build and run any CUDA examples in the first setup at all or are all CUDA applications crashing?
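
One quick way to check this from inside the same conda environment is a small standalone script that does not involve mmaction2 at all. The following is only a sketch: the function name `cuda_smoke_test` and the tensor sizes are made up for illustration, and a `Conv3d` is used simply because the traceback above ends in a 3D convolution.

```python
import importlib


def cuda_smoke_test():
    """Collect basic facts about the PyTorch/CUDA setup and try one tiny Conv3d.

    Returns a dict; nothing is raised if torch or a GPU is missing.
    """
    report = {}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch_importable"] = False
        return report

    report["torch_importable"] = True
    report["torch_version"] = torch.__version__
    report["built_for_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if not report["cuda_available"]:
        return report

    report["device_name"] = torch.cuda.get_device_name(0)
    report["capability"] = torch.cuda.get_device_capability(0)

    try:
        # Tiny 5D input (batch, channels, frames, height, width).
        x = torch.randn(1, 3, 8, 32, 32, device="cuda")
        conv = torch.nn.Conv3d(3, 4, kernel_size=3, padding=1).cuda()
        with torch.no_grad():
            y = conv(x)
        # Force any asynchronous kernel error to surface here.
        torch.cuda.synchronize()
        report["conv3d_ok"] = tuple(y.shape) == (1, 4, 8, 32, 32)
    except RuntimeError as exc:
        report["conv3d_ok"] = False
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cuda_smoke_test().items():
        print(f"{key}: {value}")
```

If this already fails on Device 1, the problem most likely lies in the driver, WSL2, or PyTorch installation rather than in mmaction2. If it passes, the issue is more likely specific to the model or its inputs.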