CUDA error: an illegal memory access while training a deep learning model

I am trying to train a deep learning model on a custom dataset for semantic segmentation.

When I try to train on my PC, I get the following error.

 File "tools/train.py", line 223, in <module>
    main()
  File "tools/train.py", line 185, in main
    train(config, epoch, config.TRAIN.END_EPOCH, 
  File "/home/deshpand/thesis_rr/semantic_segmentation_network/PIDNet/tools/../utils/function.py", line 43, in train
    losses, _, acc, loss_list = model(images, labels, bd_gts)
  File "/home/deshpand/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deshpand/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/deshpand/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deshpand/thesis_rr/semantic_segmentation_network/PIDNet/tools/../utils/utils.py", line 48, in forward
    loss_s = self.sem_loss(outputs[:-1], labels)
  File "/home/deshpand/anaconda3/envs/torch_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/deshpand/thesis_rr/semantic_segmentation_network/PIDNet/tools/../utils/criterion.py", line 90, in forward
    return sum([
  File "/home/deshpand/thesis_rr/semantic_segmentation_network/PIDNet/tools/../utils/criterion.py", line 91, in <listcomp>
    w * func(x, target)
  File "/home/deshpand/thesis_rr/semantic_segmentation_network/PIDNet/tools/../utils/criterion.py", line 72, in _ohem_forward
    pred, ind = pred.contiguous().view(-1,)[mask].contiguous().sort()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

This is the first time I have seen an error like this. Can someone please explain what is going on here?

The specifications for my GPU are as follows.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| 25%   36C    P0    29W / 120W |    648MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       902      G   /usr/lib/xorg/Xorg                245MiB |
|    0   N/A  N/A      1234      G   /usr/bin/kwin_x11                 123MiB |
|    0   N/A  N/A      1289      G   /usr/bin/plasmashell               48MiB |
|    0   N/A  N/A      1481      G   /usr/lib/firefox/firefox          173MiB |
|    0   N/A  N/A      5801      G   ...RendererForSitePerProcess       49MiB |
+-----------------------------------------------------------------------------+

I would really appreciate any help here. Also, I am using PyTorch version 1.13.1:

>>> print(torch.__version__)
1.13.1
>>> 

The graphics card is a GeForce GTX 1080Ti (6GB model).

Could you update to the latest stable or nightly release and check if you are still seeing the same error, please?
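If you are on CUDA 11.7, installing a nightly should work with something along the lines of pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu117 (adjust the index URL to your environment).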

I tried with the nightly build and am getting the same error.

The torch build that I used:

>>> import torch
>>> print(torch.__version__)
2.1.0.dev20230405+cu117
>>> 

Error image

Thank you for checking! Could you post a minimal and executable code snippet to reproduce the issue, please?

I am posting the image and the code snippet for the line where this error is encountered. In the image below, pred.contiguous() is the line where the error occurs. I have tried replacing this call with torch.as_strided(), but that did not work out. I am now trying to either rewrite the loss calculation completely or use a different PyTorch function to define the same loss.

import torch
import torch.nn as nn
import torch.nn.functional as F


class OhemCrossEntropy(nn.Module):
    def __init__(self, ignore_label=-1, thres=0.7,
                 min_kept=100000, weight=None):
        super(OhemCrossEntropy, self).__init__()
        self.thresh = thres
        self.min_kept = max(1, min_kept)
        self.ignore_label = ignore_label
        self.criterion = nn.CrossEntropyLoss(
            weight=weight,
            ignore_index=ignore_label,
            reduction='none'
        )

    def _ce_forward(self, score, target):
        loss = self.criterion(score, target)
        return loss

    def _ohem_forward(self, score, target, **kwargs):
        # per-pixel class probabilities and per-pixel cross-entropy losses
        pred = F.softmax(score, dim=1)
        print(type(pred))
        print('size of pred: ', pred.numel())
        pixel_losses = self.criterion(score, target).contiguous().view(-1)
        # mask of pixels whose label is not ignore_label
        mask = target.contiguous().view(-1) != self.ignore_label

        tmp_target = target.clone()
        tmp_target[tmp_target == self.ignore_label] = 0
        # probability the model assigns to the ground-truth class of each pixel
        pred = pred.gather(1, tmp_target.unsqueeze(1))
        #pred, ind = torch.as_strided(pred, (1, pred.numel()), (0, pred.numel())).view(-1)[mask]  # Replacing contiguous() with as_strided() does not work because there is not enough memory to store this.
        print('program executing until this point.')
        pred, ind = pred.contiguous().view(-1,)[mask].contiguous().sort()
        min_value = pred[min(self.min_kept, pred.numel() - 1)]
        threshold = max(min_value, self.thresh)

        pixel_losses = pixel_losses[mask][ind]
        pixel_losses = pixel_losses[pred < threshold]
        return pixel_losses.mean()
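
For reference, this is roughly the kind of rewrite I have in mind: instead of sorting the softmax scores, keep the min_kept hardest pixels directly with torch.topk on the per-pixel losses. This is only a sketch and I have not verified that it selects exactly the same pixels as the OHEM code above (it drops the probability threshold).

import torch
import torch.nn as nn

class TopKCrossEntropy(nn.Module):
    def __init__(self, ignore_label=-1, min_kept=100000, weight=None):
        super().__init__()
        self.min_kept = max(1, min_kept)
        self.ignore_label = ignore_label
        self.criterion = nn.CrossEntropyLoss(
            weight=weight, ignore_index=ignore_label, reduction='none')

    def forward(self, score, target):
        # per-pixel losses, flattened, with ignored pixels dropped
        pixel_losses = self.criterion(score, target).reshape(-1)
        mask = target.reshape(-1) != self.ignore_label
        pixel_losses = pixel_losses[mask]
        # keep only the min_kept largest losses (hard example mining)
        k = min(self.min_kept, pixel_losses.numel())
        topk_losses, _ = torch.topk(pixel_losses, k)
        return topk_losses.mean()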

The contiguous() operation is most likely not the one failing; it is re-raising a sticky CUDA error from an earlier kernel, which also corrupts the CUDA context.
To debug this further, we would need an executable code snippet that reproduces the issue.
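
In the meantime, two things would help narrow it down. First, forcing synchronous kernel launches should make the stacktrace point at the operation that actually fails; a minimal way to do this (assuming it is set before CUDA is initialized) is at the very top of tools/train.py:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # synchronous launches, accurate stacktraces (debugging only)

Second, a standalone snippet along these lines, with your real shapes, num_classes, and loss settings filled in (the values below are placeholders), would be enough to reproduce on our side:

import torch

num_classes, ignore_label = 19, 255  # placeholders, use your dataset's values
score = torch.randn(2, num_classes, 128, 128, device="cuda", requires_grad=True)
target = torch.randint(0, num_classes, (2, 128, 128), device="cuda")
target[:, :8, :8] = ignore_label  # include some ignored pixels

criterion = OhemCrossEntropy(ignore_label=ignore_label, thres=0.7, min_kept=1000)
loss = criterion._ohem_forward(score, target)
loss.backward()
print(loss.item())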