CUDA error: device-side assert triggered only for EMNIST dataset

I get the CUDA device-side assert triggered error only when running the model on PyTorch's EMNIST dataset.
It runs without any issues on MNIST, FashionMNIST, GTSRB, and Food101.

I am changing the number of output neurons according to the dataset.

The error message is as follows:

../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train_t.py", line 351, in <module>
    loss.backward()
  File "/home/dir/.local/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/dir/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

Version details:
Torch : 2.0.1+cu117
Python : 3.8
Cuda : 11.7

Additional details:
The models used are pretrained on either CIFAR10 or CIFAR100, so I use the following transforms when training on MNIST, FashionMNIST, or EMNIST:

torchvision.transforms.Grayscale(num_output_channels=3),
torchvision.transforms.Resize((32,32))
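
For reference, a minimal sketch of the full preprocessing pipeline (only Grayscale and Resize are from the actual script; the Compose/ToTensor parts, and any normalization matching the pretrained model's stats, are my assumptions for illustration):

import torchvision.transforms as T

# Sketch only: replicate the single channel to 3 and resize to the CIFAR input size.
# ToTensor (and a Normalize matching the pretrained model) are assumed here.
transform = T.Compose([
    T.Grayscale(num_output_channels=3),
    T.Resize((32, 32)),
    T.ToTensor(),
])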

Without the change in the number of output channels (the Grayscale transform), it throws the error:

RuntimeError: output with shape [1, 32, 32] doesn't match the broadcast shape [3, 32, 32]

Any suggestions on how to resolve the CUDA error for EMNIST?

Could you rerun the script with blocking launches as suggested in the error message and post the stacktrace here?

This is the stacktrace with CUDA_LAUNCH_BLOCKING=1:

../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [13,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "train_t.py", line 350, in <module>
    loss = criterion(outputs, y)
  File "/home/dir/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/dir/.local/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/dir/.local/lib/python3.8/site-packages/torch/nn/functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Assertion `t >= 0 && t < n_classes` failed.

Your target contains values that are out of range, so double-check the output shape and the min/max values of your target.
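
A quick check right before the loss computation (assuming the outputs, y, and criterion names from your traceback) would be:

# sanity check before computing the loss
print(outputs.shape)        # expected [batch_size, n_classes]
print(y.min(), y.max())     # must lie in [0, n_classes - 1]
loss = criterion(outputs, y)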

You are right; I checked the target values, and the labels range from 1 to 26 instead of 0 to 25.
Any clue what the cause could be?

I am using PyTorch's dataset as follows:

trainset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters', train=True, download=True)
testset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters', train=False, download=True)
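
I also inspected the raw targets directly with a quick check (using the .targets / .classes attributes these MNIST-style torchvision datasets expose):

# quick look at the raw label range and the class list
print(trainset.targets.min().item(), trainset.targets.max().item())
print(len(trainset.classes), trainset.classes)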

Access the root folder of the dataset and check if e.g. an empty folder was created.

I don’t see any empty folder or anything out of the ordinary.
Traversing into EMNIST/EMNIST/raw, I see these files:

[dir@c32 raw]$ ls -l
total 2277827
-rw-r--r-- 1 dir tsu  14739216 Jan 13 10:32 emnist-balanced-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu     18808 Jan 13 10:33 emnist-balanced-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  88435216 Jan 13 10:33 emnist-balanced-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    112808 Jan 13 10:33 emnist-balanced-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  91197248 Jan 13 10:33 emnist-byclass-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    116331 Jan 13 10:33 emnist-byclass-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu 547178704 Jan 13 10:33 emnist-byclass-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    697940 Jan 13 10:33 emnist-byclass-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  91197248 Jan 13 10:32 emnist-bymerge-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    116331 Jan 13 10:32 emnist-bymerge-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu 547178704 Jan 13 10:33 emnist-bymerge-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    697940 Jan 13 10:32 emnist-bymerge-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  31360016 Jan 13 10:33 emnist-digits-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu     40008 Jan 13 10:32 emnist-digits-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu 188160016 Jan 13 10:33 emnist-digits-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    240008 Jan 13 10:33 emnist-digits-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  16307216 Jan 13 10:33 emnist-letters-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu     20808 Jan 13 10:32 emnist-letters-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  97843216 Jan 13 10:33 emnist-letters-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu    124808 Jan 13 10:33 emnist-letters-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu   7840016 Jan 13 10:33 emnist-mnist-test-images-idx3-ubyte
-rw-r--r-- 1 dir tsu     10008 Jan 13 10:33 emnist-mnist-test-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu  47040016 Jan 13 10:32 emnist-mnist-train-images-idx3-ubyte
-rw-r--r-- 1 dir tsu     60008 Jan 13 10:33 emnist-mnist-train-labels-idx1-ubyte
-rw-r--r-- 1 dir tsu 561753746 Jan 13 10:32 gzip.zip

No other directories are present anywhere in the intermediate paths of EMNIST/EMNIST/raw.

If I print the actual labels of the trainset and testset of “letters” in EMNIST, I get the following values:
[‘f’, ‘m’, ‘w’, ‘z’, ‘u’, ‘v’, ‘c’, ‘g’, ‘h’, ‘N/A’, ‘r’, ‘p’, ‘b’, ‘y’, ‘n’, ‘k’, ‘s’, ‘o’, ‘i’, ‘e’, ‘t’, ‘j’, ‘l’, ‘x’, ‘d’, ‘q’, ‘a’]

The ‘N/A’ class changes the class count to 27, instead of 26.
I redownloaded EMNIST, but still get the same error.

All other splits of EMNIST (byclass, bymerge, balanced, digits, and mnist) work perfectly fine and have the correct class counts.
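
In case it helps: as a workaround (my own sketch, not an official fix), the labels could either be shifted back to [0, 25] with a target_transform so the model keeps 26 output neurons, or the model could simply be given 27 output neurons to absorb the 'N/A' placeholder:

import torchvision

# sketch of a workaround: shift the 1-26 letter labels down to 0-25
# so they match a 26-class output layer (the lambda is an assumption)
trainset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters',
                                       train=True, download=True,
                                       target_transform=lambda t: t - 1)
testset = torchvision.datasets.EMNIST(root='Datasets/EMNIST', split='letters',
                                      train=False, download=True,
                                      target_transform=lambda t: t - 1)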