Error in Cross-Entropy Loss Computation with CUDA: Assertion Failure on Batch Size Reduction to 16

Hello, I encountered an error while attempting to compute the cross-entropy loss function. The code snippet below presents the relevant portion:

logits_x = logits[:inputs_x.shape[0]]
loss_s = F.cross_entropy(logits_x, targets_x)


  • targets_x has a size of [16].
  • logits_x has a size of [16, 47].

The specific error message is as follows:

nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stack trace below might be incorrect. For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.

The code runs without errors on other devices. However, when I reduce the batch size to 16, it fails with the error above. Any insights or suggestions on resolving this would be appreciated.

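Your target contains class indices that are out of bounds. With logits of shape [16, 47], n_classes is 47, so every value in targets_x must lie in the range 0 to 46: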

Assertion `t >= 0 && t < n_classes` failed.
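One way to find the offending labels is to validate the targets before calling the loss, so the failure shows up as a readable Python exception rather than a device-side assert. The following is only a minimal sketch: check_targets is a hypothetical helper, not part of PyTorch, and the tensors are random stand-ins for the logits_x and targets_x from the question. It also does not account for a custom ignore_index.

```python
import torch
import torch.nn.functional as F


def check_targets(logits, targets):
    """Raise a readable error if any target index falls outside [0, n_classes).

    Hypothetical helper for debugging; it ignores the ignore_index option
    of F.cross_entropy, so adapt it if you rely on that.
    """
    n_classes = logits.shape[1]
    t_min = int(targets.min())
    t_max = int(targets.max())
    if t_min < 0 or t_max >= n_classes:
        raise ValueError(
            f"targets span [{t_min}, {t_max}] but valid class indices "
            f"are 0 to {n_classes - 1}"
        )


# Stand-in tensors with the shapes from the question (on CPU for simplicity).
logits_x = torch.randn(16, 47)
targets_x = torch.randint(0, 47, (16,))

check_targets(logits_x, targets_x)  # passes: all labels are in 0..46
loss_s = F.cross_entropy(logits_x, targets_x)
```

Running the failing batch on the CPU, or setting CUDA_LAUNCH_BLOCKING=1 as the error message suggests, should also make the stack trace point at the actual failing call.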

Thank you. I have reviewed the dataset folder and identified an additional folder that is unrelated to my dataset.
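For anyone hitting the same problem: if the labels are derived from subfolder names (an assumption here, since the thread does not show how the dataset is loaded), a stray folder adds an extra class index. A rough sketch of a sanity check, using a hypothetical list_class_folders helper and a throwaway directory in place of the real dataset root:

```python
from pathlib import Path
import tempfile


def list_class_folders(root):
    """Return the sorted names of the immediate subdirectories of root."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())


# Demonstration on a temporary directory; point this at the real dataset root.
with tempfile.TemporaryDirectory() as root:
    for name in ["cat", "dog", "unrelated_folder"]:
        (Path(root) / name).mkdir()
    folders = list_class_folders(root)

expected_classes = 2  # hypothetical value for this demo; 47 in the question
if len(folders) != expected_classes:
    print(f"Found {len(folders)} folders, expected {expected_classes}: {folders}")
```

Comparing the folder count against the number of output classes in the model before training catches this kind of mismatch early.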