Hello, I encountered an error while attempting to compute the cross-entropy loss function. The code snippet below presents the relevant portion:
logits_x = logits[:inputs_x.shape[0]]
loss_s = F.cross_entropy(logits_x, targets_x)
Where targets_x has a size of [16] and logits_x has a size of [16, 47].
The specific error message is as follows:
nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stack trace below might be incorrect. For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.
It’s worth noting that the code runs without any errors on other devices. However, when I reduce the batch size to 16, it produces the error above. Any insights or suggestions on resolving this issue would be appreciated.
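For context, the assertion `t >= 0 && t < n_classes` usually means at least one label in targets_x lies outside the valid range [0, 47). Because CUDA reports the assert asynchronously, a sanity check on CPU gives a clearer answer. Here is a minimal sketch of that check (the tensors here are synthetic stand-ins for my logits_x and targets_x, with 47 classes as above):

```python
import torch
import torch.nn.functional as F

n_classes = 47

# Synthetic stand-ins shaped like the tensors in the question.
logits_x = torch.randn(16, n_classes)
targets_x = torch.randint(0, n_classes, (16,))  # valid class indices

# Move targets to CPU so any range problem surfaces immediately,
# rather than as a delayed device-side assert.
t = targets_x.cpu()
assert t.min().item() >= 0 and t.max().item() < n_classes, (
    f"label out of range: min={t.min().item()}, max={t.max().item()}, "
    f"n_classes={n_classes}"
)

# With in-range targets, cross_entropy computes without tripping the kernel assert.
loss_s = F.cross_entropy(logits_x, targets_x)
print(loss_s.item())
```

Running with CUDA_LAUNCH_BLOCKING=1, as the error message suggests, would also make the stack trace point at the actual failing call.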