I ran into a problem with the CUDA kernel of torch.nn.CrossEntropyLoss: its forward function fails with an illegal memory access. I posted the issue on GitHub and hope it gets a response soon.
For now, as a workaround for large input tensors, I split the input into multiple segments and call F.cross_entropy on each segment separately. Is there a better way to apply cross entropy loss to large tensors?
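For reference, here is a rough sketch of the segment-wise workaround I mean. The function name `chunked_cross_entropy` and the `chunk_size` default are just placeholders I made up, and it assumes logits of shape (N, C), integer class targets of shape (N,), and no class weights. Each segment is reduced with `reduction="sum"` and the total is divided by the number of non-ignored targets, so the result should match the usual mean reduction over the whole tensor.

```python
import torch
import torch.nn.functional as F


def chunked_cross_entropy(logits, targets, chunk_size=65536, ignore_index=-100):
    """Mean cross entropy over (N, C) logits, computed in segments along N.

    Hypothetical helper: assumes integer class targets of shape (N,)
    and no per-class weights.
    """
    total = logits.new_zeros(())  # scalar accumulator on the same device/dtype
    count = 0
    for start in range(0, logits.shape[0], chunk_size):
        chunk_logits = logits[start:start + chunk_size]
        chunk_targets = targets[start:start + chunk_size]
        # Sum the per-element losses so segments of different sizes combine correctly.
        total = total + F.cross_entropy(
            chunk_logits, chunk_targets,
            reduction="sum", ignore_index=ignore_index,
        )
        # Only non-ignored targets contribute to the mean.
        count += int((chunk_targets != ignore_index).sum())
    # Returns 0 instead of NaN if every target is ignored.
    return total / max(count, 1)
```

For inputs shaped like (N, C, H, W), I would first flatten them with something like `logits.permute(0, 2, 3, 1).reshape(-1, C)` and `targets.reshape(-1)`. Since the slices are views, gradients still flow back to the original logits tensor.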