Mysterious behaviour of cross entropy loss

Pavan_Teja · December 27, 2020, 10:21pm

Hello,
I have encountered some very odd behaviour in nn.functional.cross_entropy(pred, label, reduction='none'). What I observed is described below. It would be great if someone could explain the reason for it:

Expected behaviour - If label[i] >= C for any i, where pred has shape N x C x H x W, cross_entropy should raise an IndexError.

Observed behaviour -

If pred and target are CPU tensors, the call fails with: IndexError: Target 20 is out of bounds.
If pred and target are GPU (CUDA) tensors, no error is raised.

Why does cross_entropy behave differently for GPU and CPU tensors? The behaviour observed for CUDA tensors is also unexpected. Isn't this a bug?

My environment:
Python : 3.7.7
Pytorch : 1.6.0
Cuda : 10.1

PS: This behaviour cost me almost 3 hours. I hope someone can help me understand the reason for it, so that I can be more careful the next time I work with GPU tensors.

Below is the code snippet to replicate:

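import torch
import torch.nn.functional as F
# Prediction with C = 19 classes, shape N x C x H x W
input = torch.randn(1, 19, 32, 64, requires_grad=True)
input_c = input.cuda()
# Targets drawn from [0, 22), so some values (19, 20, 21) are out of range
target = torch.randint(22, size=(1,32,64,))
target_c = target.cuda()
# No error is raised here, even though target_c contains invalid class indices
F.cross_entropy(input_c, target_c, weight=None, reduction='none', ignore_index=255)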

This code does not raise any error, whereas removing the cuda() calls does raise an error.

InnovArul · December 28, 2020, 8:22am

I could reproduce the issue. I think you have stumbled upon a PyTorch bug in nll_loss2d.
I played around with your example and noticed that reduction='none' seems to be the cause. The error is triggered for the other reduction options (mean / sum).

Further, the reason could be that the device assertion is not checked when reduction='none'.

github.com/pytorch/pytorch/blob/master/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu#L70-L99

if (reduction == at::Reduction::None) {
  int64_t batch_size = THCTensor_(size)(state, input, 0);
  int64_t H = THCTensor_(size)(state, input, 2);
  int64_t W = THCTensor_(size)(state, input, 3);
  int64_t count = batch_size * H * W;
  THCTensor_(resize3d)(state, output, batch_size, H, W);
  if (count == 0) {
    // This guards from unnecessary operations and launching CUDA kernel with 0 blocks.
    return;
  }
  if (weights) {
    weights = THCTensor_(newContiguous)(state, weights);
  }
  SpatialClassNLLCriterion_updateOutput_no_reduce_kernel<scalar_t>
    <<<GET_BLOCKS(count), CUDA_NUM_THREADS, 0, c10::cuda::getCurrentCUDAStream()>>>(
      count,
      toDeviceTensor<scalar_t, 4>(state, input),

(excerpt truncated)

THCudaCheck(cudaGetLastError()); is missing in the reduction='none' case.

I have raised a bug on your behalf: nll_loss2d: t >= 0 && t < n_classes assertion is not checked when using GPU tensors and reduction='none' · Issue #49882 · pytorch/pytorch · GitHub
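
In the meantime, one way to guard against this is to validate the targets yourself before calling the loss. The snippet below is only a minimal sketch: check_targets and checked_cross_entropy are hypothetical helper names, not part of PyTorch, and the sketch assumes pred has shape N x C x ... with the class dimension at index 1. Because the check runs in Python, it should behave the same for CPU and CUDA tensors, at the cost of one extra host synchronisation on the GPU.

```python
import torch
import torch.nn.functional as F


def check_targets(target, num_classes, ignore_index=-100):
    """Hypothetical helper: raise IndexError for any target that is
    outside [0, num_classes) and is not equal to ignore_index."""
    valid = (target >= 0) & (target < num_classes)
    bad = ~(valid | (target == ignore_index))
    # Converting the result of .any() to bool forces a host sync on CUDA tensors.
    if bad.any():
        first_bad = target[bad][0].item()
        raise IndexError(
            f"Target {first_bad} is out of bounds for {num_classes} classes."
        )


def checked_cross_entropy(pred, target, ignore_index=-100, **kwargs):
    """Hypothetical wrapper: validate targets, then call F.cross_entropy.
    Assumes the class dimension of pred is dim 1 (N x C x ...)."""
    check_targets(target, pred.shape[1], ignore_index)
    return F.cross_entropy(pred, target, ignore_index=ignore_index, **kwargs)
```

With the example from the question, checked_cross_entropy(input_c, target_c, ignore_index=255, reduction='none') would raise an IndexError as soon as target_c contains a value such as 20, regardless of the device. The extra check can be removed once the data pipeline is known to be correct.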

Pavan_Teja · December 28, 2020, 5:52pm

Thanks for raising the bug.