Slicing a CUDA tensor

Hi everyone, I have a question about slicing a CUDA tensor (PyTorch 1.10, RTX 3070 GPU). For example, the following code doesn't learn:

# labels holds several outputs per sample; only the first column is used as the target
for epoch in range(epochs):
    for features, labels in train_loader:
        features = features.to(device)
        labels = labels.to(device)
        preds = model(features)
        loss = cross_entropy(preds, labels[:, 0])  # labels[:, 0] is a non-contiguous view
        optim.zero_grad()
        loss.backward()
        optim.step()

But if I use:

loss = cross_entropy(preds, labels[:, 0].contiguous())

it works! The model starts learning. What is the difference, or is it a bug? (Sorry for my bad English.)
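
For reference, labels[:, 0] is only a view into the label tensor and is not contiguous in memory, while .contiguous() copies it into a densely packed tensor. A minimal check (shapes made up just for illustration) that shows the layout difference:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
labels = torch.randint(0, 10, (8, 3), device=device)   # batch of 8, 3 label columns per sample

sliced = labels[:, 0]                       # view into labels with stride (3,), no copy
print(sliced.is_contiguous())               # False: elements are 3 apart in memory
print(sliced.contiguous().is_contiguous())  # True: .contiguous() materializes a dense copy
print(torch.equal(sliced, sliced.contiguous()))  # True: same values, only the layout differs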

Thanks in advance!

Could you post an executable code snippet, which would show the failure to learn in the first approach and the successful training in the second?
Have you measured the mean +/- stddev accuracy for both approaches? If so, could you post them here as well, please?
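
To make the second question concrete, a rough sketch of reporting mean +/- stddev accuracy over a few seeded runs; the model and data below are toy placeholders, not the code from your setup:

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

def run_once(seed):
    # One full (toy) training run for a given seed, returning the final accuracy.
    torch.manual_seed(seed)
    x = torch.randn(256, 16, device=device)
    y = (x[:, 0] > 0).long()                  # trivially learnable labels
    w = torch.zeros(2, 16, device=device, requires_grad=True)
    opt = torch.optim.SGD([w], lr=0.5)
    for _ in range(50):
        loss = F.cross_entropy(x @ w.t(), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x @ w.t()).argmax(dim=1).eq(y).float().mean().item()

accs = torch.tensor([run_once(seed) for seed in range(5)])
print(f"accuracy: {accs.mean().item():.3f} +/- {accs.std().item():.3f}")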

I'm sharing a link to a Colab environment (PyTorch 1.10, NVIDIA K80): https://colab.research.google.com/drive/15VNZZP1uPdrY4hH0cYbBYdVT_nxWbJVa?usp=sharing

There you can see the logs for both approaches: with the first one the accuracy is stagnant, while with the second it improves over the epochs. (I'm also attaching a script you can run: https://drive.google.com/file/d/1Fb4qc_ksMp-8jeWrb0H8SIDDdiFpwBQ5/view?usp=sharing.)

Thanks!

Thanks for the notebook! I wasn’t able to run it on Colab for unknown reasons (the runtime just stopped), but was able to reproduce the issue locally in PyTorch 1.10.0. It seems to be fixed in the current master/nightly and works as expected with 1.11.0.dev20211101+cu113.
I couldn’t quickly find a related PR for this issue, but will dig a bit more into what might have caused it in 1.10.
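
For anyone who wants to check their own install, here is a small standalone sketch (a toy linear classifier, not the code from the notebook) that compares the gradients obtained with the sliced target against a .contiguous() copy; on a fixed build the two should match:

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)

features = torch.randn(32, 16, device=device)
labels = torch.randint(0, 4, (32, 2), device=device)   # 2 label columns, as in the post

grads = []
for target in (labels[:, 0], labels[:, 0].contiguous()):
    weight = torch.zeros(4, 16, device=device, requires_grad=True)
    loss = F.cross_entropy(features @ weight.t(), target)   # logits of shape (32, 4)
    loss.backward()
    grads.append(weight.grad.clone())

print(torch.__version__)
print("gradients match:", torch.allclose(grads[0], grads[1]))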