Got "Element 0 of tensors" when using torch.where() in custom loss

Try analyzing the magnitude of your gradients; your loss may be suffering from numerical problems. If so, try reformulating the loss without changing its objective. For instance, it is well known that it's generally better to use nn.LogSoftmax instead of nn.Softmax. This thread discusses this observed property.
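As a minimal sketch of both suggestions (the model, data, and loss here are illustrative placeholders, not taken from your post): compute the loss with F.log_softmax rather than log(softmax(...)), which can underflow to log(0) = -inf, and then print per-parameter gradient norms after backward() to look for NaNs, infs, or extreme values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy model and data, just for illustration
model = nn.Linear(10, 3)
x = torch.randn(8, 10)
target = torch.randint(0, 3, (8,))

# Numerically safer formulation: log_softmax instead of log(softmax(...))
logits = model(x)
log_probs = F.log_softmax(logits, dim=1)
loss = F.nll_loss(log_probs, target)

loss.backward()

# Inspect gradient magnitudes to spot vanishing/exploding/NaN gradients
for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad norm = {p.grad.norm().item():.4e}")
```

If any of these norms are NaN, inf, or orders of magnitude larger or smaller than the rest, that points to the part of your custom loss you should rewrite.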