Without label smoothing, the loss is small, e.g. 0.3418:
loss = F.cross_entropy(preds, labels, label_smoothing=0)
With label smoothing, the loss becomes very large (2604166656.0000), but training proceeds normally:
loss = F.cross_entropy(preds, labels, label_smoothing=0.1)
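A minimal pure-Python sketch can reproduce a loss of this shape under one assumption the post does not confirm: that `preds` contains some very large negative logits (for example, classes masked with a fill value like -1e9). It uses the smoothed target distribution q = (1 - eps) * one_hot(target) + eps / K, which is how the `label_smoothing` argument of `F.cross_entropy` is documented to mix the ground truth with a uniform distribution. The helper names and the logit values are hypothetical, chosen only for illustration.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def smoothed_targets(k, target, eps):
    # q = (1 - eps) * one_hot(target) + eps / K
    return [(1 - eps) * (i == target) + eps / k for i in range(k)]

def smoothed_ce(logits, target, eps):
    # Cross entropy of softmax(logits) against the smoothed targets q.
    logp = log_softmax(logits)
    q = smoothed_targets(len(logits), target, eps)
    return -sum(qi * lpi for qi, lpi in zip(q, logp))

def smoothed_ce_grad(logits, target, eps):
    # Gradient of smoothed_ce w.r.t. the logits: softmax(logits) - q.
    p = [math.exp(lp) for lp in log_softmax(logits)]
    q = smoothed_targets(len(logits), target, eps)
    return [pi - qi for pi, qi in zip(p, q)]

# Hypothetical logits: the last class was masked with a huge negative value.
logits = [2.0, 0.5, -1e9]
target = 0

loss_plain = smoothed_ce(logits, target, eps=0.0)   # about 0.2014
loss_smooth = smoothed_ce(logits, target, eps=0.1)  # on the order of 1e7
grad = smoothed_ce_grad(logits, target, eps=0.1)    # every entry in [-1, 1]
print(loss_plain, loss_smooth, grad)
```

Under that assumption, the smoothed loss is dominated by the eps / K * (-log p) terms of the masked classes, while the gradient softmax(logits) - q stays between -1 and 1. That would be consistent with a huge reported loss alongside normal training.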
So, is there anything wrong?