Is there any particular reason why CrossEntropyLoss supports only hard (one-hot, i.e. class-index) targets?
Granted, it is not too hard to write a version of CrossEntropyLoss that accepts a continuous distribution as the target (using KLDivLoss and adding the entropy of the target, for example), but it seems like something that is missing from the list of loss functions. With it, one could implement label-smoothing regularization, distillation, etc. very easily.
It seems like a common enough function that it should be added to the library.
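
For reference, here is a minimal sketch of the kind of workaround I mean. The names `soft_cross_entropy` and `smooth_labels` are hypothetical helpers of my own, not part of the library, and the sketch assumes logits of shape (batch, num_classes) with target rows that sum to 1. It checks that the soft version agrees with the built-in loss on one-hot targets, and that the KLDivLoss-plus-entropy route gives the same value.

```python
import torch
import torch.nn.functional as F


def soft_cross_entropy(logits, target_probs):
    """Cross entropy between softmax(logits) and a target distribution.

    Hypothetical helper, not a library function.
    logits:       (batch, num_classes) unnormalized scores
    target_probs: (batch, num_classes) rows that sum to 1
    """
    log_probs = F.log_softmax(logits, dim=1)
    # H(p, q) = -sum_c p(c) * log q(c), averaged over the batch
    return -(target_probs * log_probs).sum(dim=1).mean()


def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    one_hot = F.one_hot(labels, num_classes).float()
    return one_hot * (1.0 - eps) + eps / num_classes


torch.manual_seed(0)
logits = torch.randn(4, 5)
labels = torch.tensor([0, 2, 1, 4])

# With one-hot targets the soft version should match the built-in loss.
hard = F.cross_entropy(logits, labels)
soft = soft_cross_entropy(logits, F.one_hot(labels, 5).float())

# KLDivLoss route: cross entropy = KL(p || q) + H(p).
# Smoothed targets have no zero entries, so p.log() is finite here.
p = smooth_labels(labels, 5)
kl = F.kl_div(F.log_softmax(logits, dim=1), p, reduction="batchmean")
entropy = -(p * p.log()).sum(dim=1).mean()
via_kl = kl + entropy
```

For distillation, `target_probs` would be the teacher's softmax output instead of smoothed labels; nothing else in the sketch needs to change.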