Judging by your target, it looks like you are dealing with multi-class classification, not multi-label classification, i.e. each sample belongs to exactly one class.
If that's the case, your target should contain only the class index for each sample (shape [batch_size]) and should be a LongTensor.
In case you already have the one-hot encoded targets, just call:
target = torch.argmax(target, 1)
on them and use nn.CrossEntropyLoss as your criterion, passing it the raw logit outputs of your model (it applies log-softmax internally, so don't add a softmax yourself).
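As a minimal sketch of the steps above: the batch size, number of classes, and the names `one_hot_target` and `logits` are made up for illustration, and the random logits stand in for whatever your model actually outputs.

```python
import torch
import torch.nn as nn

# Hypothetical one-hot targets: 4 samples, 3 classes (illustrative values only)
one_hot_target = torch.tensor([[1, 0, 0],
                               [0, 0, 1],
                               [0, 1, 0],
                               [0, 0, 1]])

# Collapse each one-hot row to its class index; the result is a LongTensor of shape [4]
target = torch.argmax(one_hot_target, dim=1)

# Stand-in for model outputs: raw logits of shape [batch_size, num_classes]
logits = torch.randn(4, 3, requires_grad=True)

# CrossEntropyLoss expects logits and class indices, not probabilities or one-hot rows
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()
```

Here `target` has dtype `torch.int64` and shape `[4]`, which is the format nn.CrossEntropyLoss expects for class-index targets.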