I’ve been struggling with properly creating a loss function for a combination of multiclass and multilabel classification.
The fact that NLLLoss/CrossEntropyLoss only accepts categorical targets (class indices), with no equivalent that takes one-hot vectors, is a real handicap here.
Use case - For example with 10 classes:
- classes 0 to 4 are exclusive (group A)
- classes 5 and 6 are exclusive (group B)
Group A, group B and classes 7 to 9 are independent.
I would like to use cross-entropy for group A, cross-entropy for group B, and binary cross-entropy for classes 7 to 9.
The target with the true labels is a one-hot vector.
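For example (a made-up sample, just to illustrate the layout): a sample that is class 2 within group A, class 6 within group B, and positive for labels 7 and 9 would have the target
[0, 0, 1, 0, 0, 0, 1, 1, 0, 1]
i.e. exactly one 1 in positions 0-4, exactly one 1 in positions 5-6, and any number of 1s in positions 7-9.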
I managed to split it and format it for cross_entropy and binary_cross_entropy + sigmoid, but the result is quite ugly.
1/ Details on how to one-hot encode and how to revert the one-hot encoding.
I tried using gather but couldn't manage to do it, so I used masked_select instead. Is there a way to do it with gather?
import torch
from torch import nn
from torch.autograd import Variable

# categorical targets (class indices)
target = Variable(torch.LongTensor([1, 0, 4]))
print(target)

# one-hot encode with scatter_ (the trailing _ means in-place)
target_onehot = Variable(torch.zeros(3, 5))
target_onehot.scatter_(1, target.view(-1, 1), 1)
print(target_onehot)

# revert: build a row of class indices and broadcast it with expand
val = torch.arange(0, 5)
print(val)
val = val.expand(3, 5)  # expand is torch broadcasting
print(val)

# keep only the indices where the one-hot mask is set
new_target = val.masked_select(target_onehot.data.byte())
print(new_target)
Output:
Variable containing:
1
0
4
[torch.LongTensor of size 3]
Variable containing:
0 1 0 0 0
1 0 0 0 0
0 0 0 0 1
[torch.FloatTensor of size 3x5]
0
1
2
3
4
[torch.FloatTensor of size 5]
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
[torch.FloatTensor of size 3x5]
1
0
4
[torch.FloatTensor of size 3]
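(Side note on 1/: this doesn't use gather, but for the record, taking the argmax along dim 1 with max also recovers the class indices from the one-hot tensor, without the masked_select round trip:)

# alternative reversal without gather/masked_select:
# max along dim 1 returns (values, indices); the indices are the class labels
_, new_target = target_onehot.max(1)
print(new_target)  # 1, 0, 4 (LongTensor indices, wrapped in a Variable here)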
2/ Using that in a custom loss function.
My forward function must be defined like this for group A:
def forward(self, input, target):
    # Cross-entropy wants categorical targets, not one-hot,
    # so reverse the one-hot encoding for group A (classes 0 to 4)
    groupA_target = Variable(
        torch.arange(0, 5).expand(target.size(0), 5)
                          .masked_select(target[:, :5].data.byte().cpu())
                          .long().cuda(),
        requires_grad=False)
    loss_groupA = F.cross_entropy(input[:, :5],
                                  groupA_target,
                                  self.groupA_weight,
                                  self.size_average)
[...]
Explanation:
- data.byte().cpu() because masked_select wants a ByteTensor on the CPU
- .long().cuda() because F.cross_entropy expects a cuda LongTensor
Ideally, I would like an F.cross_entropy_OneHot function that I could use like this:
def forward(self, input, target):
    loss_groupA = F.cross_entropy(input[:, :5],
                                  target[:, :5],
                                  self.groupA_weight,
                                  self.size_average)
[...]
This is already possible with binary_cross_entropy + sigmoid activation.
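For completeness, here is a rough sketch of how I imagine the three pieces fitting together in one module, under the setup above (classes 0-4 = group A, 5-6 = group B, 7-9 independent). MixedLoss and _to_categorical are just made-up names, the weight/size_average arguments are omitted to keep it short, CPU tensors are assumed (add the .cpu()/.cuda() dance from above for GPU), and the one-hot reversal is the masked_select trick from 1/:

import torch
import torch.nn.functional as F
from torch import nn
from torch.autograd import Variable

class MixedLoss(nn.Module):
    # hypothetical module: splits a (batch, 10) one-hot target into
    # group A (classes 0-4, exclusive), group B (classes 5-6, exclusive)
    # and classes 7-9 (independent labels)
    def forward(self, input, target):
        groupA_target = self._to_categorical(target[:, :5], 5)
        groupB_target = self._to_categorical(target[:, 5:7], 2)
        loss_A = F.cross_entropy(input[:, :5], groupA_target)
        loss_B = F.cross_entropy(input[:, 5:7], groupB_target)
        # independent labels: sigmoid + binary cross-entropy, one-hot target used as-is
        loss_C = F.binary_cross_entropy(F.sigmoid(input[:, 7:]), target[:, 7:])
        return loss_A + loss_B + loss_C

    def _to_categorical(self, onehot, n):
        # made-up helper: one-hot -> class indices via the masked_select trick
        idx = torch.arange(0, n).expand(onehot.size(0), n)
        return Variable(idx.masked_select(onehot.data.byte()).long(), requires_grad=False)

# usage sketch:
# criterion = MixedLoss()
# loss = criterion(output, target_onehot)  # output: (batch, 10) raw scores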