I have a target of size NxHxW, where N is the batch size. I want to create a one-hot encoding with an output size of NxCxHxW for image segmentation.
targets_extend.unsqueeze_(1) # convert to Nx1xHxW
one_hot = torch.cuda.FloatTensor(targets_extend.size(0), C, targets_extend.size(2), targets_extend.size(3)).zero_()
one_hot.scatter_(1, targets_extend, 1)
In the above code I have used targets_extend = targets.clone(), because targets will also be used in the cross-entropy loss (which only accepts a size of NxHxW, not Nx1xHxW). To be more clear: the output of the one-hot encoding is used for the BCE loss, while targets is used in the cross-entropy loss. Is my way (targets_extend = targets.clone()) of creating the one-hot encoding correct? Can I use .detach() instead?
I would stick with .clone(), since .detach() still uses the same underlying data. unsqueeze_ will therefore also unsqueeze the original tensor:
x = torch.ones(10)
y = x.detach()
y.unsqueeze_(0)
print(x.shape)
> torch.Size([1, 10])
print(y.shape)
> torch.Size([1, 10])
May I ask why you need the one-hot encoded target? It doesn't seem like you have a multi-label setup, or are you manipulating the one_hot target somehow afterwards?
Thanks. I use it for computing the adversarial loss over multiple classes (e.g. 21 classes in VOC), while the cross entropy is used for the segmentation loss. As I mentioned, the cross entropy expects targets of size NxHxW, while BCELoss expects NxCxHxW. So your answer is to use .clone, am I right?
Yes, I would use clone and provide two different targets for the different criteria.
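A minimal sketch of that setup, assuming 21 classes as in VOC and reusing the variable names from the question above (the small sizes here are made up):

```python
import torch

# made-up sizes: batch of 2, 21 classes, 4x4 spatial resolution
N, C, H, W = 2, 21, 4, 4
targets = torch.randint(0, C, (N, H, W))  # N x H x W, used as-is by cross entropy

# clone first, so the in-place unsqueeze_ does not change the shape of targets
targets_extend = targets.clone()
targets_extend.unsqueeze_(1)  # N x 1 x H x W

# scatter 1s along the class dimension to build the one-hot target for BCE
one_hot = torch.zeros(N, C, H, W)
one_hot.scatter_(1, targets_extend, 1.0)

print(targets.shape)  # torch.Size([2, 4, 4])
print(one_hot.shape)  # torch.Size([2, 21, 4, 4])
```

one_hot then goes into nn.BCELoss (or nn.BCEWithLogitsLoss), while the untouched targets goes into nn.CrossEntropyLoss.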
This solution is good when the target segmentation values are already in the range of the class indices we are mapping to, i.e. with C classes, the values in the segmentation mask range from 0 to C-1.
However, how do we one-hot encode a segmentation mask with 5 classes but intensities in the range 0-255, with only 5 unique values in the mask?
Please let me know if the question is not clear.
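One possible approach (a sketch, not from this thread; the mask values below are made up): remap the arbitrary intensities to contiguous class indices first, e.g. with torch.unique and torch.searchsorted, and then one-hot encode as usual:

```python
import torch

# made-up 2x3 mask with 5 classes stored as arbitrary intensities in 0..255
mask = torch.tensor([[0, 64, 64],
                     [128, 200, 255]])

values = torch.unique(mask)                 # sorted unique intensities: [0, 64, 128, 200, 255]
indexed = torch.searchsorted(values, mask)  # remap each intensity to its index 0..4

num_classes = values.numel()
one_hot = torch.eye(num_classes)[indexed]   # H x W x num_classes

print(indexed)        # [[0, 1, 1], [2, 3, 4]]
print(one_hot.shape)  # torch.Size([2, 3, 5])
```

torch.searchsorted works here because torch.unique returns its values sorted, so every intensity maps onto its position in that sorted list.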
true_1_hot = torch.eye(num_classes)[targs.squeeze(1)]
How does this work? The shape of targs after the squeeze is BS x H x W, and the resultant shape is BS x H x W x num_classes. I fail to understand this indexing.
torch.eye(num_classes) will create a tensor of shape [num_classes, num_classes] with 1s on its diagonal.
[targs.squeeze(1)] will then index this tensor "row-wise" and return the one-hot encoded tensor.
You would get the same output using F.one_hot(targs.squeeze(1), num_classes).
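As a quick sanity check of that "row-wise" indexing (the sizes below are made up), torch.eye followed by indexing with the squeezed target gives a BS x H x W x num_classes tensor, matching what torch.nn.functional.one_hot produces:

```python
import torch
import torch.nn.functional as F

# made-up target: batch of 2, 1 channel, 3x3 spatial, 4 classes
num_classes = 4
targs = torch.randint(0, num_classes, (2, 1, 3, 3))

true_1_hot = torch.eye(num_classes)[targs.squeeze(1)]
print(true_1_hot.shape)  # torch.Size([2, 3, 3, 4]) -> BS x H x W x num_classes

# F.one_hot yields the same encoding (as an integer tensor)
same = F.one_hot(targs.squeeze(1), num_classes)
print(torch.equal(true_1_hot, same.float()))  # True
```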
To dig into the indexing using multiple dimensions, check this example code snippet:
import torch

a = torch.arange(10)
b = torch.randint(0, 10, (16,))
# index with "flat" dimension
out_b = a[b]
# index with multiple dimensions
c = b.view(2, 2, 2, 2)
out_c = a[c]
print((out_b == out_c.view(-1)).all())
> tensor(True)