PyTorch way to one-hot encode a multiclass target variable


Sorry if this is a very basic question, but I could not find an answer.

What is the correct PyTorch way to encode a multi-class target variable?

I have more than 30 classes for the target variable, such as AA, AB, BB, BA, BC, ...

Should I use scikit-learn tools and then convert the NumPy arrays into torch tensors?

Or is there built-in functionality?


You could use this code snippet to transform your class indices into a one-hot encoded target:

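import torch
target = torch.randint(0, 10, (10,))  # 10 random class indices in [0, 9]
one_hot = torch.nn.functional.one_hot(target, num_classes=10)  # fix the width; otherwise it is inferred from target.max()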

How do I map rows in a one-hot torch tensor to the original labels and back?
Is this built-in functionality, or should I handle it myself by creating custom dictionaries?
I'm trying not to reinvent the wheel :)

To get the target containing class indices back, you could use torch.argmax(one_hot, dim=1).
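As far as I know, torch itself does not keep track of string labels, so the mapping between labels and class indices is usually kept in plain dictionaries. Here is a minimal sketch under that assumption; the label list, the sample list, and the names label_to_idx and idx_to_label are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical label set; only the style of the names (AA, AB, ...) comes from the question.
labels = ["AA", "AB", "BB", "BA", "BC"]
classes = sorted(set(labels))

# Two plain dictionaries: label -> index and index -> label.
label_to_idx = {label: idx for idx, label in enumerate(classes)}
idx_to_label = {idx: label for label, idx in label_to_idx.items()}

# Encode a few example samples as class indices, then one-hot encode them.
samples = ["BB", "AA", "BC", "AB"]
target = torch.tensor([label_to_idx[s] for s in samples])
one_hot = F.one_hot(target, num_classes=len(classes))

# Decode: argmax recovers the class indices, the dictionary recovers the labels.
recovered = [idx_to_label[i] for i in torch.argmax(one_hot, dim=1).tolist()]
```

Passing num_classes explicitly keeps the one-hot width stable even when a batch happens not to contain the highest class index.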


Apart from .scatter_, is there a better way to create a one-hot tensor in the (N,C,H,W) format without transposing?

Which dimension should be used for the one-hot encoding?
.scatter_ should not involve any transposing. Could you post a code snippet which would need it?

I’m sorry I didn’t elaborate on my question better.
To the best of my knowledge, there are two ways of creating one-hot encoded tensors in PyTorch:

  1. .scatter_: with this I can create a one-hot encoded tensor from a given tensor (for me, usually a label map). But for this, I need to define a torch.FloatTensor and call .scatter_ on it. A torch.FloatTensor needs all its dimensions defined up front and does not accept None values. This is usually an issue when the final mini-batch in a training/validation session is smaller than the other mini-batches. I could alternatively define a new torch.FloatTensor of the appropriate shape on each iteration, but I wish to skip this extra operation.
  2. torch.nn.functional.one_hot: with this I can directly get a one-hot encoded tensor from a given tensor, but the output is in the channels-last format, (N,H,W,C). Hence, a permutation of the dimensions is required to convert it into PyTorch’s (N,C,H,W) format. I wish to skip this extra operation as well. (It would be awesome if torch.nn.functional.one_hot had an option to specify which dimension will be the channel dimension.)
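For reference, the second option can be sketched as follows. The sizes N, C, H, W and the random label map are illustrative assumptions, not values from my actual setup:

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only.
N, C, H, W = 2, 4, 3, 3

# Label map of class indices with shape (N, H, W).
label_map = torch.randint(0, C, (N, H, W))

# one_hot appends the class dimension last: (N, H, W, C).
one_hot_nhwc = F.one_hot(label_map, num_classes=C)

# Move the class dimension to position 1: (N, C, H, W).
# permute returns a view; call .contiguous() if a contiguous layout is needed.
one_hot_nchw = one_hot_nhwc.permute(0, 3, 1, 2)
```

The result is an integer tensor, so a .float() call is typically still needed before using it in a loss.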

Is there a better way? I wish to improve the following rough implementation:

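import torch

class MyLoss(torch.nn.Module):
    def __init__(self, batch_size, classes):
        super().__init__()
        # define some attributes
        # Preallocated (uninitialized) buffer; it is zeroed in forward before use.
        # Its batch dimension is fixed here, which is the problem described above.
        self.y_true_one_hot = torch.FloatTensor(batch_size, classes, 240, 240)

    def forward(self, y_pred, y_true):
        # y_true: LongTensor of class indices with shape (batch_size, 1, 240, 240)
        with torch.no_grad():
            self.y_true_one_hot.zero_().scatter_(1, y_true, 1)
        # do some operations that compute `loss`
        return loss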

Thanks for the update.
I would personally stick to the scatter_ approach and either check for a different batch size or just recreate the tensor, e.g. via torch.zeros_like. F.one_hot would also create a new tensor in addition to the permutation, so you wouldn’t avoid this operation.
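A minimal sketch of the "just recreate the tensor" variant, assuming y_true is a LongTensor of class indices with shape (N, 1, H, W). The helper name one_hot_nchw is made up for this example, and torch.zeros is used instead of torch.zeros_like because the output shape differs from the input shape:

```python
import torch

def one_hot_nchw(y_true: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: one-hot encode an (N, 1, H, W) index tensor to (N, C, H, W)."""
    n, _, h, w = y_true.shape
    # Allocate a fresh zero tensor on every call, so a smaller final batch is not a problem.
    out = torch.zeros(n, num_classes, h, w, dtype=torch.float32, device=y_true.device)
    # Write 1.0 along dim 1 at the positions given by the class indices.
    return out.scatter_(1, y_true, 1.0)

# Usage with two different batch sizes (illustrative shapes).
full_batch = one_hot_nchw(torch.randint(0, 5, (8, 1, 4, 4)), num_classes=5)
last_batch = one_hot_nchw(torch.randint(0, 5, (3, 1, 4, 4)), num_classes=5)
```

Since the shape is read from y_true on each call, no batch size has to be passed to the module's constructor.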

Regarding the dim argument for one_hot: could you please create a feature request on GitHub and explain your use case to start the discussion?

Feature request has been created. Thank you for your time.