Convert int into one-hot format

movefast · December 13, 2017, 6:19am

Ran into this issue myself and did some searching. torch.sparse.torch.eye(num_labels).index_select(dim=0, index=labels) also seems to work pretty well in the 0.3 release, in addition to the scatter_ solution.
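
As a minimal sketch of the identity-matrix approach (written with plain torch.eye, and assuming labels is a 1-D tensor of int64 class indices; the variable names are illustrative), selecting rows of the identity matrix should give the same result as the scatter_ route:

```python
import torch

num_labels = 4
labels = torch.tensor([2, 0, 3])  # assumed: 1-D tensor of class indices (int64)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot_eye = torch.eye(num_labels).index_select(dim=0, index=labels)

# The scatter_ route: write 1 at column labels[i] of row i.
one_hot_scatter = torch.zeros(len(labels), num_labels).scatter_(
    1, labels.unsqueeze(1), 1.0
)

print(torch.equal(one_hot_eye, one_hot_scatter))
```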

Sajid_Iqbal · December 14, 2017, 7:47pm

def get_one_hot(preds, gt):
    encoded_target = preds.data.clone().zero_()
    target = gt.unsqueeze(1)  # now target is in shape [BCHW]=[20,1,240,240]
    unseq = target.long()
    unseq = unseq.data

    # encoded_target.scatter_(dim, index, val)
    # unseq dim 'dim' must be 1
    encoded_target.scatter_(1, unseq, 1)
    encoded_target = encoded_target.view(-1, 5)
    # b = encoded_target.view(-1, 5, 240, 240)
    # show_my_one_hot(b, target)
    return encoded_target

It returns the one-hot encoding of the target. In my case the target had shape [1,1,240,240] and preds had shape [1,5,240,240].
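
One thing to watch: view(-1, 5) on a [B, 5, H, W] tensor only flattens it in memory order, so a row of the result is not the one-hot vector of a single pixel. If per-pixel rows are what is wanted (an assumption about the intent), a possible variation moves the class dimension last before flattening. The helper name and the small shapes below are illustrative only:

```python
import torch

def one_hot_pixels(preds, gt):
    """Hypothetical helper: preds is [B, C, H, W], gt is [B, H, W] of class indices."""
    index = gt.unsqueeze(1).long()                            # [B, 1, H, W]
    encoded = torch.zeros_like(preds).scatter_(1, index, 1)   # [B, C, H, W]
    # Move the class dimension last so that each row of the flattened
    # result is the one-hot vector of exactly one pixel.
    return encoded.permute(0, 2, 3, 1).reshape(-1, preds.size(1))

preds = torch.randn(1, 5, 4, 4)
gt = torch.randint(0, 5, (1, 4, 4))
rows = one_hot_pixels(preds, gt)   # shape [B*H*W, C]
```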

justheuristic · January 8, 2018, 1:45am

Here’s a TensorFlow-like solution based on previous code in this thread:

import torch
from torch.autograd import Variable

def to_one_hot(y, n_dims=None):
    """ Take integer y (tensor or variable) with n dims and convert it to 1-hot representation with n+1 dims. """
    y_tensor = y.data if isinstance(y, Variable) else y
    y_tensor = y_tensor.type(torch.LongTensor).view(-1, 1)
    n_dims = n_dims if n_dims is not None else int(torch.max(y_tensor)) + 1
    y_one_hot = torch.zeros(y_tensor.size()[0], n_dims).scatter_(1, y_tensor, 1)
    y_one_hot = y_one_hot.view(*y.shape, -1)
    return Variable(y_one_hot) if isinstance(y, Variable) else y_one_hot
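
Since PyTorch 0.4 merged Variable into Tensor, the same idea can be sketched without the Variable handling. This is an adaptation of the function above, not a replacement for it; keeping the result on the input's device is an added choice:

```python
import torch

def to_one_hot(y, n_dims=None):
    """Sketch: integer tensor with n dims -> float one-hot tensor with n+1 dims."""
    flat = y.long().reshape(-1, 1)
    # Infer the number of classes from the data when it is not given.
    n_dims = n_dims if n_dims is not None else int(flat.max()) + 1
    one_hot = torch.zeros(flat.size(0), n_dims, device=y.device).scatter_(1, flat, 1)
    return one_hot.view(*y.shape, -1)

y = torch.tensor([[0, 2], [1, 3]])
out = to_one_hot(y)   # shape [2, 2, 4]
```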

willyd · April 16, 2018, 1:34pm

It is also possible to abuse broadcasting and do:

# some labels
labels = torch.arange(3)
labels = labels.reshape(3, 1)

num_classes = 4
one_hot_target = (labels == torch.arange(num_classes).reshape(1, num_classes)).float()

gives

 1  0  0  0
 0  1  0  0
 0  0  1  0
[torch.FloatTensor of size (3,4)]

pyaf · June 18, 2018, 10:37am

You can use the torch.eye function for this:

def one_hot_embedding(labels, num_classes):
    """Embedding labels to one-hot form.

    Args:
      labels: (LongTensor) class labels, sized [N,].
      num_classes: (int) number of classes.

    Returns:
      (tensor) encoded labels, sized [N, #classes].
    """
    y = torch.eye(num_classes) 
    return y[labels]

This should help!

yuqli · October 23, 2018, 1:18am

As Adam Paszke's reply on May 17 suggested, this is not a good method, since you might need to create a large weight matrix every time and store it in memory.
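
To make the memory concern concrete, here is a rough comparison, assuming the comment refers to the torch.eye lookup. The identity matrix has num_classes × num_classes entries no matter how many labels are encoded, while the scatter_ route allocates only the output. The sizes are illustrative:

```python
import torch

num_classes = 1000
labels = torch.tensor([3, 17, 999])

eye = torch.eye(num_classes)   # num_classes x num_classes lookup table
via_eye = eye[labels]

via_scatter = torch.zeros(len(labels), num_classes).scatter_(
    1, labels.unsqueeze(1), 1.0
)

print(eye.numel(), via_scatter.numel())  # 1000000 3000
```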

ki2rin · November 14, 2018, 1:50am

Wow. This is way simpler than what I used before. I like this one. Thanks!

stephan_shar · December 3, 2018, 6:52am

How can I convert integer values, i.e. 0, 1, 2, 3, into one-hot vectors, i.e. [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]?

bhushans23 · December 3, 2018, 8:59am

@apaszke do you think an API, e.g. torch.create_one_hot(), would be useful and worth providing?
It would take a 1-d input tensor and create a 2-d tensor with the one-hot encoding of the input.

ptrblck · December 3, 2018, 3:47pm

This code should work:

idx = torch.tensor([0, 1, 2, 3])
torch.zeros(len(idx), idx.max()+1).scatter_(1, idx.unsqueeze(1), 1.)

stephan_shar · December 4, 2018, 10:46pm

Thank you, it worked.

j-min · January 5, 2019, 7:22am

Finally torch.nn.functional.one_hot has been added.

>>> import torch
>>> x = torch.arange(0,5)
>>> x
tensor([0, 1, 2, 3, 4])
>>> torch.nn.functional.one_hot(x)
tensor([[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1]])
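
A short sketch of two details worth knowing: the number of classes is inferred from the largest value unless num_classes is passed, and the result is an int64 tensor, so a cast is needed when floats are required:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([0, 2, 1])

inferred = F.one_hot(x)               # classes inferred as x.max() + 1 -> 3
fixed = F.one_hot(x, num_classes=5)   # explicit number of classes
as_float = fixed.float()              # result is int64; cast if floats are needed
```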

gsh · March 15, 2019, 9:11am

A batch version that works:
batch_size = 3
y = torch.randint(0, 3, (batch_size, 5)).type(torch.LongTensor)
y_onehot = torch.FloatTensor(batch_size, 5, 5)
y_onehot.zero_()
y_onehot.scatter_(2, torch.unsqueeze(y, 2), 1)
print(y_onehot)

coincheung · March 27, 2019, 6:03am

Thanks. How do you cope with an ignore label, please?
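
One possible way to handle this, sketched under the assumption that ignored positions carry a sentinel value (255 here, chosen only for illustration) and should become all-zero rows. The helper name is hypothetical:

```python
import torch
import torch.nn.functional as F

def one_hot_with_ignore(labels, num_classes, ignore_index=255):
    """Hypothetical helper: ignored positions become all-zero one-hot rows."""
    valid = labels != ignore_index
    # Replace the sentinel with an in-range placeholder so one_hot accepts it.
    safe = labels.masked_fill(~valid, 0)
    one_hot = F.one_hot(safe, num_classes)
    # Zero out the rows that belonged to ignored positions.
    return one_hot * valid.unsqueeze(-1)

labels = torch.tensor([0, 255, 2])
encoded = one_hot_with_ignore(labels, num_classes=3)
```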

Egor_Kraev · May 9, 2019, 2:23pm

I would say don’t bother. The only ways you’ll ever use those one-hot variables are either to embed them (in which case nn.Embedding allows you to do so directly from the indices) or to use them in a loss function, in which case why not use a loss function that takes the indices directly?
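
Both points can be checked with a small sketch: an nn.Embedding lookup matches multiplying the one-hot matrix by the embedding weights, and cross-entropy on indices matches the manual one-hot formulation. The sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, dim = 4, 8
idx = torch.tensor([0, 3, 1])

# Embedding lookup vs. one-hot matrix times the weight matrix.
emb = nn.Embedding(num_classes, dim)
via_lookup = emb(idx)
via_one_hot = F.one_hot(idx, num_classes).float() @ emb.weight

# Cross-entropy takes class indices directly as the target.
logits = torch.randn(3, num_classes)
loss = F.cross_entropy(logits, idx)
manual = -(F.one_hot(idx, num_classes) * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```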

jon · May 19, 2019, 1:09am

Are you sure about this? I am currently calling it like so:

loss(predictions, target_int_index) and it is complaining.

(predictions is 1xK and target_int_index is a scalar)
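
One possible cause, assuming loss is nn.CrossEntropyLoss (the post does not say which loss is used): with a [1, K] input, the target is expected to have shape [1] rather than be a 0-dim scalar. A minimal sketch of reshaping the target:

```python
import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()          # assumption: the actual loss is not stated
K = 5
predictions = torch.randn(1, K)       # shape [1, K]
target_int_index = torch.tensor(2)    # 0-dim (scalar) tensor

# Give the target a batch dimension so it matches the [1, K] input.
value = loss(predictions, target_int_index.view(1))
```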

f3ba · January 21, 2020, 11:05am

Sorry to revive an old topic, but this is the first result I get on Google when searching for one-hot encoding in PyTorch. I just want to mention that one can use torch.nn.functional.one_hot (https://pytorch.org/docs/stable/nn.functional.html#one-hot) for this.

f10w · April 26, 2020, 2:45pm

@f3ba torch.nn.functional.one_hot doesn’t allow specifying a dimension though…

ptrblck · April 27, 2020, 1:57am

F.one_hot will accept a varying number of dimensions and will append an additional dimension of size nb_classes.
You could permute this tensor afterwards, if necessary.
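
A minimal sketch of that permutation, assuming a [B, H, W] target and a channels-first result (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

nb_classes = 5
target = torch.randint(0, nb_classes, (2, 4, 4))   # [B, H, W]

one_hot = F.one_hot(target, nb_classes)            # [B, H, W, nb_classes]
channels_first = one_hot.permute(0, 3, 1, 2)       # [B, nb_classes, H, W]
```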

f10w · April 27, 2020, 8:06pm

@ptrblck You are right, one can transpose the tensor, apply F.one_hot, then transpose back, but then the solution with scatter_ seems to be simpler.
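
For comparison, a scatter_-based sketch that places the class dimension at an arbitrary position. The helper name and signature are hypothetical, not part of PyTorch:

```python
import torch

def one_hot_along(labels, num_classes, dim):
    """Hypothetical helper: insert a one-hot class dimension at position `dim`."""
    index = labels.long().unsqueeze(dim)
    shape = list(index.shape)
    shape[dim] = num_classes
    # Write 1 at the class index along the new dimension.
    return torch.zeros(shape, device=labels.device).scatter_(dim, index, 1)

target = torch.randint(0, 5, (2, 4, 4))
out = one_hot_along(target, num_classes=5, dim=1)   # [2, 5, 4, 4]
```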