Best way to train on one-hot vectors?


#1

Hey,
I am working on NLP tasks, but this question is about data representation.
For my work I mostly use a one-hot vector representation for my features, such as POS tags. For the moment I naively use plain dense tensors as my one-hot vectors.

The training of my model is very slow, and I am sure there is a better way to represent one-hot vectors.


#2

Which loss function are you using?
Could you post the output and target shapes and a code snippet showing your training routine?


#3

I am using BCELoss because my output should be binary.

An input batch is a tensor of shape (6, 1001), and the 1001 values in each row are mostly zeros. The output is then a tensor of shape (6, 0).


#4

Does nobody have ideas about good practices for one-hot encoding in PyTorch? I have heard about using an embedding layer at the beginning of the network when working with one-hot vectors.


#5

If the one-hot encoded input tensors are representing some indexing, e.g. a word index, you could use an nn.Embedding layer. Have a look at this tutorial for more information.
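
As a quick sketch of what I mean (the vocabulary size and embedding dimension below are just placeholders, not values from your setup):

import torch
import torch.nn as nn

# Hypothetical sizes: 50 distinct tag/word indices, 16-dimensional embeddings
embedding = nn.Embedding(num_embeddings=50, embedding_dim=16)

# A batch of 6 samples, each a sequence of 4 integer indices (instead of one-hot vectors)
tag_indices = torch.randint(0, 50, (6, 4))

dense = embedding(tag_indices)  # shape: (6, 4, 16)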


#6

The one-hot encoded input tensors represent a sequence of POS tags.
One input line is composed of (for my simplest model) three distance numbers and 6 POS tags encoded as one-hot vectors. This gives me a tensor of roughly 195 values, most of which are zeros.

Under these conditions, do you think it is a good idea to use an nn.Embedding layer at the beginning of my network?


#7

Any ideas about good practice? Should I encode my POS tags as integers and add an nn.Embedding layer at the beginning of my network? Or should I do the same but keep the POS tags one-hot encoded?


#8

I’m no expert in NLP, but I would assume that using an nn.Embedding layer to get a dense representation of your sparse input data makes sense.
How are your distance numbers encoded? Are they integers as well?
If not, you might want to feed them separately into a linear layer and concatenate the result with the embedding tensor.
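
Roughly something like this (all sizes below are made up for illustration):

import torch
import torch.nn as nn

# Made-up sizes: 8 possible pos tag indices, 4-dim embeddings, 1 float distance per sample
embedding = nn.Embedding(8, 4)
dist_fc = nn.Linear(1, 4)

pos_idx = torch.randint(0, 8, (6,))         # batch of 6 pos tag indices
dist = torch.randn(6, 1)                    # batch of 6 float distances

pos_emb = embedding(pos_idx)                # (6, 4)
dist_feat = dist_fc(dist)                   # (6, 4)

x = torch.cat((pos_emb, dist_feat), dim=1)  # (6, 8) combined features for the rest of the model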


#9

No, my distance numbers are floats.

The problem is that each distance number has two corresponding POS tags. A sample looks like:
0.2 [0, 0, 1] [0, 0, 1] 0.4 [0, 1, 0] [0, 0, 1] 0.7 [1, 0, 0] [0, 1, 0] 0.5 [0, 0, 1] [1, 0, 0], etc.

If I concatenate the distance numbers with the embedding tensor afterwards, won’t I lose the fact that, for example, 0.2 is related to [0, 0, 1] [0, 0, 1]?


#10

I’m not sure. Would passing both POS tags to the embedding layer and then concatenating the result with the distance make sense? Your embedding layer should output dense representations of the concatenated POS tags, so the relationship between them and the distance should still be there, if I’m not mistaken.


#11

How could I pass my POS tags two by two?
“0.2 [0, 0, 1] [0, 0, 1] 0.4 [0, 1, 0] [0, 0, 1] 0.7 [1, 0, 0] [0, 1, 0] 0.5 [0, 0, 1] [1, 0, 0]” is an example with batch size = 1.


#12

I’m not sure what “two by two” means.
If you would like to concatenate the POS tags, could you post dummy tensors containing the POS tags and the distance for a single sample?


#13

I need my sample to be composed of several [distance postag1 postag2] parts.


#14

I’ve created a small example using an embedding for your POS tags.
I assume your POS tags have valid values between [0, 0, 0] and [1, 1, 1].
Also, I’m treating each combination as one “word”.
Could you check if this would work for you?

import torch
import torch.nn as nn


def prepare_sequence(seq, word_to_ix):
    # Map each pos tag vector (e.g. [0, 0, 1]) to its integer index via its string form
    idxs = [word_to_ix[str(w.numpy())] for w in seq]
    return torch.tensor(idxs, dtype=torch.long)


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.embedding = nn.Embedding(8, 3)
        self.fc = nn.Linear(7, 8)
        
    def forward(self, x_dist, x_pos1, x_pos2):
        x_pos1 = self.embedding(x_pos1)
        x_pos2 = self.embedding(x_pos2)
        x = torch.cat((x_dist, x_pos1, x_pos2), 1)
        x = self.fc(x)
        return x


# Create word_to_ix lookup for [0, 0, 0], [0, 0, 1], ...
# (keys match str(np.array(...)), e.g. '[0 0 1]' maps to index 1)
word_to_ix = {
    '[{}]'.format(' '.join('{:b}'.format(i).zfill(3))): i
    for i in range(8)
}

# Create dummy data
nb_samples = 10
x_dist = torch.randn(nb_samples, 1)
x_pos1 = torch.randint(0, 2, (nb_samples, 3))
x_pos2 = torch.randint(0, 2, (nb_samples, 3))

# Prepare sequences ([0, 0, 0] -> 0; [0, 0, 1] -> 1; ...)
x_pos1_idx = prepare_sequence(x_pos1, word_to_ix)
x_pos2_idx = prepare_sequence(x_pos2, word_to_ix)

model = MyModel()
output = model(x_dist, x_pos1_idx, x_pos2_idx)


#15

So let’s suppose my sample is “0.2 [0, 0, 1] [0, 0, 1] 0.4 [0, 1, 0] [0, 0, 1] 0.7 [1, 0, 0] [0, 1, 0]”. I should create an x_pos1 tensor [[0, 0, 1], [0, 1, 0], [1, 0, 0]], an x_pos2 tensor [[0, 0, 1], [0, 0, 1], [0, 1, 0]] and an x_dist tensor [0.2, 0.4, 0.7], and then pass them to the network?
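
In code, I imagine something like this (reusing your prepare_sequence, word_to_ix and model from above; note that x_dist gets an extra dimension so it can be concatenated):

import torch

# Sample: 0.2 [0, 0, 1] [0, 0, 1]   0.4 [0, 1, 0] [0, 0, 1]   0.7 [1, 0, 0] [0, 1, 0]
x_dist = torch.tensor([[0.2], [0.4], [0.7]])              # (3, 1) distances
x_pos1 = torch.tensor([[0, 0, 1], [0, 1, 0], [1, 0, 0]])  # first pos tag of each part
x_pos2 = torch.tensor([[0, 0, 1], [0, 0, 1], [0, 1, 0]])  # second pos tag of each part

x_pos1_idx = prepare_sequence(x_pos1, word_to_ix)         # -> tensor([1, 2, 4])
x_pos2_idx = prepare_sequence(x_pos2, word_to_ix)         # -> tensor([1, 1, 2])
output = model(x_dist, x_pos1_idx, x_pos2_idx)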


#16

Yes, that would be my idea. However, I’m not familiar with your data and use case, so your model might still completely fail to learn something useful with this approach.