I’ve been using PyTorch for smaller tasks for a while and now want to do multilabel classification for the first time. My task is to assign a sentence an arbitrary subset of 11 possible labels/classes. So my output should be a vector with 11 binary entries (0 = class not detected, 1 = class detected).
To do so, I have an LSTM that takes the sentence word by word (encoded by word2vec) and feeds its last output to a linear layer, whose output is returned by the model. So the output of my model is a vector of 11 float values. I am not applying softmax, sigmoid or anything else.
My model looks like this:
    class LSTM(nn.Module):
        def __init__(self, hidden_dim, tagset_size):
            super(LSTM, self).__init__()
            self.hidden_dim = hidden_dim
            self.layers = 1
            self.dropout = 0.0

            word2vec = KeyedVectors.load('word2vec.vocab', mmap='r')
            self.word2idx = lambda word: word2vec.vocab[word].index if word in word2vec.vocab else 0
            self.sent2idx = lambda sent: [self.word2idx(word) for word in sent.split(' ')]
            embedding_weights = torch.FloatTensor(np.array(word2vec.wv.syn0))
            num_embeddings, embedding_dim = embedding_weights.shape
            self.w2v_emb = nn.Embedding.from_pretrained(embedding_weights, freeze=True)

            self.bidir = True
            self.dirs = 1 + int(self.bidir)
            self.clear_hidden(n=chunk_size)
            self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=self.layers,
                                bidirectional=self.bidir, batch_first=True,
                                dropout=self.dropout)
            self.hidden2tag = nn.Linear(self.dirs * hidden_dim, tagset_size)

        def clear_hidden(self, n):
            self.hidden = (torch.zeros(self.dirs * self.layers, n, self.hidden_dim).to(device),
                           torch.zeros(self.dirs * self.layers, n, self.hidden_dim).to(device))

        def forward(self, sentence):
            idxs_unpadded = [torch.tensor(x, dtype=torch.long) for x in list(map(self.sent2idx, sentence))]
            lengths = [len(x) for x in idxs_unpadded]
            idxs_padded = pad_sequence(idxs_unpadded, batch_first=True, padding_value=0)
            idxs = torch.tensor(idxs_padded, dtype=torch.long).to(device)
            embeddings = self.w2v_emb(idxs).cpu()
            embeddings, lengths, perm = prepare_for_lstm(embeddings, lengths)
            embeddings = embeddings.to(device)
            lstm_out, self.hidden = self.lstm(embeddings, self.hidden)
            results, lengths = unpack_lstm_output(lstm_out)
            last_outputs = torch.stack([result[length.data.tolist()-1] for result,length in zip(results, lengths)])
            tag_space = self.hidden2tag(last_outputs)
            return tag_space, perm
My questions now are:
- Which loss function should I use? I read different opinions on the web: BCEWithLogitsLoss, MultiLabelMarginLoss, CrossEntropyLoss, etc.
- How do I convert the float output to a binary output? I need to find some thresholds, right? Is it possible to find these via end-to-end learning, i.e. can the model learn the thresholds itself and directly output 11 true/false values?
- Is there a full working example of how to do multilabel classification, INCLUDING how to binarize the outputs and how to evaluate the model (accuracy, Jaccard, etc.)?
- If you see something else that I am doing wrong, please let me know. I am still quite new to PyTorch.
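To make the questions above concrete, here is a minimal sketch of what I currently imagine for the loss, the binarization and the evaluation. It is only a guess on my side: it assumes `BCEWithLogitsLoss` on the raw 11 outputs, a fixed threshold of 0.5 on the sigmoid instead of learned thresholds, and random tensors as a stand-in for my model output and labels. The helper names (`binarize`, `exact_match_accuracy`, `mean_jaccard`) are made up for this example.

```python
import torch
import torch.nn as nn

NUM_LABELS = 11  # number of classes from the task description


def binarize(logits, threshold=0.5):
    """Map raw logits to 0/1 predictions via sigmoid and a fixed threshold.
    The value 0.5 is an assumption, not a tuned or learned threshold."""
    return (torch.sigmoid(logits) >= threshold).long()


def exact_match_accuracy(pred, target):
    """Fraction of samples where every label is predicted correctly."""
    return (pred == target).all(dim=1).float().mean().item()


def mean_jaccard(pred, target):
    """Per-sample |intersection| / |union|, averaged over the batch.
    A sample with no true and no predicted labels is scored as 1.0."""
    inter = (pred & target).sum(dim=1).float()
    union = (pred | target).sum(dim=1).float()
    scores = torch.where(union > 0,
                         inter / union.clamp(min=1),
                         torch.ones_like(union))
    return scores.mean().item()


# Stand-in data: in my setup `logits` would be tag_space from the model
# and `targets` the 0/1 label vectors (both of shape [batch, 11]).
torch.manual_seed(0)
logits = torch.randn(4, NUM_LABELS)
targets = torch.randint(0, 2, (4, NUM_LABELS))

# BCEWithLogitsLoss applies the sigmoid internally and expects float targets.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets.float())

preds = binarize(logits)
print("loss:", loss.item())
print("exact match accuracy:", exact_match_accuracy(preds, targets))
print("mean Jaccard:", mean_jaccard(preds, targets))
```

If this is the right direction, I would call `criterion(tag_space, labels.float())` during training and only apply `binarize` at evaluation time, but I am not sure whether a fixed 0.5 threshold is a sensible choice.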
It would be great if someone could help me out a bit.