Hi PyTorchers,
I’ve been using PyTorch for smaller tasks for a while and now want to do multilabel classification for the first time. My task is to assign a sentence an arbitrary subset of 11 possible labels/classes, so my output should be a vector with 11 binary entries (0 = class not detected, 1 = class detected).
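For example (label indices made up), a sentence that carries classes 2 and 5 would get this target:

import torch
target = torch.tensor([0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0.])  # 11 entries, one per class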
To do so, I have an LSTM that reads the sentence word by word (encoded by word2vec) and feeds its last output into a linear layer, whose output the model returns. So the output of my model is a vector of 11 float values. I am not applying a softmax, a sigmoid, or anything else.
My model looks like this:
import numpy as np
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence
from gensim.models import KeyedVectors

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class LSTM(nn.Module):
    def __init__(self, hidden_dim, tagset_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.layers = 1
        self.dropout = 0.0
        # pretrained word2vec lookup; unknown words fall back to index 0
        word2vec = KeyedVectors.load('word2vec.vocab', mmap='r')
        self.word2idx = lambda word: word2vec.vocab[word].index if word in word2vec.vocab else 0
        self.sent2idx = lambda sent: [self.word2idx(word) for word in sent.split(' ')]
        # KeyedVectors has no .wv attribute; .vectors is the embedding matrix
        embedding_weights = torch.FloatTensor(word2vec.vectors)
        num_embeddings, embedding_dim = embedding_weights.shape
        self.w2v_emb = nn.Embedding.from_pretrained(embedding_weights, freeze=True)
        self.bidir = True
        self.dirs = 1 + int(self.bidir)
        self.clear_hidden(n=chunk_size)  # chunk_size is my global batch size
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=self.layers,
                            bidirectional=self.bidir, batch_first=True, dropout=self.dropout)
        self.hidden2tag = nn.Linear(self.dirs * hidden_dim, tagset_size)

    def clear_hidden(self, n):
        # fresh (h_0, c_0); I call this before each batch so state does not leak between batches
        self.hidden = (torch.zeros(self.dirs * self.layers, n, self.hidden_dim).to(device),
                       torch.zeros(self.dirs * self.layers, n, self.hidden_dim).to(device))

    def forward(self, sentence):
        idxs_unpadded = [torch.tensor(self.sent2idx(s), dtype=torch.long) for s in sentence]
        lengths = [len(x) for x in idxs_unpadded]
        # pad_sequence already returns a LongTensor, no extra torch.tensor() needed
        idxs = pad_sequence(idxs_unpadded, batch_first=True, padding_value=0).to(device)
        embeddings = self.w2v_emb(idxs).cpu()
        # my helpers: sort by length and pack, then pad back after the LSTM
        embeddings, lengths, perm = prepare_for_lstm(embeddings, lengths)
        embeddings = embeddings.to(device)
        lstm_out, self.hidden = self.lstm(embeddings, self.hidden)
        results, lengths = unpack_lstm_output(lstm_out)
        # take the output at the last real (non-padded) timestep of each sequence
        last_outputs = torch.stack([result[length - 1] for result, length in zip(results, lengths)])
        tag_space = self.hidden2tag(last_outputs)
        return tag_space, perm
My questions now are:
- Which loss function should I use? I read different opinions on the web: BCEWithLogitsLoss, MultiLabelMarginLoss, CrossEntropyLoss, etc. (My current guess is sketched right after this list.)
- How do I convert the float output to a binary output? I need to find some thresholds, right? Is it possible to find these via end-to-end learning, i.e. can the model learn the thresholds itself and directly output 11 true/false values? (See the second sketch below for what I mean.)
- Is there a full working example on how to do multilabel classification INCLUDING how to binarize outputs and how to evaluate the model (accuracy, Jaccard, etc.)? (The third sketch below shows what I would try for the metrics.)
- If you see something else that I am doing wrong, please let me know. I am still quite new to PyTorch.
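To make the first question concrete, here is how I would currently wire up BCEWithLogitsLoss, assuming it is the right loss for multilabel and that my targets are float vectors of 0s and 1s (variable names are just placeholders around my model above):

criterion = nn.BCEWithLogitsLoss()
logits, perm = model(batch_of_sentences)  # (batch, 11) raw scores, no sigmoid applied
targets = targets[perm].float()           # reorder targets to match the packing permutation
loss = criterion(logits, targets)         # BCEWithLogitsLoss applies the sigmoid internally
loss.backward()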
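For the binarization question, the simplest thing I can think of is a fixed threshold of 0.5 on the sigmoid probabilities; for the end-to-end idea I imagine a learnable per-class threshold, but I am not sure this is how it is usually done:

probs = torch.sigmoid(logits)    # (batch, 11), values in [0, 1]
preds = (probs > 0.5).float()    # fixed threshold -> 11 binary entries

# hypothetical learnable per-class thresholds (my guess):
thresholds = nn.Parameter(torch.zeros(11))  # would be optimized with the rest of the model
preds = (logits > thresholds).float()       # the hard comparison itself is not differentiable, though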
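And for evaluation I was planning to fall back on scikit-learn's multilabel metrics on the binarized outputs, if that is a sensible approach:

from sklearn.metrics import accuracy_score, f1_score, jaccard_score

y_true = targets.cpu().numpy()  # (n_samples, 11) binary ground truth
y_pred = preds.cpu().numpy()    # (n_samples, 11) binary predictions
print(accuracy_score(y_true, y_pred))                    # exact-match (subset) accuracy
print(jaccard_score(y_true, y_pred, average='samples'))  # per-sample Jaccard, averaged
print(f1_score(y_true, y_pred, average='micro'))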
It would be very nice if someone could help me a bit.
Best,
Simon