Implementation of Seq2Seq models for sentiment classification

I’m trying to use Seq2Seq models for sentiment classification. I have a working implementation using torchtext with a FastText model on the IMDB dataset; it trains as expected and I can obtain predictions. However, when I replace the FastText model with a BiRNN model, I get the error about input sizes shown below. Can anyone explain how to fix this size mismatch? Alternatively, if you can point me to a suitable implementation that I can adapt, that would also work for me. The implementations of the two models are given below. Thanks.

raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
ValueError: Target size (torch.Size([64])) must be the same as input size (torch.Size([1631]))
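For reference, this ValueError is the shape check inside PyTorch's binary cross-entropy loss: it fires whenever the prediction tensor and the label tensor have different shapes. Assuming a BCEWithLogitsLoss criterion (my training loop is not shown here), the mismatch can be reproduced without any model:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
predictions = torch.randn(1631)  # model output with the wrong shape
labels = torch.ones(64)          # one binary label per example in the batch
criterion(predictions, labels)   # raises the ValueError above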

class FastText(nn.Module):
    def __init__(self,
                 vocab_size,
                 embedding_dim,
                 output_dim,
                 pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(
            vocab_size,
            embedding_dim,
            padding_idx=pad_idx)
        self.fc = nn.Linear(embedding_dim,
                            output_dim)

    def forward(self, text):
        embedded = self.embedding(text)
        embedded = embedded.permute(1, 0, 2)
        pooled = F.avg_pool2d(
            embedded,
            (embedded.shape[1], 1)).squeeze(1)
        return self.fc(pooled)

class BiRNN(nn.Module):
    def __init__(self, vocab, embed_size, num_hiddens, num_layers):
        super(BiRNN, self).__init__()
        self.embedding = nn.Embedding(len(vocab), embed_size)
        self.encoder = nn.LSTM(input_size=embed_size,
                               hidden_size=num_hiddens,
                               num_layers=num_layers,
                               bidirectional=True)
        self.decoder = nn.Linear(4 * num_hiddens, 1)

    def forward(self, inputs):
        embeddings = self.embedding(inputs.permute(1, 0))
        outputs, _ = self.encoder(embeddings)
        encoding = torch.cat((outputs[0], outputs[-1]), -1)
        outs = self.decoder(encoding)  # (batch_size, 1)
        return outs

You can check the docs: the shape of outputs is (L, N, D * H_out), where L is the sequence length, N is the batch size, and D is the number of directions (1 or 2). This means that

(outputs[0], outputs[-1])

does not give you exactly what you want: outputs[0] contains the first hidden state w.r.t. the forward direction but the last hidden state w.r.t. the backward direction, and vice versa for outputs[-1].
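To make this concrete, here is a small shape check with toy dimensions (all numbers made up for illustration) showing how the slices of outputs relate to the final hidden state h_n of a bidirectional LSTM:

import torch
import torch.nn as nn

seq_len, batch_size, embed_size, num_hiddens, num_layers = 5, 3, 8, 16, 2
lstm = nn.LSTM(embed_size, num_hiddens, num_layers, bidirectional=True)
x = torch.randn(seq_len, batch_size, embed_size)

outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)  # torch.Size([5, 3, 32])  -> (L, N, 2 * H_out)
print(h_n.shape)      # torch.Size([4, 3, 16])  -> (num_layers * 2, N, H_out)

# The final forward state of the top layer sits at the *last* time step ...
print(torch.allclose(outputs[-1][:, :num_hiddens], h_n[-2]))  # True
# ... but the final backward state sits at the *first* time step.
print(torch.allclose(outputs[0][:, num_hiddens:], h_n[-1]))   # True

So torch.cat((outputs[0], outputs[-1]), -1) mixes the first and final states of the two directions rather than giving you only the final states.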

I would recommend writing

outputs, hidden = self.encoder(embeddings)

and using hidden for further processing; it makes handling bidirectional=True easier, I think. The complete snippet would be

outputs, hidden = self.encoder(embeddings)
batch_size = embeddings.shape[1]  # embeddings is (seq_len, batch_size, embed_size)
# hidden is the (h_n, c_n) tuple of the LSTM; take h_n and keep only the last layer
hidden = hidden[0].view(self.num_layers, self.num_directions, batch_size, self.hidden_size)[-1]
# Handle directions (to easily switch between uni and bidirectional)
if self.num_directions == 1:
    hidden = hidden.squeeze(0)
elif self.num_directions == 2:
    h_1, h_2 = hidden[0], hidden[1]    # Get forward and backward direction
    hidden = torch.cat((h_1, h_2), 1)  # Concatenate both states
outs = self.decoder(hidden)

The only thing needed is to adjust the __init__() method a bit; something like

...
# Calculate number of directions
self.num_directions = 2 if self.bidirectional else 1
...
# Define last linear layer
self.decoder = nn.Linear(self.num_directions * hidden_size, 1)
...
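Putting both pieces together, a minimal sketch of the adjusted BiRNN could look like the one below. The constructor arguments follow your original class; the final squeeze(1) is an extra assumption on my part so that the output has shape (batch_size,) and lines up with a label tensor such as the torch.Size([64]) target in your error message.

import torch
import torch.nn as nn

class BiRNN(nn.Module):
    def __init__(self, vocab, embed_size, num_hiddens, num_layers, bidirectional=True):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = num_hiddens
        self.num_directions = 2 if bidirectional else 1

        self.embedding = nn.Embedding(len(vocab), embed_size)
        self.encoder = nn.LSTM(input_size=embed_size,
                               hidden_size=num_hiddens,
                               num_layers=num_layers,
                               bidirectional=bidirectional)
        # One logit per example, sized for uni- or bidirectional encodings
        self.decoder = nn.Linear(self.num_directions * num_hiddens, 1)

    def forward(self, inputs):
        # inputs: (batch_size, seq_len) -> embeddings: (seq_len, batch_size, embed_size)
        embeddings = self.embedding(inputs.permute(1, 0))
        batch_size = embeddings.shape[1]

        outputs, hidden = self.encoder(embeddings)
        # hidden[0] is h_n with shape (num_layers * num_directions, batch, hidden);
        # keep only the last layer
        hidden = hidden[0].view(self.num_layers, self.num_directions,
                                batch_size, self.hidden_size)[-1]
        if self.num_directions == 1:
            hidden = hidden.squeeze(0)
        else:
            # Concatenate the final forward and backward states
            hidden = torch.cat((hidden[0], hidden[1]), dim=1)

        # (batch_size, 1) -> (batch_size,), matching a per-example label tensor
        return self.decoder(hidden).squeeze(1)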