Questions about a bidirectional RNN for sentiment analysis

Hi there, I am using a bidirectional vanilla RNN for sentiment analysis. I feed data into the RNN with shape [batch_size, 1, seq_len]; for example, one batch contains 3 tensors of dimension [1, m], where m is the length of the longest sentence (the most words) in that batch. Each of these tensors encodes a single sentence.

For example, let's say I have the sentence "this is so cool". It becomes something like [2, 5, 11, 21, 1, 1, 1, 1, 1], where the indices in the tensor correspond to the indices of the words in my vocabulary dictionary: the word "this" has an index of 2, "is" has an index of 5, and so on. The 1s are the index of the 'pad' token, since every sentence shorter than the longest sentence in the batch gets padded. All the sentences have a label (either 0 for positive or 1 for negative), so the label tensor for a particular batch might look like [0, 1, 1].
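
In case it helps, here is roughly how I build those tensors (the vocabulary here is just a made-up subset of mine):

import torch

# Made-up subset of my vocabulary; index 1 is reserved for the 'pad' token.
vocab = {'pad': 1, 'this': 2, 'is': 5, 'so': 11, 'cool': 21}

def encode_batch(sentences, pad_idx=1):
    # Look up each word's index, then pad every sentence to the longest one.
    encoded = [[vocab[w] for w in s.split()] for s in sentences]
    max_len = max(len(seq) for seq in encoded)
    padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in encoded]
    return torch.tensor(padded, dtype=torch.long)

batch = encode_batch(["this is so cool", "so cool"])
print(batch, batch.shape)  # shape: [batch_size, max_len]
# I then add the middle dimension with batch.unsqueeze(1) to get [batch_size, 1, max_len].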

I am having trouble getting my desired output. Correct me if I am wrong, but I should be getting an output that has 6 values, right? 3 for the forward pass and 3 for the backward pass, since this is a bidirectional RNN with a batch size of 3 (there are 3 labels in each batch). However, I am getting the final prediction as a tensor of shape [1, 9, 1].

Here is my code:

import torch
import torch.nn as nn

data = torch.tensor([[[2, 5, 11, 21, 1, 1, 1, 1, 1]],
                     [[6, 0, 10, 6, 0, 1, 1, 1, 1]],
                     [[9, 15, 16, 4, 0, 17, 0, 10, 18]]], dtype=torch.long)

labels = torch.tensor([0.,0.,1.])

input_seq = data
print(input_seq, input_seq.shape)

batch_size = input_seq.shape[0]
seq_len = input_seq.shape[-1]

print(batch_size)
print(seq_len)

INPUT_DIM = 22
EMBEDDING_DIM = 5
HIDDEN_DIM = 20
OUTPUT_DIM = 1


embeds = nn.Embedding(INPUT_DIM, EMBEDDING_DIM)
rnn = nn.RNN(EMBEDDING_DIM, HIDDEN_DIM, batch_first=True, bidirectional=True)

inputs = torch.zeros((batch_size, seq_len, EMBEDDING_DIM))

for i in range(input_seq.shape[0]):
    inputs[i] = embeds(input_seq[i]).squeeze(0)  # embed one padded sentence at a time


output, hx = rnn(inputs)

print(hx.shape)
print(output.shape)

fc = nn.Linear(HIDDEN_DIM * 2, OUTPUT_DIM)

forward_output = output[:-2, :, :HIDDEN_DIM]  # [:-2] slices dim 0, which is the batch dimension here
reverse_output = output[2:, :, HIDDEN_DIM:]   # same slicing on the backward-direction features

staggered_output = torch.cat((forward_output, reverse_output), dim=-1)

predictions = fc(staggered_output)
print(predictions)
print(predictions.shape)
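
For reference, the two shape prints above give hx.shape = torch.Size([2, 3, 20]) and output.shape = torch.Size([3, 9, 40]).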

I know that this probably isn't the best way to do sentiment analysis, but I am new to all of this and am just experimenting to learn more. Any help is appreciated. Thanks! :)

Hi,
I would suggest first looking at the PyTorch documentation on how to create models: a model is written as a class that inherits from nn.Module, with the forward pass defined in its forward() method.

To improve your model, here are a few suggestions:
Firstly, I would suggest squeezing out the inner dimension of the input and reshaping it to (batch_size, seq_len), since you use batch_first=True in your model.

Secondly, call the embedding layer on the whole input directly; the for loop you wrote is not needed.

Since you are using bidirectional=True, the RNN already concatenates the outputs of the forward and backward directions along the last dimension, so you don't need to concatenate them yourself.
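
For example (a quick sketch using the same sizes as your code, hidden_dim = 20):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=5, hidden_size=20, batch_first=True, bidirectional=True)
inputs = torch.randn(3, 9, 5)     # (batch_size, seq_len, embedding_dim)
output, hx = rnn(inputs)
print(output.shape)               # torch.Size([3, 9, 40]): forward and backward features already concatenated
print(hx.shape)                   # torch.Size([2, 3, 20]): (num_directions, batch_size, hidden_dim)
forward_feats = output[..., :20]  # slice per direction only if you need them separately
backward_feats = output[..., 20:]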

As you want to learn a sentiment analysis model, I would guess there are 3 possibilities: neutral, positive, and negative, i.e. 3 classes. So either use the output at the last hidden state, or pool the outputs of all the hidden states, to get a vector of shape (batch_size, 2*hidden_dim).

Then use a linear projection layer to map that vector to the number of classes (i.e. 3). Use log-softmax if you are using NLL loss, or use CrossEntropyLoss directly (it applies log-softmax internally).
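
A rough sketch of that last step (the 3 classes and the example labels are just my assumption about your task):

import torch
import torch.nn as nn

batch_size, hidden_dim, num_classes = 3, 20, 3
h = torch.randn(batch_size, 2 * hidden_dim)   # e.g. pooled or last-step bidirectional features

fc = nn.Linear(2 * hidden_dim, num_classes)   # project to one score per class
logits = fc(h)                                # (batch_size, num_classes)

labels = torch.tensor([0, 2, 1])              # made-up class ids
loss = nn.CrossEntropyLoss()(logits, labels)  # applies log-softmax + NLL internally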

Hi. Thank you for the response.

So my input is of shape (3, 1, 9) = (batch_size, 1, seq_len). Are you saying that I should reshape it to just (3, 9), i.e. (batch_size, seq_len)?

Could you also explain the part about pooling vs. using the last hidden state? I didn't really understand that.

Thanks

Yes, it should be (batch_size, seq_len), i.e. (3, 9). Then you can pass it to the nn.Embedding layer.
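
Something like this (reusing your sizes; the random data is just a stand-in for your batch of word indices):

import torch
import torch.nn as nn

data = torch.randint(0, 22, (3, 1, 9))  # stand-in for your batch of word indices
embeds = nn.Embedding(22, 5)

input_seq = data.squeeze(1)             # (3, 1, 9) -> (3, 9)
inputs = embeds(input_seq)              # (3, 9, 5); the layer embeds the whole batch at once, no loop needed
print(inputs.shape)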

Sure. When you pass a tensor of shape (batch_size, seq_len, embedding_dim) into a bidirectional LSTM/RNN, the output has shape (batch_size, seq_len, 2*hidden_dim).
So you can average these per-time-step vectors along the sequence-length dimension to get a single vector per sentence (e.g. with AvgPool1d) and then apply a linear transformation to map that representation to the number of classes.
Instead of average pooling, you can also use the output at the last time step of the RNN, since it carries information from the current time step as well as from all previous ones.
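
Both options look roughly like this (a sketch; the shapes match your example):

import torch

output = torch.randn(3, 9, 40)  # (batch_size, seq_len, 2*hidden_dim) from the bidirectional RNN

# Option 1: average over the sequence dimension (same idea as AvgPool1d over time).
pooled = output.mean(dim=1)     # (3, 40)

# Option 2: take the features at the last time step.
last_step = output[:, -1, :]    # (3, 40)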

Thank you very much for the response. I took your advice and changed the code accordingly. You can see it below:

(Everything up until the rnn definition is the same, except that I now squeeze out the inner dimension first.)

input_seq = data.squeeze(1)  # (3, 1, 9) -> (3, 9)

inputs = embeds(input_seq)
print("Inputs shape:", inputs.shape)  # (batch_size, seq_len, embedding_dim)

output, hx = rnn(inputs)
print("output shape:", output.shape)
print("hx shape: ", hx.shape)

fc = nn.Linear(HIDDEN_DIM * 2, OUTPUT_DIM)

predictions = fc(output)  # (3, 9, 1): one score per time step

avg_pool = nn.AvgPool1d(seq_len, stride=seq_len)

predictions = predictions.view(predictions.shape[0], 1, -1)  # (3, 1, 9)
predictions = avg_pool(predictions)                          # (3, 1, 1): average over the 9 time steps
predictions = predictions.squeeze(1)                         # (3, 1)
predictions = predictions.view(1, -1)                        # (1, 3)
predictions = predictions.squeeze(0)                         # (3,)

print("Predictions:", predictions, predictions.shape)
print("Labels: ", labels)

with torch.no_grad():
    criterion = nn.BCEWithLogitsLoss()
    loss = criterion(predictions, labels)
    print('Loss:',loss)

I was able to get my desired output, but I wonder if I really had to go through all this trouble of squeezing and viewing. Correct me if I am wrong, but getting the desired output and computing the loss isn't supposed to be this difficult, right? Please let me know if I am doing this correctly. Thanks
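
Out of curiosity I also tried this shorter version, which seems to give the same loss (using a plain .mean() over the sequence dimension instead of AvgPool1d, if I understood the pooling suggestion correctly; it reuses output, fc, and labels from above):

# Same idea with fewer reshapes: average the RNN outputs over time, then project.
pooled = output.mean(dim=1)          # (batch_size, 2*hidden_dim)
predictions = fc(pooled).squeeze(1)  # (batch_size,)
loss = nn.BCEWithLogitsLoss()(predictions, labels)
print('Loss:', loss)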